GATE NoteBook

Target JRF - UGC NET Computer Science Paper 2

1000 MEQs
50 Qs on ADVANCED DATABASES
BIGDATA | NoSQL | Data Mining & Data Warehousing

Most Expected Questions Course


DATABASES

Given a database with multiple tables, which of the following constraints either disallow NULL values by definition or can be written so that NULL values cannot be inserted?
I. UNIQUE
II. NOT NULL
III. FOREIGN KEY
IV. PRIMARY KEY
V. CHECK

A. I, II, and IV

B. I, II, IV and V

C. II,IV and V

D. I, II, III, IV and V

Ans: C
Solution: (C)
UNIQUE allows NULL values.
NOT NULL does not allow NULL values.
PRIMARY KEY does not allow NULL values.
CHECK can be written to reject NULL values, for example CHECK (col IS NOT NULL).
FOREIGN KEY allows NULL values.
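
The behaviour described in the solution can be checked with a small experiment. The following is a minimal sketch using Python's built-in sqlite3 module; the tables dept and emp, their columns, and the helper try_insert are hypothetical names chosen for illustration. PRIMARY KEY is deliberately left out because SQLite, for historical reasons, treats NULL in primary keys differently from the SQL standard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE dept (id INTEGER PRIMARY KEY);
CREATE TABLE emp (
    email   TEXT UNIQUE,
    name    TEXT NOT NULL,
    dept_id INTEGER REFERENCES dept(id),
    salary  INTEGER CHECK (salary IS NOT NULL)
);
""")

def try_insert(email, name, dept_id, salary):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO emp VALUES (?, ?, ?, ?)",
                     (email, name, dept_id, salary))
        return True
    except sqlite3.IntegrityError:
        return False

results = {
    # UNIQUE: two rows with a NULL email are both accepted.
    "unique_null_twice": try_insert(None, "a", None, 1) and try_insert(None, "b", None, 2),
    # FOREIGN KEY: a NULL reference is accepted.
    "foreign_key_null": try_insert("x@example.com", "c", None, 3),
    # NOT NULL: a NULL name is rejected.
    "not_null_null": try_insert("y@example.com", None, None, 4),
    # CHECK (salary IS NOT NULL): a NULL salary is rejected.
    "check_null": try_insert("z@example.com", "d", None, None),
}
print(results)
```

Note that a CHECK constraint rejects NULL only when it is written to do so; a condition such as CHECK (salary > 0) evaluates to unknown for NULL and lets the row through.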
BIGDATA

201. In HDFS, data is distributed over several machines and replicated to ensure durability
against failure and high availability to parallel applications. HDFS is suitable for:

1. Fragmented Files
2. Streaming Data Access
3. Commodity Hardware

a) 1 and 3
b) 2 and 3
c) 1 and 2
d) All of the above
e) Only 3

HDFS = Hadoop Distributed File System.
Fragmented files = very large files.

HDFS is not suitable for:
1. Low-latency data access
2. Lots of small files
3. Multiple writes

Facebook has the world's largest Hadoop cluster.
BIGDATA

202. Which of the following are Benefits of Big Data Processing?

A. Businesses can utilize outside intelligence while taking decisions


B. Improved customer service
C. Better operational efficiency

1) A and B
2) B and C
3) A and C
4) All of them
5) None of them
BIGDATA

203. Which of the following is/are Big Data Technologies?

A. Apache Hadoop
B. Apache Spark
C. Apache Kafka
D. Apache Pytarch

1) A and B
2) A, B and C
3) B, C and D
4) A, C and D
5) None of them
BIGDATA
204. Apache Kafka is an open-source platform that was created by?

A. LinkedIn
B. Facebook
C. Google
D. IBM

Apache Kafka aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
BIGDATA
205. MATCH THE FOLLOWING :

SET 1 (TYPES OF BIGDATA)     SET 2 (FEATURES)

1. Structured                i. based on relational database tables
2. Unstructured              ii. based on XML/RDF
3. Semi-structured           iii. based on character and binary data

a) 1-i, 2-ii, 3-iii


b) 1-i, 2-iii, 3-ii
c) 1-iii, 2-ii, 3-i
d) 1-ii, 2-i, 3-iii
BIGDATA
206. Unstructured is a kind of Big Data where :

a) Transaction management required.


b) Concurrency required.
c) Both a) and b)
d) None of these

Transaction data of a bank is structured data.

Structured data | Semi-structured data | Unstructured data
----------------|----------------------|------------------
It is based on relational database tables. | It is based on XML/RDF (Resource Description Framework). | It is based on character and binary data.
Matured transaction management and various concurrency techniques. | Transaction management is adapted from DBMS and is not matured. | No transaction management and no concurrency.
Versioning over tuples, rows, tables. | Versioning over tuples or graph is possible. | Versioned as a whole.
It is schema dependent and less flexible. | It is more flexible than structured data but less flexible than unstructured data. | It is more flexible and there is an absence of schema.
It is very difficult to scale the DB schema. | Its scaling is simpler than structured data. | It is more scalable.
Very robust. | New technology, not very widespread. | -
Structured queries allow complex joins. | Queries over anonymous nodes are possible. | Only textual queries are possible.
BIGDATA
207. What makes Big Data analysis difficult to optimize?

A. Big Data is not difficult to optimize


B. Both data and cost effective ways to mine data to make business sense out
of it
C. The technology to mine data
D. None of the above
BIGDATA
208. What does commodity Hardware in Hadoop world mean?

a) Very cheap hardware

b) Industry standard hardware

c) Discarded hardware

d) Low specifications Industry grade hardware


Commodity hardware: computer hardware that is affordable and easy to obtain.
BIGDATA
209. What does “Velocity” in Big Data mean?

a) Speed of input data generation

b) Speed of individual machine processors

c) Speed of ONLY storing data

d) Speed of storing and processing data


Big Data was originally defined by the "3Vs", but there are now "5Vs" of Big Data: Volume, Velocity, Variety, Veracity, and Value.

BIGDATA
210. What is HBase used as?

a) Tool for Random and Fast Read/Write operations in Hadoop

b) Faster Read only query engine in Hadoop

c) MapReduce alternative in Hadoop

d) Fast MapReduce layer in Hadoop


What is HBase?
1. A distributed column-oriented database.
2. Built on top of the Hadoop file system.
3. An open-source project.
4. Horizontally scalable.
5. A data model that is similar to Google's Bigtable, designed to provide quick random access to huge amounts of structured data.

HDFS | HBase
-----|------
HDFS is a distributed file system suitable for storing large files. | HBase is a database built on top of the HDFS.
HDFS does not support fast individual record lookups. | HBase provides fast lookups for larger tables.
It provides high-latency batch processing. | It provides low-latency access to single rows from billions of records (random access).
It provides only sequential access of data. | HBase internally uses hash tables and provides random access, and it stores the data in indexed HDFS files for faster lookups.

What is MapReduce?
1. A processing technique and a programming model for distributed computing, based on Java.
2. The MapReduce algorithm contains two important tasks, namely Map and Reduce.
3. Map takes a set of data and converts it into another set of data, where individual elements are broken down into
tuples (key/value pairs).
4. Reduce takes the output from a map as its input and combines those data tuples into a
smaller set of tuples.
5. As the sequence of the name MapReduce implies, the reduce task is always performed after the map job.
6. It makes it easy to scale data processing over multiple computing nodes.
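
The Map and Reduce tasks listed above can be illustrated with a word count. This is a minimal single-process sketch in Python, not Hadoop's Java API; the function names map_phase, shuffle and reduce_phase and the sample input are assumptions made for illustration.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: break each line down into (word, 1) tuples."""
    for line in lines:
        for word in line.lower().split():
            yield (word, 1)

def shuffle(pairs):
    """Group values by key, as a framework would between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine each key's tuples into a single, smaller result."""
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data big cluster", "data node"]
word_counts = reduce_phase(shuffle(map_phase(lines)))
print(word_counts)
```

In a real cluster the map calls run in parallel on the nodes that hold the input blocks, and the grouping step moves data between nodes; here everything runs in one process to keep the data flow visible.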
BIGDATA
211. Which of the following are NOT TRUE for Hadoop?

a) It’s a tool for Big Data analysis

b) It supports structured and unstructured data analysis

c) It aims for vertical scaling out/in scenarios

d) Both (a) and (c)


Hadoop is an open-source, Java-based framework used for storing and processing big data.
The data is stored on inexpensive commodity servers that run as clusters. Its distributed file system enables
concurrent processing and fault tolerance.
NoSQL
212. NoSQL, originally referring to "non-SQL" or _________, is a kind of database that
provides a mechanism for storage and retrieval of data.

a) Relational
b) non relational
c) Either a) or b)
d) None of these
NoSQL
213. NoSQL databases are used in :

1. real-time web applications


2. virtualisation
3. big data
4. enhanced data modelling

a) 1 and 2
b) 2, 3 and 4
c) 1 and 3
d) 3 and 4
NoSQL
214. Which is/are TRUE About NoSQL ?

1. Uses low-level query languages


2. Having standardized interfaces
3. huge previous investments in existing RDBMS

a) 1 and 2
b) 2 and 3
c) 1 and 3
d) All are TRUE

Barriers to the greater adoption of NoSQL stores include:
1. the use of low-level query languages
2. the lack of standardized interfaces
3. huge previous investments in existing relational databases
NoSQL
215. Some database systems like MongoDB and CouchDB store data in JSON
format. Document size is ______ in NoSQL and ______ is not available.

a) decreased, MongoDB
b) Increased, GUI
c) Increased, network bandwidth
d) None of these
NoSQL
216. NoSQL should not be used when :

a) The data changes over time and is not structured.
b) Support for constraints and joins is not required at the database level.
c) The data is growing continuously and you need to scale the database
regularly to handle the data.
d) Only a small amount of data needs to be stored and retrieved.
NoSQL
217. There are many advantages of working with NoSQL databases such as
MongoDB and Cassandra. NoSQL database is :

a) Highly scalable but with low availability
b) Highly scalable and highly available
c) Low scalability but highly available
d) Low scalability and low availability

Highly scalable:
NoSQL can handle huge amounts of data because of its scalability; as the data grows, NoSQL scales
itself to handle that data efficiently.

Highly available:
The auto-replication feature in NoSQL databases makes them highly available because, in case of any
failure, data replicates itself to the previous consistent state.
NoSQL
218. String is the most commonly used datatype in MongoDB.
Consider the statements:
I. String is used to store data.
II. A string must be UTF-16 valid in MongoDB.

Which is NOT TRUE ?

a) Only I
b) Only II
c) Both
d) None
NoSQL
219. Cassandra is written in _____ and MongoDB is written in _______.

a) C and C++
b) Java and Java
c) C++ and Java
d) Java and C++
e) C++ and C++
NoSQL
220. Cassandra was initially created at ________ for inbox search.

a) Orkut
b) Facebook
c) Google
d) Yahoo
e) Outlook
NoSQL
221. Which of the following supports ACID properties i.e. Atomicity,
Consistency, Isolation, and Durability ?

a) Mongodb
b) Cassandra
c) Foursquare
d) Intuit

Cassandra | MongoDB
----------|--------
Developed by the Apache Software Foundation. | Developed by MongoDB Inc.
Written only in Java. | Written in C++, Go, JavaScript, Python.
Write scalability is very high and efficient. | Write scalability is limited.
Read performance is highly efficient as it takes O(1) time. | Read performance is not that fast.
Has only cursory support for secondary indexes, i.e. secondary indexing is restricted. | Supports the concept of secondary indexes.
Only supports the JSON data format. | Supports both JSON and BSON data formats.
The replication method supported is Selectable Replication Factor. | The replication method supported is Master-Slave Replication.
Does not provide ACID transactions but can be tuned to support ACID properties. | Provides multi-document ACID transactions with snapshot isolation.
Server operating systems are BSD, Linux, OS X, Windows. | Server operating systems are Solaris, Linux, OS X, Windows.
Used by companies like Hulu, Instagram, Intuit, Netflix, Reddit, etc. | Used by companies like Adobe, Amadeus, Lyft, ViaVarejo, Craftbase, etc.
NoSQL
222. Why is MongoDB known as the best NoSQL database?

A. Easily Scalable
B. High Performance
C. Rich Query language
D. All of the above
E. None of these
NoSQL
223. The O2-Tree is basically an evolution of Red-Black trees, a form of a
Binary-Search tree, in which a leaf node contains the {key value, pointer}
tuples. It satisfies the following properties:

a) Every node is either red or black.


b) The root is red. (Incorrect: the root is black.)
c) If a node is red, then both its children are black.
d) a) and b)
e) a) and c)

Indexing
1. The process of associating a key with the location of a corresponding data record.
2. Common methods:
   a) B-Tree indexing
   b) T-Tree indexing
   c) O2-Tree indexing

B-Tree Indexing
• Internal nodes can have a variable number of child nodes in some predefined range.
• One major difference from other tree structures is that a B-Tree allows nodes to have a variable number of child nodes, meaning less tree balancing but more unused space.
• The B+-Tree is one of the most popular variants of B-Trees.
• It is an improvement over the B-Tree that requires all keys to reside in the leaves.

T-Tree Indexing
• Designed by mixing features from AVL-Trees and B-Trees.
• AVL-Trees are a type of self-balancing binary search tree, while B-Trees are balanced multiway trees in which each node can have a different number of children.
• Very similar to the AVL-Tree and the B-Tree.
• Each node stores more than one {key-value, pointer} tuple.
• Binary search is used in combination with the multi-tuple nodes to produce better results in storage and performance.
• A T-Tree has three types of nodes:
  1) a T-Node that has a right and left child,
  2) a leaf node with no children,
  3) a half-leaf node with only one child.
• It is believed to have better overall performance than AVL-Trees.

O2-Tree Indexing
• Basically an evolution of Red-Black trees, a form of binary search tree, in which a leaf node contains the {key value, pointer} tuples.
• Created to enhance the performance of current indexing methods.
• An O2-Tree is of order m (m ≥ 2), where m is the minimum degree of the tree.
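
Indexing, defined above as associating a key with the location of the corresponding data record, can be shown in its simplest form with a sorted key array and binary search. This is a minimal sketch rather than a B-Tree, T-Tree or O2-Tree; the record values, the keys and the function name lookup are hypothetical.

```python
import bisect

# Data records in arbitrary storage order (hypothetical values).
records = ["rec-A", "rec-B", "rec-C", "rec-D"]

# The index: sorted keys, each paired with the location of its record.
index_keys = [10, 20, 30, 40]
index_locs = [2, 0, 3, 1]

def lookup(key):
    """Binary-search the index, then follow the stored location to the record."""
    i = bisect.bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        return records[index_locs[i]]
    return None

print(lookup(30))
print(lookup(25))
```

Tree-based indexes keep the same key-to-location idea but arrange the keys in nodes so that inserts and deletes do not require shifting a whole sorted array.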
NoSQL
224. Which of the following are the simplest NoSQL databases?

a) Key-value
b) Document
c) Wide-column
d) All of the above

Four Main Types of NoSQL Databases
1. Document databases
2. Key-value stores
3. Column-oriented databases
4. Graph databases

1. Document Databases
• Store data in JSON, BSON, or XML documents (not Word documents or Google Docs).
• Documents can be nested.
• Particular elements can be indexed for faster querying.
• Documents can be stored and retrieved in a form that is much closer to the data objects used in applications, which means less translation is required to use the data in an application.
• SQL data must often be assembled and disassembled when moving back and forth between applications and storage.

2. Key-Value Stores
• The simplest type of NoSQL database.
• Every data element in the database is stored as a key-value pair consisting of an attribute name (or "key") and a value.
• It is like a relational database with only two columns:
  a) the key or attribute name, such as state
  b) the value, such as Alaska
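
The difference between a key-value store and a document database can be sketched with plain Python structures. This is an illustration of the two data models only, not the API of any real database; the names kv_store, documents and find_by_state and the sample data are assumptions.

```python
import json

# Key-value store: a value is looked up only by its key.
kv_store = {}
kv_store["state"] = "Alaska"

# Document store: nested JSON documents that can be queried by their fields.
documents = [
    json.loads('{"name": "Asha", "address": {"state": "Alaska", "city": "Juneau"}}'),
    json.loads('{"name": "Ravi", "address": {"state": "Texas", "city": "Austin"}}'),
]

def find_by_state(docs, state):
    """Query on a nested element, which a document database could index."""
    return [d["name"] for d in docs if d["address"]["state"] == state]

names_in_alaska = find_by_state(documents, "Alaska")
print(kv_store["state"], names_in_alaska)
```

The parsed documents are already ordinary dictionaries and lists, which is the "less translation" point made in the notes: the stored form matches the objects the application works with.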

Graph Databases
• Focus on the relationships between data elements.
• Each element is stored as a node.
• The connections between elements are called links or relationships.
• Connections are first-class elements of the database, stored directly.
• They are optimized to capture and search the connections between data elements, overcoming the overhead associated with JOINing multiple tables in SQL.
• Very few real-world business systems can survive solely on graph queries.
• As a result, graph databases are usually run alongside other, more traditional databases.
• Use cases include fraud detection, social networks, and knowledge graphs.

Column-Oriented Databases
• A column store is organized as a set of columns.
• When you want to run analytics on a small number of columns, you can read those columns directly without consuming memory with the unwanted data.
• Columns are often of the same type and benefit from more efficient compression, making reads even faster.
• A columnar DB can quickly aggregate the values of a given column.
• Use cases include analytics.
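
The point about reading only the needed columns can be sketched by storing the same data row-wise and column-wise. This is a minimal in-memory illustration, not a real column store; the field names and values are hypothetical.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "region": "north", "amount": 120},
    {"id": 2, "region": "south", "amount": 80},
    {"id": 3, "region": "north", "amount": 50},
]

# Column-oriented layout: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Aggregating one column touches only that column's values.
total_amount = sum(columns["amount"])
print(total_amount)
```

Because each column list holds values of a single type, it also compresses well, which is the compression benefit mentioned in the notes.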
NoSQL
225. Which of the following is not an example of a NoSQL database
management system?

a) HBase
b) MongoDB
c) CouchDB
d) PostgreSQL

CouchDB
1. Developed by the Apache Software Foundation.
2. CouchDB is written in Erlang.
3. It is a native JSON document store inspired by Lotus Notes, scalable from globally distributed server clusters down to mobile phones.
4. The primary database model for CouchDB is Document Store.
5. It has Document store as Secondary database models.
6. Server operating systems for CouchDB are Android, BSD, Linux, OS X, Solaris and Windows.
7. It does not support predefined data types.
8. It does not support the SQL query language.
9. It supports two replication methods: master-master replication and master-slave replication.
10. It does not support in-memory capabilities.
11. It does not ensure data integrity after non-atomic manipulations of data.

PostgreSQL
1. Billed as the most advanced open-source database.
2. Object-relational DBMS.
3. Implementation language is C.
4. The CASCADE option is supported.
5. It supports partial, bitmap and expression indexes.
6. It supports advanced data types such as arrays, hstore and user-defined types.
NoSQL
226. Which of the following is a characteristic of a NoSQL database?

a) Uses JSON
b) Needs a schema
c) Requires JOINs
d) Uses tables for storage

JSON database
• A JSON document database is a type of non-relational database.
• It is designed to store and query data as JSON documents, rather than normalizing data across multiple tables, each with a unique and fixed structure, as in a relational database.
• MySQL, Oracle, PostgreSQL, and SQL Server now offer JSON support.
NoSQL
227. Which of the following statements is true?

A. Non Relational databases require that schemas be defined before you can
add data
B. NoSQL databases are built to allow the insertion of data without a
predefined schema
C. NewSQL databases are built to allow the insertion of data without a
predefined schema
D. All of the above
NoSQL
228. _________ can be used for batch processing of data and aggregation
operations.

A. Hive
B. Oozie
C. MapReduce
D. None of the above
NoSQL
229. Which statement(s) is/are TRUE ?

S1 - NoSQL was created to manage the scale and agility challenges that face
modern applications, but the suitability of a database depends on the
problem it must solve.
S2 - Redis, a powerful in-memory key-value store used for session caching,
message queues, and other specific applications, is a NoSQL database.

a) S1 is True.
b) S2 is True.
c) both True.
d) Both False.
Data Mining and Data Warehousing
230. Heterogeneous databases refer to

a) A set of databases from different vendors, possibly using different
database paradigms
b) An approach to a problem that is not guaranteed to work but performs
well in most cases.
c) Information that is hidden in a database and that cannot be recovered by a
simple SQL query.
d) None of these
e) All of these
Data Mining and Data Warehousing
231. Data can be stored, retrieved and updated in

a) SMTOP

b) OLTP

c) FTP

d) OLAP
Data Mining and Data Warehousing
232. Missing data may be due to

a) equipment malfunction

b) data that was inconsistent with other recorded data and thus deleted, or data not entered
due to misunderstanding

c) certain data may not be considered important at the time of entry

d) all of the above

e) None of the above


Data Mining and Data Warehousing
233. Which is False?

A) Data cleaning fills in missing values, smooths noisy data, identifies or
removes outliers, and resolves inconsistencies

B) Data reduction obtains a reduced representation in volume but produces
the same or similar analytical results

C) both A) and B) are False

D) both A) and B) are True


Data Mining and Data Warehousing
234. Which of the following describes a data warehouse?

A. Can be updated by end users.


B. Contains numerous naming conventions and formats.
C. Organized around important subject areas.
D. Contains only current data.
Data Mining and Data Warehousing
235. Some telecommunication companies want to segment their customers
into distinct groups in order to send appropriate subscription offers.
This is an example of :
a) Supervised Learning
b) Unsupervised Learning
c) Reinforcement Learning
d) Serration
Data Mining and Data Warehousing
236. Which of the following is NOT involved in data mining?

a) Knowledge extraction
b) Data archeology
c) Data exploration
d) Data transformation
Data Mining and Data Warehousing
237. Pick out the right approach towards data mining ?
(A). Infrastructure, exploration, analysis, exploitation, interpretation
(B). Infrastructure, exploration, analysis, interpretation, exploitation
(C). Infrastructure, analysis, exploration, interpretation, exploitation
(D). None of these
What are the four data mining techniques?

Regression (predictive)
Association Rule Discovery (descriptive)
Classification (predictive)
Clustering (descriptive)
Data Mining and Data Warehousing
238. Which of the following terms is used as a synonym for data
mining?
(A). knowledge discovery in databases
(B). data warehousing
(C). regression analysis
(D). parallel processing in databases
7 STEPS IN KDD

The knowledge discovery process is iterative and interactive, and consists of the steps below. It is
iterative at each step, meaning one might have to move back to the previous steps.

1. Data Cleaning: the removal of noisy and irrelevant data from the collection.

2. Data Integration: heterogeneous data from multiple sources is combined in a common source (data warehouse).

3. Data Selection: the process where data relevant to the analysis is decided and retrieved from the data collection.
   • Data selection using neural networks.
   • Data selection using decision trees.
   • Data selection using Naive Bayes.
   • Data selection using clustering, regression, etc.

4. Data Transformation: the process of transforming data into the appropriate form required by the mining procedure. Data
transformation is a two-step process:
   • Data mapping: assigning elements from the source base to the destination to capture transformations.
   • Code generation: creation of the actual transformation program.

5. Data Mining: the application of intelligent techniques to extract potentially useful patterns.
   • Transforms task-relevant data into patterns.
   • Decides the purpose of the model using classification or characterization.

6. Pattern Evaluation: identifying the truly interesting patterns representing knowledge, based on given measures.
   • Finds the interestingness score of each pattern.
   • Uses summarization and visualization to make the data understandable by the user.

7. Knowledge Representation: a technique which utilizes visualization tools to represent data mining results.
   • Generate reports.
   • Generate tables.
   • Generate discriminant rules, classification rules, characterization rules, etc.
Data Mining and Data Warehousing

239. Which is needed by K-means clustering?


(A). defined distance metric
(B). number of clusters
(C). initial guess as to cluster centroids
(D). all of these
K-means clustering is a type of unsupervised learning, which is used when you have
unlabeled data (i.e., data without defined categories or groups). The goal of this
algorithm is to find groups in the data, with the number of groups represented by the
variable K.
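
The three inputs named in question 239 (a distance metric, the number of clusters, and initial centroid guesses) all appear explicitly in a basic implementation. The following is a minimal sketch of plain k-means on 2-D points, assuming Euclidean distance; the function name kmeans, the sample points and the initial centroids are hypothetical.

```python
import math

def kmeans(points, centroids, max_iters=100):
    """Plain k-means: K is the number of initial centroids supplied."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, centroids=[(0, 0), (10, 10)])
print(labels, centroids)
```

No labels are supplied anywhere in the input, which is what makes this unsupervised: the groups come only from the distances between points.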
Data Mining and Data Warehousing

240. You are given data about seismic activity in the United States, and
you want to predict the magnitude of the upcoming earthquake. This
can be considered as an example of which of the following methods?
A. Supervised learning
B. Unsupervised learning
C. Serration
D. Dimensionality reduction
Ans: A. Supervised learning
Predicting a magnitude from past labeled observations is a regression task, and regression is a form of supervised learning.
Supervised learning, as the name indicates, involves a supervisor acting as a teacher. We train the machine using well-labeled data, that is, data already tagged with the correct answer. The machine is then given a new set of examples, and the supervised learning algorithm uses what it learned from the training examples to produce the correct outcome for them.
For instance, suppose you are given a basket filled with different kinds of fruit. The first step is to train the machine on each fruit, one by one:

If the object is round, has a depression at the top, and is red, it is labeled Apple.
If the object is a long curving cylinder with a green-yellow color, it is labeled Banana.

Now suppose that, after training, the machine is given a new fruit from the basket, say a banana, and is asked to identify it.
Since the machine has already learned from the previous data, it applies that knowledge: it classifies the fruit by its shape and color, confirms the fruit name as BANANA, and puts it in the Banana category.
Thus the machine learns from training data (the basket containing fruits) and then applies the knowledge to test data (the new fruit).
Supervised learning algorithms fall into two categories:

Classification: A classification problem is when the output variable is a category, such as "red" or "blue", or "disease" or "no disease".
Regression: A regression problem is when the output variable is a real value, such as "dollars" or "weight".
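To make the regression case concrete, here is a minimal Python sketch that fits a straight line to labeled examples by ordinary least squares and then predicts a value for an unseen input. The names `fit_line` and `predict` and the toy numbers are illustrative assumptions; they are not real seismic data.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept to labeled examples by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned line to a new, unlabeled input."""
    slope, intercept = model
    return slope * x + intercept


# Training data: each input x is tagged with its correct output y (the label).
train_x = [1, 2, 3, 4]
train_y = [3, 5, 7, 9]

model = fit_line(train_x, train_y)
prediction = predict(model, 5)  # real-valued output, so this is regression
```

A classification model would follow the same train-then-predict pattern but return a category instead of a real number.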
Data Mining and Data Warehousing
241. The Apriori algorithm operates in ___ method
a. Bottom-up search method
b. Breadth-first search method
c. None of the above
d. Both a & b
•The Apriori algorithm is used for finding frequent itemsets in a dataset for boolean association rules.
•The algorithm is named Apriori because it uses prior knowledge of frequent itemset properties.
•It applies an iterative, level-wise search in which frequent k-itemsets are used to find (k+1)-itemsets.
•A level-wise search is bottom-up (smaller itemsets before larger ones) and breadth-first (all k-itemsets are examined before any (k+1)-itemset).
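The level-wise search can be sketched in Python as follows. This is a simplified illustration, not an optimized implementation: the function name `apriori`, the absolute `min_support` count, and the shopping-basket data are all assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 1
    while current:
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-itemset candidates.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step (prior knowledge): every (k-1)-subset must be frequent.
            if all(frozenset(sub) in current for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count support of the surviving candidates at this level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
    return frequent


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = apriori(baskets, min_support=2)
```

Each pass of the `while` loop handles one level k completely before moving to k+1, which is what makes the search breadth-first and bottom-up.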
Data Mining and Data Warehousing
242. Which of the following are the intermediate servers that stand
in between a relational back-end server and client front-end tools?
a. ROLAP
b. MOLAP
c. HOLAP
d. All the above
Comparison of ROLAP, MOLAP and HOLAP (each rating is relative to the other two):

Basis | ROLAP | MOLAP | HOLAP
Storage location for summary aggregation | Relational database | Multidimensional database | Multidimensional database
Processing time | Very slow | Fast | Fast
Storage space requirement | Large | Medium | Small
Storage location for detail data | Relational database | Multidimensional database | Relational database
Latency | Low | High | Medium
Query response time | Slow | Fast | Medium
Data Mining and Data Warehousing

243. Which is the FALSE Statement?


A. OLAP tools enable the user to access the data in Data Warehouse in an
interactive manner.
B. OLAP tools are data accessing and discovery tools.
C. OLAP systems are designed for Real-time business operations.
D. OLTP handles day to day business transactions.
Ans: C. Real-time business operations are handled by OLTP systems, not OLAP systems.

OLAP (Online analytical processing) | OLTP (Online transaction processing)
Consists of historical data from various databases. | Consists only of current operational data.
It is subject oriented. Used for data mining, analytics, decision making, etc. | It is application oriented. Used for business tasks.
The data is used in planning, problem solving and decision making. | The data is used to perform day-to-day fundamental operations.
It provides a multi-dimensional view of different business tasks. | It reveals a snapshot of present business tasks.
A large amount of data is stored, typically in TB or PB. | The size of the data is relatively small as the historical data is archived, for example MB or GB.
Relatively slow as the amount of data involved is large. Queries may take hours. | Very fast as the queries operate on 5% of the data.
It only needs backup from time to time as compared to OLTP. | The backup and recovery process is maintained religiously.
This data is generally managed by the CEO, MD, GM. | This data is managed by clerks and managers.
Only read and rarely write operations. | Both read and write operations.
Data Mining and Data Warehousing

244. Which is the TRUE Statement?

A. Updates on the Data Warehouse are allowed.
B. Data Warehouse is defined as subject-oriented, integrated, time-variant and volatile.
C. Data Warehouse contains only aggregated data and aggregated transactions.
D. Data Warehouse is a storehouse of historical data.

Ans: D
A is false: updates on the Data Warehouse are not allowed.
B is false: the Data Warehouse is non-volatile.
C is false: it also holds individual transactions.
Data Mining and Data Warehousing

245. Which of the following schemas supports normalization in dimensional modelling?
a. Star Schema
b. Snow-Flake schema
c. Fact-Constellation
d. None
Ans: b. Snow-Flake schema
1. Star schema dimension tables are not normalized; snowflake schema dimension tables are normalized.
2. Snowflake schemas use less space to store dimension tables but are more complex.
3. Star schemas only join the fact table with the dimension tables, leading to simpler, faster SQL queries.
4. Snowflake schemas have no redundant data, so they're easier to maintain.
5. Snowflake schemas are good for data warehouses; star schemas are better for datamarts with simple relationships.
Data Mining and Data Warehousing
246. Which of the following statements is/are correct about
Fact constellation?
A. Fact constellation schema can be seen as a combination of
many star schemas.
B. It is possible to create fact constellation schema, for each
star schema or snowflake schema.
C. Can be identified as a flexible schema for implementation.
D. All are correct
Data Mining and Data Warehousing

247. A star schema has what type of relationship from a dimension to the fact table? Select one:

A. Many-to-many
B. Many-to-one
C. One-to-one
D. One-to-many
Data Mining and Data Warehousing
248. In the OLAP model, the _ provides the
multidimensional view.
A. Data layer
B. Data link layer
C. Presentation layer
D. Application layer
Data Mining and Data Warehousing
249. The output of an OLAP query is displayed as a
A. Pivot
B. Matrix
C. Excel
D. both B and C
Data Mining and Data Warehousing

250.___________ is a good alternative to the star schema.

A. Star schema.
B. Snowflake schema.
C. Fact constellation.
D. Star-snowflake schema.
GATE NoteBook
Target JRF - UGC NET Computer Science Paper 2

1000 MEQs
50 Qs on DATABASES
Most Expected Questions Course
DATABASES
251. An entity is
(a) a collection of items in an application
(b) a distinct real world item in an application
(c) an inanimate object in an application
(d) a data structure
DATABASES
252. Pick entities from the following:
(i) vendor
(ii) student
(iii) attends
(iv) km/hour
(a) i, ii, iii (b) i, ii, iv
(c) i and ii (d) iii and iv
DATABASES
253. Pick the relationship from the following:
(a) a classroom
(b) teacher
(c) attends
(d) cost per dozen
DATABASES
254. Pick the meaningful relationship between entities
(a) vendor supplies goods
(b) vendor talks with customers
(c) vendor complains to vendor
(d) vendor asks prices
DATABASES
255. Attributes are
(i) properties of relationship
(ii) attributed to entities
(iii) properties of members of an entity set
(a) i
(b) i and ii
(c) i and iii
(d) iii
DATABASES
256. The attributes of the relationship teaches in teacher teaches course should be
(a) teacher code, teacher name, dept, phone no
(b) course no, course name, semester offered, credits
(c) teacher code, course no, semester no
(d) teacher code, course no, teacher name, dept, phone no
DATABASES
257. One entity may be
(a) related to only one other entity
(b) related to itself
(c) related to only two other entities
(d) related to many other entities
DATABASES
258. By relation cardinality we mean
(a) number of items in a relationship
(b) number of relationships in which an entity can appear
(c) number of items in an entity
(d) number of entity sets which may be related to a given entity
DATABASES
259. Normalization of database is essential to
(i) avoid accidental deletion of required data when some data is deleted
(ii) eliminate inconsistencies when a data item is modified in the database
(iii) allow storage of data in a computer's disk
(iv) use a database management system
(a) i and iii (b) i and ii
(c) ii and iii (d) ii and iv
DATABASES
260.Key to represent relationship between tables is called
A. primary key
B. secondary key
C. foreign key
D. none of the above
261.
DATABASES
DATABASES
DATABASES
262. A relation is said to be in 1NF if
(a) there is no duplication of data
(b) there are no composite attributes in the relation
(c) there are only a few composite attributes
(d) all attributes are of uniform type
DATABASES
263.Given an attribute x, another attribute y is
dependent on it, if for a given x
(a) there are many y values
(b) there is only one value of y
(c) there is one or more y values
(d) there is none or one y value
DATABASES
264. A relation is said to be in 2 NF if
(i) it is in 1 NF
(ii) non-key attributes are dependent on the key attribute
(iii) non-key attributes are independent of one another
(iv) if it has a composite key, no non-key attribute should be dependent on part of the composite key
(a) i, ii, iii (b) i and ii
(c) i, ii, iv (d) i, iv
DATABASES
265. Given the following relation
vendor order (vendor no, order no, vendor name, qty supplied, price/unit) it is not
in 2 NF because
(a) it is not in 1 NF
(b) it has a composite key
(c) non-key attribute vendor name is dependent on vendor no. which is one part
of the composite key
(d) Qty supplied and price/unit are dependent
DATABASES
266. Given the following relation
vendor order (vendor no, order no, vendor name, qty supplied , price/unit)
the second normal form relations are
(a) vendor (vendor no, vendor name)
qty (qty supplied, price/unit)
order (order no, qty supplied)
(b) vendor (vendor no, vendor name)
order (order no, qty supplied, price/unit)
(c) vendor (vendor no, vendor name)
order (order no, qty supplied, price/unit)
vendor order (vendor no, order no)
(d) vendor (vendor no, vendor name, qty supplied, price/unit)
vendor order (order no, vendor no)
DATABASES
267. A relation is said to be in 3 NF if
(i) it is in 2 NF
(ii) non-key attributes are independent of one another
(iii) key attribute is not dependent on part of a composite key
(iv) has no multi-valued dependency
(a) i and iii (b) i and iv
(c) i and ii (d) ii and iv
DATABASES
268. Which of the following is a Data Definition Language (DDL) command?
(A) Delete
(B) Insert
(C) Drop
(D) Merge

Ans: (C)

1. DDL (Data Definition Language): DDL consists of the SQL commands that can be used to define the database schema. It deals with descriptions of the database schema and is used to create and modify the structure of database objects in the database. Examples of DDL commands:
   1. CREATE: creates the database or its objects (such as tables, indexes, functions, views, stored procedures and triggers).
   2. DROP: deletes objects from the database.
   3. ALTER: alters the structure of the database.
   4. TRUNCATE: removes all records from a table, including all space allocated for the records.
   5. COMMENT: adds comments to the data dictionary.
   6. RENAME: renames an object existing in the database.

2. DQL (Data Query Language): DQL statements are used for performing queries on the data within schema objects. The purpose of a DQL command is to get a result relation based on the query passed to it. Example of DQL:
   1. SELECT: retrieves data from the database.

3. DML (Data Manipulation Language): The SQL commands that deal with the manipulation of data present in the database belong to DML, and this includes most of the SQL statements. Examples of DML:
   1. INSERT: inserts data into a table.
   2. UPDATE: updates existing data within a table.
   3. DELETE: deletes records from a database table.

4. DCL (Data Control Language): DCL includes commands such as GRANT and REVOKE, which mainly deal with the rights, permissions and other controls of the database system. Examples of DCL commands:
   1. GRANT: gives users access privileges to the database.
   2. REVOKE: withdraws users' access privileges given by using the GRANT command.

5. TCL (Transaction Control Language): TCL commands deal with transactions within the database. Examples of TCL commands:
   1. COMMIT: commits a transaction.
   2. ROLLBACK: rolls back a transaction in case any error occurs.
   3. SAVEPOINT: sets a savepoint within a transaction.
   4. SET TRANSACTION: specifies characteristics for the transaction.
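The command families above can be tried out with a short Python sketch using the standard `sqlite3` module. The `student` table and its rows are made up for illustration, and SQLite is only an assumption of convenience: it has no GRANT or REVOKE, so DCL is not shown.

```python
import sqlite3


def run_demo():
    # isolation_level=None puts the connection in autocommit mode, so the
    # BEGIN / COMMIT / ROLLBACK statements below are issued explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    cur = conn.cursor()

    # DDL: define the schema.
    cur.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")

    # DML inside a transaction, made permanent with TCL COMMIT.
    cur.execute("BEGIN")
    cur.execute("INSERT INTO student (id, name) VALUES (1, 'Asha')")
    cur.execute("COMMIT")

    # DML undone with TCL ROLLBACK.
    cur.execute("BEGIN")
    cur.execute("UPDATE student SET name = 'Changed' WHERE id = 1")
    cur.execute("ROLLBACK")

    # DQL: query the data.
    rows = cur.execute("SELECT id, name FROM student").fetchall()

    # DDL: remove the object itself, not just its rows.
    cur.execute("DROP TABLE student")
    remaining = cur.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name = 'student'"
    ).fetchall()

    conn.close()
    return rows, remaining


rows, remaining = run_demo()
```

After the run, `rows` still holds the committed name because the update was rolled back, and `remaining` is empty because DROP removed the table definition.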
DATABASES
269.In E-R Diagram, weak entity is represented by.......
(A) Rectangle
(B) Square
(C) Double Rectangle
(D) Circle
DATABASES
270. In SQL the statement select*from R,S is equivalent to
A. Select * from R natural join S
B. Select * from R cross join S
C. Select * from R union join S
D. Select * from R inner join S
DATABASES
271. Which of the following relational algebra operations do not require the
participating tables to be union-compatible?
(A)Union
(B) Intersection
(C) Difference
(D) Join
DATABASES
272. The operation which is not considered a basic operation of relational
algebra is
(A)Join. (B) Selection.
(C) Union. (D) Cross product.
Ans: (A)
DATABASES
273. The default level of consistency in
SQL is
(A) repeatable read
(B) read committed
(C) read uncommitted
(D) serializable
Ans: (D)
DATABASES
274. Which of the following aggregate functions does not ignore nulls in its results?
(A) COUNT (B) COUNT (*)
(C) MAX (D) MIN

Ans: (B)
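A quick way to see the difference is to run the four aggregates over a column that contains a NULL. The sketch below uses Python's standard `sqlite3` module; the `marks` table and its values are invented for the example.

```python
import sqlite3


def null_aggregates():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE marks (score INTEGER)")
    # Three rows, one of which has a NULL score.
    conn.executemany("INSERT INTO marks (score) VALUES (?)", [(10,), (None,), (30,)])
    row = conn.execute(
        "SELECT COUNT(*), COUNT(score), MAX(score), MIN(score) FROM marks"
    ).fetchone()
    conn.close()
    return row


count_star, count_col, max_score, min_score = null_aggregates()
```

COUNT(*) counts all three rows, while COUNT(score), MAX and MIN skip the NULL.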
DATABASES
275. Use of UNIQUE while defining an attribute of a table in SQL means that the attribute values are
(A) distinct values
(B) cannot have NULL
(C) both (A) & (B)
(D) same as primary key

Ans: (A)
UNIQUE requires the values to be distinct but, unlike PRIMARY KEY, it does not forbid NULL values.
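How UNIQUE treats NULL can be checked directly. The following sketch uses Python's standard `sqlite3` module with a made-up `person` table; it shows SQLite's behavior only, and other database systems may differ in how many NULLs they accept in a UNIQUE column.

```python
import sqlite3


def unique_with_nulls():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (email TEXT UNIQUE)")

    # NULLs are accepted in a UNIQUE column (SQLite allows several of them).
    conn.execute("INSERT INTO person (email) VALUES (NULL)")
    conn.execute("INSERT INTO person (email) VALUES (NULL)")

    # A repeated non-NULL value violates the constraint.
    conn.execute("INSERT INTO person (email) VALUES ('a@example.com')")
    try:
        conn.execute("INSERT INTO person (email) VALUES ('a@example.com')")
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True

    null_rows = conn.execute(
        "SELECT COUNT(*) FROM person WHERE email IS NULL"
    ).fetchone()[0]
    conn.close()
    return null_rows, duplicate_rejected


null_rows, duplicate_rejected = unique_with_nulls()
```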
DATABASES
276. Cascading rollback is avoided in all protocols except
(A) strict two-phase locking protocol
(B) tree locking protocol
(C) two-phase locking protocol
(D) validation based protocol
DATABASES
277. If α→β holds then so does
(A) γα→γβ
(B) α→→γβ
(C) both (A) and (B)
(D) None of the above

Ans: (A)
DATABASES
278. In tuple relational calculus P1 AND P2 is equivalent to
(A) (¬P1 OR ¬P2). (B) ¬(P1 OR ¬P2).
(C) ¬(¬P1 OR P2). (D) ¬(¬P1 OR ¬P2).
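By De Morgan's law, P1 AND P2 is the same as ¬(¬P1 OR ¬P2). A brute-force truth-table check in Python confirms which of the four options agrees with AND on every input; the dictionary `OPTIONS` is just a transcription of the choices above.

```python
from itertools import product

# Each option written as a function of the truth values of P1 and P2.
OPTIONS = {
    "A": lambda p1, p2: (not p1) or (not p2),
    "B": lambda p1, p2: not (p1 or (not p2)),
    "C": lambda p1, p2: not ((not p1) or p2),
    "D": lambda p1, p2: not ((not p1) or (not p2)),
}


def equivalent_to_and():
    """Return the labels whose truth table matches P1 AND P2 on all four rows."""
    rows = list(product([False, True], repeat=2))
    return [
        label
        for label, formula in OPTIONS.items()
        if all(formula(p1, p2) == (p1 and p2) for p1, p2 in rows)
    ]


matches = equivalent_to_and()
```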
DATABASES
279.For correct behaviour during recovery, undo and redo operation must be
(A)Commutative
(B) Associative
(C) idempotent
(D) distributive
Ans: (C)
DATABASES
280. The drawbacks of the shadow paging technique are
(A) Commit overhead (B) Data fragmentation
(C) Garbage collection (D) All of these
Ans: (D)
The idea is to maintain two page tables during the life of a transaction: the current page table and the shadow page table.
When the transaction starts, both tables are identical. The shadow page table is never changed during the life of the transaction.
The current page table is updated with each write operation. Each table entry points to a page on the disk. When the transaction is
committed, the shadow page table entry becomes a copy of the current page table entry and the disk block with the old data is
released. If the shadow page table is stored in nonvolatile memory and a system crash occurs, then the shadow page table is copied to
the current page table. This guarantees that the shadow page table will point to the database pages corresponding to the state
of the database prior to any transaction that was active at the time of the crash, making aborts automatic.
There are drawbacks to the shadow-page technique:

•Commit overhead. The commit of a single transaction using shadow paging requires multiple blocks to be output: the
current page table, the actual data and the disk address of the current page table. Log-based schemes need to output only the
log records.
•Data fragmentation. Shadow paging causes database pages to change locations (therefore, they are no longer contiguous).
•Garbage collection. Each time that a transaction commits, the database pages containing the old version of data changed by
the transaction become inaccessible. Such pages are considered to be garbage since they are not part of the free space
and do not contain any usable information. Periodically it is necessary to find all of the garbage pages and add them to the list
of free pages. This process is called garbage collection and imposes additional overhead and complexity on the system.
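The two-table mechanism can be modeled in a few lines of Python. This is a toy in-memory sketch, not how a real DBMS lays out disk blocks: the class `ShadowPagedStore`, its method names, and the dictionary standing in for the disk are all assumptions made for illustration.

```python
class ShadowPagedStore:
    """Toy model: a dict plays the disk, two dicts play the page tables."""

    def __init__(self, pages):
        self.disk = dict(enumerate(pages))                  # block id -> contents
        self.shadow_table = {i: i for i in range(len(pages))}  # page -> block
        self.current_table = dict(self.shadow_table)        # identical at start
        self.next_block = len(pages)
        self.garbage = []                                   # blocks awaiting collection

    def read(self, page):
        return self.disk[self.current_table[page]]

    def write(self, page, value):
        # First write to a page in this transaction: redirect the current
        # table to a fresh block so the shadow copy stays untouched.
        if self.current_table[page] == self.shadow_table[page]:
            self.current_table[page] = self.next_block
            self.next_block += 1
        self.disk[self.current_table[page]] = value

    def commit(self):
        # Old versions of changed pages become garbage; shadow catches up.
        for page, old_block in self.shadow_table.items():
            if self.current_table[page] != old_block:
                self.garbage.append(old_block)
        self.shadow_table = dict(self.current_table)

    def abort(self):
        # Also models crash recovery: fall back to the shadow page table.
        for page, new_block in self.current_table.items():
            if self.shadow_table[page] != new_block:
                self.garbage.append(new_block)
        self.current_table = dict(self.shadow_table)


store = ShadowPagedStore(["a", "b"])
store.write(0, "x")
seen_during_txn = store.read(0)
store.abort()
seen_after_abort = store.read(0)
store.write(1, "y")
store.commit()
seen_after_commit = store.read(1)
```

Note how pages move to new block numbers on every update (fragmentation) and how the `garbage` list grows on both commit and abort, which mirrors the drawbacks listed above.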
DATABASES
281. In SQL, testing whether a subquery is empty is done using
(A) DISTINCT (B) UNIQUE
(C) NULL (D) EXISTS
Ans: (D)
DATABASES
282. The FD A → B , DB → C implies
(A) DA → C (B) A → C
(C) B → A (D) DB → A
Ans: (A)
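An implication between functional dependencies can be checked mechanically with attribute closure: X → Y follows from a set of FDs exactly when Y is contained in the closure of X. The Python sketch below applies this to the four options; the helper names `closure` and `implies` are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD when its left side is covered and it adds something new.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs follows from fds."""
    return set(rhs) <= closure(lhs, fds)


FDS = [("A", "B"), ("DB", "C")]
OPTIONS = {"A": ("DA", "C"), "B": ("A", "C"), "C": ("B", "A"), "D": ("DB", "A")}

implied = [label for label, (lhs, rhs) in OPTIONS.items() if implies(FDS, lhs, rhs)]
```

Here the closure of DA picks up B through A → B and then C through DB → C, so only DA → C is implied.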
DATABASES
283. Manager salary details are hidden from the employee. This is
(A) Conceptual level data hiding.
(B) External level data hiding.
(C) Physical level data hiding.
(D) None of these.

Ans: (A)
DATABASES
284. If minimum cardinality = 0, then it signifies:
A. Partial participation
B. Total participation
C. Weak entity
D. Strong entity

ANSWER : A

Minimum cardinality tells whether the participation is partial or total.
•If minimum cardinality = 0, then it signifies partial participation.
•If minimum cardinality = 1, then it signifies total participation.
Maximum cardinality tells the maximum number of entities that participate in a relationship set.
DATABASES
285. Which statement is WRONG?
1. A double rectangle is used for representing a weak entity set.
2. A double diamond symbol is used for representing the relationship that exists between the strong and weak entity sets; this relationship is known as the identifying relationship.
3. A double line is used for representing the connection of the weak entity set with the relationship set.
4. Total participation does not always exist in the identifying relationship.
286.
DATABASES
DATABASES
287. Which is TRUE ?
1. A query can be formulated in relational calculus if and only if it can be
formulated in relational calculus.
2. Relational algebra has the same power as relational calculus.
3. It is not possible to write syntactically correct relational calculus queries
that have infinite number of answers.
4. Queries that have a finite number of answers are safe relational
algebra queries.

Corrected statements:
1. A query can be formulated in relational calculus if and only if it can be formulated in relational algebra.
2. Relational algebra has the same power as relational calculus.
3. It is possible to write syntactically correct relational calculus queries that have an infinite number of answers. Such queries are unsafe.
4. Queries that have a finite number of answers are safe relational calculus queries.
DATABASES
288. Which is the correct option?
1. SELECT DISTINCT in SQL = Projection (Π) in Relational Algebra
2. FROM in SQL = Cartesian Product (×) in Relational Algebra
3. WHERE in SQL = Selection (σ) in Relational Algebra
4. All are correct
DATABASES
289. Which will generate Error ?
SELECT first_name, last_name, COUNT(*) FROM student GROUP BY first_name;

SELECT first_name, last_name, COUNT(*) FROM student GROUP BY first_name, last_name;

SELECT first_name, last_name, COUNT(*) FROM student GROUP BY last_name, first_name ;

ALL Correct
ERROR : not a GROUP BY expression

The first query raises this error: it selects last_name, but last_name is not in its GROUP BY clause.
The error happens when a query uses an aggregate function and at least one column in the SELECT clause is not in the GROUP BY clause.
Because an aggregate function (COUNT) is used, every other column in the SELECT clause must be listed in the GROUP BY clause.
To resolve the error, make sure the columns match.
DATABASES
290. Consider the following relational schema:
Suppliers (sid:integer, sname:string, sadress:string)
Parts (pid:integer, pname:string, pcolor:string)
Catalog (sid:integer, pid:integer, pcost:real)

What is the result of the following query?

(SELECT Catalog.pid from Suppliers, Catalog
WHERE Suppliers.sid = Catalog.pid)
MINUS
(SELECT Catalog.pid from Suppliers, Catalog
WHERE Suppliers.sname <> 'sachin' and Suppliers.sid = Catalog.sid)

1. Pid of parts supplied by all except Sachin
2. Pid of parts supplied only by Sachin
3. Pid of parts available in catalog supplied by Sachin
4. Pid of parts available in catalog supplied by all except Sachin
DATABASES
291. Consider the following relation
Cinema (theater, address, capacity)
Which of the following options will be needed at the end of the SQL query
SELECT P1.address FROM Cinema P1
such that it always finds the addresses of theaters with maximum capacity?
(A) WHERE P1.capacity >= ALL (SELECT P2.capacity FROM Cinema P2)
(B) WHERE P1.capacity >= ANY (SELECT P2.capacity FROM Cinema P2)
(C) WHERE P1.capacity > ALL (SELECT max(P2.capacity) FROM Cinema P2)
(D) WHERE P1.capacity > ANY (SELECT max(P2.capacity) FROM Cinema P2)

Ans: (A)

B - Returns the addresses of all theaters.
C - Returns an empty set. max() returns a single value, and no capacity is greater than the maximum.
D - Returns an empty set, for the same reason as C. ALL and ANY work the same here because max() returns a single value.
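The behavior of the four WHERE clauses can be imitated in plain Python, where `all()` and `any()` play the roles of ALL and ANY. The sample `CINEMAS` rows are invented for illustration, and the sketch assumes no NULL capacities.

```python
# (theater, address, capacity): hypothetical sample rows.
CINEMAS = [
    ("T1", "addr1", 100),
    ("T2", "addr2", 250),
    ("T3", "addr3", 250),
]

capacities = [capacity for _, _, capacity in CINEMAS]
max_only = [max(capacities)]  # what the subquery with max() returns: one value

# Each option as a predicate over P1.capacity.
OPTIONS = {
    "A": lambda cap: all(cap >= other for other in capacities),
    "B": lambda cap: any(cap >= other for other in capacities),
    "C": lambda cap: all(cap > m for m in max_only),
    "D": lambda cap: any(cap > m for m in max_only),
}


def addresses_for(label):
    """Addresses selected when the given option is used as the WHERE clause."""
    keep = OPTIONS[label]
    return [address for _, address, capacity in CINEMAS if keep(capacity)]


results = {label: addresses_for(label) for label in OPTIONS}
```

With two theaters tied at the maximum, option A returns both of them, which is why >= is needed rather than >.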
292. Consider a relation- R ( A , B , C , D , E ) with functional dependencies-
A → BC
DATABASES
CD → E
B→D
E→A

Find the number of possible keys and normal form ?

A. 2 , 2NF
B. 3, 3NF
C. 4 , 3NF
D. 4, 2NF
Ans: C

The possible keys for this relation are-

A , E , CD , BC

(Each of these has closure { A , B , C , D , E }, and no proper subset of CD or BC does.)

From here,
•Prime attributes = { A , B , C , D , E }
•There are no non-prime attributes

Now,
•Every attribute of the relation is a prime attribute.
•Thus, the RHS of each functional dependency consists only of prime attributes, which satisfies the 3NF condition.
•B → D violates BCNF because B is not a super key.

Thus, we conclude that the relation has 4 candidate keys and is in 3NF (but not BCNF).
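
The key list can be cross-checked by brute force. This is a minimal sketch, assuming dependencies are written as (lhs, rhs) string pairs; the helper names closure and candidate_keys are illustrative and not part of the original notes.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == schema:
                keys.append(candidate)
    return keys


schema = set("ABCDE")
fds = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]

keys = candidate_keys(schema, fds)
print(sorted("".join(sorted(k)) for k in keys))  # ['A', 'BC', 'CD', 'E']
```

Because subsets are tried in order of increasing size, any set that contains an already found key is skipped, which keeps only minimal keys.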


293. Let R = (A, B, C, D, E) be a relation scheme with the following dependencies-
AB → C
C → D
B → E
Determine the total number of candidate keys and super keys.

A. 1, 3
B. 1, 2
C. 1, 8
D. 1, 9

Ans: C

Only one candidate key, AB, is possible: A and B never appear on the RHS of any dependency, and (AB)+ = { A , B , C , D , E }.

Of the 5 attributes in the relation-

• There are 2 essential attributes, A and B.
• The remaining 3 attributes (C, D, E) are non-essential.
• Essential attributes are present in every super key.
• Each non-essential attribute may or may not be included in a super key.

AB _ _ _

So, total number of super keys possible = 2 x 2 x 2 = 8.
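
The count can be confirmed by enumerating every non-empty attribute subset and keeping those whose closure is the whole relation. A small sketch, with closure as an illustrative helper name and dependencies written as (lhs, rhs) strings:

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


schema = set("ABCDE")
fds = [("AB", "C"), ("C", "D"), ("B", "E")]

# A super key is any attribute set whose closure covers the schema.
super_keys = [
    set(combo)
    for size in range(1, len(schema) + 1)
    for combo in combinations(sorted(schema), size)
    if closure(combo, fds) == schema
]

# A candidate key is a super key with no smaller super key inside it.
candidate = [k for k in super_keys if not any(other < k for other in super_keys)]

print(len(candidate), len(super_keys))  # 1 8
```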
294. A relation R (A , C , D , E , H) has two sets of functional dependencies, F and G, as shown-

Set F-
A → C
AC → D
E → AD
E → H

Set G-
A → CD
E → AH

Which of the following holds true?

(A) G ⊇ F
(B) F ⊇ G
(C) F = G
(D) All of the above
Determining whether F covers G-

Step-1:
•(A)+ = { A , C , D } // closure of left side of A → CD using set G
•(E)+ = { A , C , D , E , H } // closure of left side of E → AH using set G

Step-2:
•(A)+ = { A , C , D } // closure of left side of A → CD using set F
•(E)+ = { A , C , D , E , H } // closure of left side of E → AH using set F

Step-3:
Comparing the results of Step-1 and Step-2, we find-
•Functional dependencies of set F can determine all the attributes which have been determined by the functional dependencies of set G.
•Thus, we conclude F covers G i.e. F ⊇ G.
Determining whether G covers F-

Step-1:
•(A)+ = { A , C , D } // closure of left side of A → C using set F
•(AC)+ = { A , C , D } // closure of left side of AC → D using set F
•(E)+ = { A , C , D , E , H } // closure of left side of E → AD and E → H using set F

Step-2:
•(A)+ = { A , C , D } // closure of left side of A → C using set G
•(AC)+ = { A , C , D } // closure of left side of AC → D using set G
•(E)+ = { A , C , D , E , H } // closure of left side of E → AD and E → H using set G

Step-3:
Comparing the results of Step-1 and Step-2, we find-
•Functional dependencies of set G can determine all the attributes which have been determined by the functional dependencies of set F.
•Thus, we conclude G covers F i.e. G ⊇ F.

Since F covers G and G covers F, the two sets are equivalent, i.e. F = G. Options (A), (B) and (C) all hold, so the answer is (D).
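
The two cover checks can be mechanised with attribute closures: one set covers another when the RHS of every dependency in the second lies inside the closure of its LHS under the first. A minimal sketch, with closure and covers as illustrative helper names:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, g):
    """True if every dependency in g can be derived from f."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)


F = [("A", "C"), ("AC", "D"), ("E", "AD"), ("E", "H")]
G = [("A", "CD"), ("E", "AH")]

f_covers_g = covers(F, G)
g_covers_f = covers(G, F)
print(f_covers_g, g_covers_f)  # True True, so F and G are equivalent
```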
295. For a database relation R(a,b,c,d) where the domains of a, b, c and d include only atomic values, only the following functional dependencies and those that can be inferred from them hold
a -> c
b -> d
The relation is in
A. 1NF
B. 2NF
C. 3NF
D. BCNF

Ans: A (1NF but not 2NF)

The candidate key is ab.
Since a, b, c and d are all atomic, the relation is in 1NF.
a → c (part of the key determines a non-prime attribute.)
b → d (part of the key determines a non-prime attribute.)
Since these are partial dependencies, the relation is not in 2NF.
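
A partial dependency exists when a proper subset of a candidate key determines a non-prime attribute. The sketch below looks for such subsets of the key ab; the closure helper and the variable names are assumptions made for illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


schema = set("abcd")
fds = [("a", "c"), ("b", "d")]
key = set("ab")            # the only candidate key
non_prime = schema - key   # {c, d}

# Collect (part of key, non-prime attributes it determines) pairs.
partial = []
for size in range(1, len(key)):
    for part in combinations(sorted(key), size):
        determined = closure(part, fds) & non_prime
        if determined:
            partial.append(("".join(part), "".join(sorted(determined))))

print(partial)  # [('a', 'c'), ('b', 'd')]
```

A non-empty result means 2NF is violated, so the relation stays at 1NF.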
296. Which is TRUE?
1. Canonical cover is free from all the extraneous functional dependencies.
2. The closure of canonical cover is a subset of that of the given set of functional dependencies.
3. Canonical cover is unique.
4. All

Ans: 1

•Canonical cover is free from all the extraneous functional dependencies, so statement 1 is true.
•The closure of canonical cover is the same as that of the given set of functional dependencies, not merely a subset of it, so statement 2 is not accurate as worded.
•Canonical cover is not unique; there may be more than one for a given set of functional dependencies, so statement 3 is false.
297. Which is false?

1. If A → BC, then A → B and A → C always hold.
2. If A → B and C → D, then AC → BD always holds.
3. If A → B and A → C, then A → BC always holds.
4. NONE
Ans: 4 (NONE). All three statements are valid inference rules-

Decomposition-
If A → BC, then A → B and A → C always hold.

Composition-
If A → B and C → D, then AC → BD always holds.

Additive (Union)-
If A → B and A → C, then A → BC always holds.
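
Each rule can be sanity-checked with attribute closure: X → Y follows from a set of dependencies exactly when Y lies inside the closure of X. A short sketch, where closure and follows are illustrative helper names:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def follows(fds, lhs, rhs):
    """True if lhs -> rhs can be derived from fds."""
    return set(rhs) <= closure(lhs, fds)


# Decomposition: A -> BC gives A -> B and A -> C
decomposition = follows([("A", "BC")], "A", "B") and follows([("A", "BC")], "A", "C")

# Composition: A -> B and C -> D give AC -> BD
composition = follows([("A", "B"), ("C", "D")], "AC", "BD")

# Additive (union): A -> B and A -> C give A -> BC
additive = follows([("A", "B"), ("A", "C")], "A", "BC")

print(decomposition, composition, additive)  # True True True
```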
298. A B/B+ tree has order 5. Find the minimum number of children of a node.
a) 1
b) 2
c) 3
d) none
Ans: c) 3

•A B/B+ tree with order p has maximum p pointers and hence maximum p children.
•A B/B+ tree with order p has minimum ceil(p/2) pointers and hence minimum ceil(p/2) children.
•A B/B+ tree with order p has maximum (p – 1) and minimum ceil(p/2) – 1 keys.

For p = 5, the minimum number of children is ceil(5/2) = 3. These minimums apply to internal nodes other than the root, which may have as few as 2 children.
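
The bounds above reduce to a few lines of arithmetic. This sketch assumes the convention used here (order p = maximum number of children) and a hypothetical helper name btree_bounds; it describes non-root internal nodes only.

```python
import math


def btree_bounds(order):
    """Child and key bounds for a non-root internal node of order p."""
    min_children = math.ceil(order / 2)
    return {
        "max_children": order,
        "min_children": min_children,
        "max_keys": order - 1,
        "min_keys": min_children - 1,
    }


bounds = btree_bounds(5)
print(bounds)
# {'max_children': 5, 'min_children': 3, 'max_keys': 4, 'min_keys': 2}
```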
299. The maximum number of super keys for the relation
schema R(A,B,C,D) with AB as the key is
(A) 5
(B) 6
(C) 7
(D) 4

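Ans: (D)
Explanation:
• With a key of k attributes in a table of n attributes, every super key is the key plus any subset of the remaining attributes, so the maximum number of super keys = 2^(n-k).
• Here, n = 4 and k = 2 (key AB).
• So, the number of possible super keys = 2^(4-2) = 4.
• The possible super keys are: AB, ABC, ABD, ABCD.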
300. Given a database with multiple tables, which of the following constraints can be used in a way to ensure, or will by definition not allow, NULL values to be inserted?
I. UNIQUE
II. NOT NULL
III. FOREIGN KEY
IV. PRIMARY KEY
V. CHECK

A. I, II and IV
B. I, II, IV and V
C. II, IV and V
D. I, II, III, IV and V



Ans: C
Solution: (C)
UNIQUE allows NULL values.
NOT NULL does not allow NULL values.
PRIMARY KEY does not allow NULL values.
CHECK can be written to reject NULL values (for example, CHECK (col IS NOT NULL)).
FOREIGN KEY allows NULL values.
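
Four of these behaviours can be observed with Python's built-in sqlite3 module. This is a sketch under assumptions: the table and column names are made up, and PRIMARY KEY is left out on purpose because SQLite, for legacy reasons, does not always enforce NOT NULL on primary key columns, so it is not a faithful model of the standard there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign key enforcement
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE demo (
        u INTEGER UNIQUE,
        n INTEGER NOT NULL,
        c INTEGER CHECK (c IS NOT NULL),
        f INTEGER REFERENCES parent(id)
    )
""")


def accepts_null(column):
    """Insert a row with NULL in the given column and valid values elsewhere."""
    values = {"u": None, "n": 1, "c": 1, "f": None}
    values[column] = None
    try:
        conn.execute(
            "INSERT INTO demo (u, n, c, f) VALUES (:u, :n, :c, :f)", values
        )
        return True
    except sqlite3.IntegrityError:
        return False


results = {col: accepts_null(col) for col in ("u", "n", "c", "f")}
print(results)  # {'u': True, 'n': False, 'c': False, 'f': True}
```

The UNIQUE and FOREIGN KEY columns accept NULL, while NOT NULL and the explicit CHECK reject it.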