
MC0077- Advanced Database System

Q1 List and explain the various Normal Forms. How does BCNF differ from the Third and Fourth Normal Forms?
Ans 1 The various normal forms are the first normal form (1NF), second normal form (2NF), third normal form (3NF), Boyce-Codd normal form (BCNF) and fourth normal form (4NF). Relations are classified by the types of modification anomalies to which they are vulnerable. A database that is only in the first normal form is vulnerable to all types of anomalies, while a database that is in the domain/key normal form has no modification anomalies. Normal forms are hierarchical in nature: the lowest level is the first normal form, and a database cannot meet the requirements of a higher normal form without first meeting all the requirements of the lower ones.

First Normal Form:- Any table that qualifies as a relation is said to be in the first normal form. To be considered relational, the cells of the table must contain only single values, all values in a column must be of the same kind, each column must have a unique name, and each row must be unique. Databases that are only in the first normal form are the weakest and suffer from all modification anomalies.

Second Normal Form:- A relation is in the second normal form if it is in the first normal form and all of its non-key attributes depend on the entire key. This normal form removes partial dependencies, so it only matters for relations with composite keys.

Third Normal Form:- A relation is in the third normal form if it meets the criteria for the second normal form and has no transitive dependencies, that is, no non-key attribute depends on the key through another non-key attribute.

Boyce-Codd Normal Form:- A relation that meets the third normal form criteria, and in which every determinant is a candidate key, is said to be in the Boyce-Codd Normal Form. BCNF differs from 3NF in that 3NF still tolerates a dependency whose determinant is not a candidate key, provided the dependent attribute is part of some candidate key; BCNF does not. It therefore removes the remaining anomalies caused by functional dependencies.

Fourth Normal Form:- This is an extension of BCNF that covers multi-valued dependencies as well as functional dependencies. A schema is in 4NF if the left-hand side of every non-trivial functional or multi-valued dependency is a super-key. BCNF differs from 4NF in that BCNF constrains only functional dependencies, whereas 4NF also constrains multi-valued dependencies.
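The BCNF condition (every determinant must be a candidate key, or more generally a super-key) can be checked mechanically by computing attribute closures. The following is a minimal sketch, not a complete normalization tool. The function names `closure` and `bcnf_violations` and the sample relation Enrol(student, course, instructor) are illustrative assumptions, not part of the course material.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under the functional dependencies `fds`.

    `fds` is a list of (lhs, rhs) pairs, each a frozenset of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know every attribute of lhs, we also know rhs.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """List the non-trivial dependencies whose determinant is not a super-key."""
    violations = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial dependency, never a violation
        if closure(lhs, fds) != set(relation):
            violations.append((lhs, rhs))
    return violations


# Hypothetical relation Enrol(student, course, instructor):
#   {student, course} -> instructor   (determinant is a candidate key)
#   instructor -> course              (determinant is NOT a candidate key)
relation = {"student", "course", "instructor"}
fds = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),
    (frozenset({"instructor"}), frozenset({"course"})),
]
violations = bcnf_violations(relation, fds)
```

Under these assumed dependencies the relation is in 3NF, because `course` belongs to a candidate key, but `violations` contains the dependency instructor -> course, so the relation is not in BCNF.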

Q2 What are differences in Centralized and Distributed Database Systems? List the relative advantages of data distribution
Ans 2 Differences in Centralized and distributed database systems:-

1. Organizational and Economic Reasons:- Organizational and economic motivations are among the main reasons for the development of distributed databases. In organizations that already have several databases and need global applications, a distributed database is the natural choice.

2. Incremental Growth:- In a distributed environment, expanding the system by adding more data, increasing database size, or adding more processors is much easier.

3. Reduced Communication Overhead:- Many applications are local, and these applications do not have any communication overhead. Therefore, maximizing the locality of applications is one of the primary objectives in distributed database design.

4. Performance Considerations:- Data localization reduces the contention for CPU and I/O services and simultaneously reduces the access delays involved in wide area networks. Local queries and transactions accessing data at a single site perform better because the local databases are smaller. Moreover, inter-query and intra-query parallelism can be achieved by executing multiple queries at different sites, or by breaking up a query into a number of sub-queries that execute in parallel. This contributes to improved performance.

5. Reliability and Availability:- Reliability is defined as the probability that a system is running at a certain time point. Availability is the probability that the system is continuously available during a time interval. Distributing data over several sites improves both, because a failure at one site does not stop the other sites from operating. Further improvement is achieved by judiciously replicating data and software at more than one site.
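The effect of replication on availability can be illustrated with a small calculation. This is a sketch under a strong simplifying assumption that is not stated in the text above: sites fail independently, and each site is up with the same probability. Under that assumption, a data item replicated at n sites is reachable unless all n sites are down at once. The function name `replicated_availability` is hypothetical.

```python
def replicated_availability(site_availability: float, replicas: int) -> float:
    """Probability that at least one replica is reachable.

    Assumes independent site failures and equal per-site availability,
    which real deployments only approximate.
    """
    if not 0.0 <= site_availability <= 1.0:
        raise ValueError("site_availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # All replicas are down with probability (1 - p) ** n.
    return 1.0 - (1.0 - site_availability) ** replicas
```

For example, with an assumed per-site availability of 0.9, one copy gives 0.9 while three copies give about 0.999. The gain shrinks with each extra copy, which is one reason replication has to be applied judiciously.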

Q3 Describe the concepts of Structural Semantic Data Model (SSM).


Ans 3 The concepts of the Structural Semantic Data Model (SSM) are:-

1. Three types of entity specifications: base (root), subclass, and weak.

2. Four types of inter-entity relationships: n-ary associative and three types of classification hierarchies.

3. Four attribute types: atomic, multi-valued, composite, and derived.

4. Domain type specifications in the graphic model, including standard data types, binary large objects, user-defined types (UDT) and functions (UDF).

5. Cardinality specifications for entity to relationship-type connections and for multi-valued attribute types.

6. Data value constraints.

Q4 Describe the following with respect to Object Oriented Databases: A. Query Processing in Object-Oriented Database Systems B. Query Processing Architecture
Ans 4 (A) Query Processing in Object-Oriented Database Systems:-

One of the criticisms of first generation object-oriented database management systems was their lack of declarative query capabilities. This led some researchers to brand first generation DBMSs as object-oriented. It was commonly believed that the application domains that OODBMS technology targets do not need querying capabilities. This belief no longer holds, and declarative query capability is now accepted as one of the fundamental features of an OODBMS. Indeed, most of the current prototype systems experiment with powerful query languages and investigate their optimization. Commercial products, e.g. O2 and ObjectStore, have started to include such languages as well.

Ans 4 (B) Query Processing Architecture:-

1. Queries are expressed in a declarative language.

2. It requires no user knowledge of object implementations, access paths or processing strategies.

3. The calculus expression is first ...

4. Calculus Optimization

5. Calculus Algebra Transformation

6. Type check

7. Algebra Optimization

8. Execution Plan Generation

9. Execution

Q5 Describe the Differences between Distributed & Centralized Databases.


Ans 5 1. Organizational and Economic Reasons:- Organizational and economic motivations are among the main reasons for the development of distributed databases. In organizations that already have several databases and need global applications, a distributed database is the natural choice.

2. Incremental Growth:- In a distributed environment, expanding the system by adding more data, increasing database size, or adding more processors is much easier.

3. Reduced Communication Overhead:- Many applications are local, and these applications do not have any communication overhead. Therefore, maximizing the locality of applications is one of the primary objectives in distributed database design.

4. Performance Considerations:- Data localization reduces the contention for CPU and I/O services and simultaneously reduces the access delays involved in wide area networks. Local queries and transactions accessing data at a single site perform better because the local databases are smaller. Moreover, inter-query and intra-query parallelism can be achieved by executing multiple queries at different sites, or by breaking up a query into a number of sub-queries that execute in parallel. This contributes to improved performance.

5. Reliability and Availability:- Reliability is defined as the probability that a system is running at a certain time point. Availability is the probability that the system is continuously available during a time interval. Distributing data over several sites improves both, because a failure at one site does not stop the other sites from operating. Further improvement is achieved by judiciously replicating data and software at more than one site.
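Intra-query parallelism, where one query is broken into sub-queries that run in parallel at different sites, can be sketched as follows. This is only an illustration: the "sites" are in-memory lists standing in for horizontal fragments of one table, the names `SITE_FRAGMENTS`, `local_sum` and `global_sum` are hypothetical, and a real distributed DBMS would send the sub-queries over a network instead of using threads.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical horizontal fragments of one "orders" table, one per site.
# Each list holds the order amounts stored locally at that site.
SITE_FRAGMENTS = {
    "site_a": [120, 80, 45],
    "site_b": [300, 15],
    "site_c": [60, 60, 60, 10],
}


def local_sum(site: str) -> int:
    """Sub-query run at one site: total of the locally stored amounts."""
    return sum(SITE_FRAGMENTS[site])


def global_sum() -> int:
    """Run one sub-query per site in parallel and combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(SITE_FRAGMENTS)) as pool:
        partials = list(pool.map(local_sum, SITE_FRAGMENTS))
    return sum(partials)
```

Each site scans only its own, smaller fragment, and only the partial totals need to be combined, which is the source of the performance benefit described under Performance Considerations.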

Q6 Describe the following: (A) Data Mining Functions (B) Data Mining Techniques
Ans 6 (A) Data Mining Functions:- Data mining methods may be classified by the function they perform or according to the class of application they can be used in. In the case of supervised learning, this requires the user to define one or more classes.

1. Classification:- The database contains one or more attributes that denote the class of a tuple. These are known as predicted attributes, whereas the remaining attributes are called predicting attributes. The categories of rules are:
a) Exact rule:- Permits no exceptions, so each object of the LHS must be an element of the RHS.
b) Strong rule:- Allows some exceptions, but the exceptions have a given limit.
c) Probabilistic rule:- Relates the conditional probability P(RHS|LHS) to the probability P(RHS).

2. Associations:- Given a collection of items and a set of records, each of which contains some number of items from the given collection, an association function is an operation against this set of records which returns the affinities or patterns that exist among the collection of items.

3. Sequential/Temporal patterns:- This function analyses a collection of records over a period of time, for example to identify trends. Where the identity of the customer who made a purchase is known, an analysis can be made of the collection of related records of the same structure.

4. Clustering/Segmentation:- This is the process of creating a partition so that all the members of each set of the partition are similar according to some metric.

5. IBM Market Basket Analysis example:- IBM used segmentation techniques in its Market Basket Analysis on POS transactions, separating a set of untagged input records into reasonable groups according to product revenue by market basket.

Ans 6 (B) Data Mining Techniques:-

a) Cluster Analysis:- In an unsupervised learning environment the system has to discover its own classes, and one way in which it does this is to cluster the data in the database.
b) Induction:- Induction produces a higher level of information or knowledge, in that it is a general statement about objects in the database. The database is searched for patterns or regularities.
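The exact, strong and probabilistic rule categories listed under Classification can be made concrete by counting how many records break a candidate rule LHS -> RHS. The following is a minimal sketch over market-basket style records. The function name `classify_rule`, the `max_exceptions` threshold and the sample baskets are assumptions made for illustration; the source does not say how the exception limit is chosen.

```python
def classify_rule(records, lhs, rhs, max_exceptions=1):
    """Classify the rule lhs -> rhs over `records` (an iterable of item sets).

    Returns (kind, confidence), where confidence is the fraction of records
    containing lhs that also contain rhs.
    """
    matching = [r for r in records if lhs <= r]
    if not matching:
        raise ValueError("no record satisfies the left-hand side")
    # An exception is a record that contains lhs but not all of rhs.
    exceptions = sum(1 for r in matching if not rhs <= r)
    confidence = 1 - exceptions / len(matching)
    if exceptions == 0:
        kind = "exact"          # no exceptions permitted or found
    elif exceptions <= max_exceptions:
        kind = "strong"         # exceptions stay within the given limit
    else:
        kind = "probabilistic"  # only a conditional probability can be stated
    return kind, confidence


# Hypothetical point-of-sale baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]
```

With these assumed baskets, butter -> bread is exact, bread -> butter is strong (one exception in three matching records), and milk -> butter is probabilistic. The confidence value is the kind of affinity an association function reports.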