University of Maryland University College DBST 652 Advanced Relational/Object-Relational Database Systems Final Examination

NAME Start Date Due Date

This examination consists of 10 equally weighted questions. Answer all the questions. Your answers on this exam should be your original work and not copied verbatim from the text or other sources. Originality, depth of analysis, and illustrations will be taken into account in grading this exam. Make sure you read the entire question and include all the information that is requested to get full credit. While composing your answers, be VERY careful to cite your sources. It is easy to get careless and forget to footnote a source. Remember, failure to cite sources constitutes an academic integrity violation. Use APA style for citations and references.



The test is due no later than 11:59 p.m. Eastern Daylight Time on Sunday, April 25, 2010. Your answers should be contained in a Microsoft Word or compatible format document uploaded to your WebTycho Assignments folder. Be sure to put your name on the front of your exam.

<<<<<<<<<<<<<<<<<<<<<<QUESTIONS BEGIN>>>>>>>>>>>>>>>>>>>>>>>>>>

Q1. DATA MODELING (1)
Does the relational model adequately model today's environment in terms of unstructured data, streaming data and particularly data on the web? Discuss the pros and cons of using the relational model and related DBMSs to do so. In particular, point out the shortcomings of the relational model and the difficulties of dealing with these data types in the relational model. What approaches are being proposed to deal with modeling of unstructured data for storage and management in traditional (including relational and object-oriented) database environments? Answer the question with some illustrative examples.

Today, the relational model does not adequately model our environment in terms of unstructured data, streaming data and particularly data on the web. The relational model was designed to handle rigidly structured data in the form of tables, columns and rows. Although it has been quite successful in developing the database technology required for traditional business applications, there are shortcomings when more complex database applications must be designed and implemented. Newer applications have requirements and characteristics different from the traditional business application: more complex structures for objects, longer-duration transactions, new data types for storing images or large textual items, and the need to define nonstandard application-specific operations (Elmasri, 2007). The complex datasets that are used prominently in today's society include multimedia data, geographic data, image data and data on the web (Elmasri, 2007).

The Object Oriented Model was proposed to meet the needs of these more complex applications. Even with all of its technological benefits, however, the OO Model has not really taken hold in the industry, and the RDBMS is still dominant even with its shortcomings. Instead, enhancements are being made to the relational model in the form of object-oriented extensions, and commercial DBMSs are moving towards the hybrid Object Relational Database Management System (ORDBMS). This DBMS offers the best of both worlds by keeping the standard structured features of traditional business applications while allowing the use of complex data. It offers the flexibility to handle some of these requirements without being limited by the data types and query language available in traditional database systems. The Geographic Information System (GIS) is a great example of a technology that has benefitted from the ORDBMS concept, because it takes full advantage of structured tables to handle topological attribute data while at the same time allowing objects and images to visually represent these topologies (Steiniger, 2009).

References
Elmasri, R., & Navathe, S. B. (2007). Fundamentals of Database Systems (5th ed.). Boston: Addison Wesley.
Steiniger, S. (2009, May 08). GIS Software - A description in 1000 words. Retrieved April 10, from http://www.geo.unizh.…pdf
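
The two ideas above, storing unstructured data next to structured attributes and defining nonstandard application-specific operations, can be sketched with Python's built-in sqlite3 module standing in for a commercial ORDBMS. This is a minimal sketch under stated assumptions: the `parcel` table, its columns and the `is_png` operation are hypothetical, and the byte strings only mimic file signatures rather than being real images. SQL can filter on the structured columns, but it cannot interpret the BLOB until an operation is registered with the engine, which is roughly what ORDBMS user-defined functions provide.

```python
import sqlite3

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the 8-byte signature that starts every PNG file

def is_png(blob):
    """Hypothetical application-specific operation on unstructured data."""
    return blob is not None and blob[:8] == PNG_SIGNATURE

conn = sqlite3.connect(":memory:")
# Structured attributes sit next to an opaque BLOB and a free-text column.
conn.execute("""
    CREATE TABLE parcel (
        parcel_id    INTEGER PRIMARY KEY,
        owner        TEXT NOT NULL,
        area_m2      REAL NOT NULL,
        aerial_photo BLOB,   -- unstructured: plain SQL cannot look inside it
        notes        TEXT    -- large free-text item
    )""")
# Stand-in payloads that only mimic file signatures; they are not real images.
conn.executemany(
    "INSERT INTO parcel VALUES (?, ?, ?, ?, ?)",
    [
        (1, "County", 5400.0, PNG_SIGNATURE + bytes(8), "Borders the river"),
        (2, "City", 1200.0, b"GIF89a" + bytes(10), None),
    ],
)
# Register the operation so SQL can call it, similar in spirit to an
# ORDBMS user-defined function attached to a user-defined type.
conn.create_function("is_png", 1, is_png)

rows = conn.execute(
    "SELECT parcel_id FROM parcel WHERE area_m2 > 1000 AND is_png(aerial_photo)"
).fetchall()
print(rows)  # [(1,)]
```

Without the registered function, the only questions SQL could ask about the photo are generic ones such as `length(aerial_photo)`.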

Q2. DATA MODELLING (2)
Many consider the relational data model as the lingua franca of data today in both business and government.
(a) Comment on the pros and cons of the relational data model in terms of data modeling, data structure, data storage, data processing, and data transportation.
(b) Show two examples of relational applications that demonstrate the advantages of using the relational data model over legacy models as the preferred format for representing and processing data.
(c) Identify and explain three unique challenges in developing a relational DBMS compared with any of the legacy DBMS.
(d) Describe your ideas or solutions to address the above three challenges.

The relational data model has become a predominant choice for the storage of information in new databases used for financial records, manufacturing and logistical information, personnel data and much more. This is largely because a significant amount of business transactions are stored and managed using the relational data model and RDBMSs such as Oracle, DB2, and SQL Server. The relational data model is helped by a very robust infrastructure in terms of the commercial DBMSs that have been designed to support it. It permits the database designer to create a consistent, logical representation of information. This consistency is achieved by including declared constraints in the database design, which helps to maintain data integrity and reduce redundancy.

The relational data model falls short, however, when dealing with the challenges of newer applications and of nonconventional and unstructured data. The problems mostly stem from the inability of the RDBMS to store a variety of complex data types. These unsupported data types include text in CAD publishing, genome information, satellite imagery, audio and video files, and traffic data (Elmasri, 2007).

Hierarchical and network DBMSs, known as legacy DBMSs, were introduced in the 1960s. These DBMSs at one time modeled the business needs of organizations well; however, because of the heavy use of pointers, the rigidity of the built-in hierarchical paths in the data, and their inability to handle objects and complex data types, these systems are gradually being phased out. Relational databases have often replaced legacy hierarchical and network databases because they are easier to understand and use (Elmasri, 2007).

There are three main challenges that are faced when migrating from a legacy system to an RDBMS. The first is that organizations have invested man-centuries in the development of their mission-critical databases, so the lost opportunity costs of re-implementing systems are enormous. Secondly, even if people knew how they might implement them better in a relational database, they haven't got the time, money, resources or skills to do it. Lastly, the learning curve to utilize a new system is a cost to the organization (Burleson Corporation, 2010).

References
Burleson Corporation. (2010). Legacy bumps slow trip to relational. Retrieved April 25, 2010, from www.dba-oracle.…
Elmasri, R., & Navathe, S. B. (2007). Fundamentals of Database Systems (5th ed.). Boston: Addison Wesley.
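
How declared constraints keep the data consistent can be shown with a small sketch. It uses SQLite through Python's sqlite3 module as a stand-in for the commercial RDBMSs named above, and the `department`/`employee` schema is hypothetical. Each declared constraint (primary key, CHECK, foreign key) makes the DBMS itself reject an inconsistent row, instead of leaving that check to every application program.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE department (
        dnumber INTEGER PRIMARY KEY,
        dname   TEXT NOT NULL UNIQUE
    );
    CREATE TABLE employee (
        ssn    TEXT PRIMARY KEY,
        salary REAL NOT NULL CHECK (salary > 0),
        dno    INTEGER NOT NULL REFERENCES department (dnumber)
    );
    INSERT INTO department VALUES (5, 'Research');
    INSERT INTO employee VALUES ('123456789', 30000, 5);
""")

def try_insert(ssn, salary, dno):
    """Attempt an insert and report whether the declared constraints accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO employee VALUES (?, ?, ?)", (ssn, salary, dno))
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected ({exc})"

print(try_insert("123456789", 40000, 5))  # duplicate primary key
print(try_insert("987654321", -10, 5))    # violates the CHECK constraint
print(try_insert("987654321", 40000, 9))  # department 9 does not exist
print(try_insert("987654321", 40000, 5))  # consistent row
```

In a hierarchical or network system, comparable checks were typically coded in the navigational application programs, which is one reason they were harder to maintain.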

Q3. DATABASE DESIGN
A. "The relational model loses semantics of the application in mapping from a conceptual schema to relational." Illustrate this point. Start with a relational design of any application of your choice, reverse engineer it to an EER schema, and illustrate how the EER schema explains the application much more easily than the relational model.
B. Assume that a sizeable application exists using a legacy database management system in an organization; it took over 20 man-years of effort to design, implement, test and deploy, and it has been running for the last 20 years. A committee has been assigned to evaluate whether this application should be moved over to a relational environment, and you are asked to prepare an "issues to consider" document for it. List a series of issues that you will consider to come up with a go vs. no-go decision on migrating the application. (Be sure to consider user, design, programming, testing and deployment related issues.)
C. Assuming that the decision is made to migrate the existing data and application from the legacy to the relational environment, describe the series of steps that the team will go through to convert to the relational environment, describe each step briefly and comment on the difficulties involved. Do you feel the technology today provides enough tool support to do these steps? … some tools and comment on their pluses and minuses.

The need to work with legacy data constrains a development team. Legacy data is often difficult to work with because of a combination of quality, architecture, or political issues. It reduces flexibility because it is not easy to manipulate the source data schema. The legacy database application will also have its own constraints in the form of other applications that work with it. These constraints will be transferred to the design team, because they will have to figure out how best to implement these applications in the new relational database application. Things that we should consider before accepting or denying the need to migrate a legacy DBMS to a relational DBMS follow:
1. What data quality challenges will be encountered?
2. What database design problems will be encountered?
3. What data architecture problems will be encountered?
4. What process-related challenges will be faced?
5. How will we work around the legacy file management system (Ambler, 2009)?

Assuming that the problems listed above are workable, there is a series of steps in the database application system life cycle:
1. System definition: the scope of the database system, its users and its applications are defined, and the interfaces for the users, the response time constraints, and the storage and processing needs are identified.
2. Database design: a complete logical and physical design of the database system on the RDBMS is completed.
3. Database implementation: specifying the conceptual, external, and internal database definitions, creating the empty database files, and implementing the software applications.
4. Loading or data conversion: converting existing files into the database system format.
5. Application conversion: converting all software applications from the old system to work with the new system.
6. Testing and validation: testing and validation of the new system.
7. Operation: the database system and applications are put into operation.
8. Monitoring and maintenance: constant monitoring and maintenance of the system.
In the case of moving from an established system to a new one, steps 4 and 5 above are the most time-consuming steps in the database system life cycle and are the most underestimated (Elmasri, 2007).

References
Ambler, S. (2009). The Process of Database Refactoring: Strategies for Improving Database Quality. Retrieved April 23, 2010, from www.agiledata.org/….html
Elmasri, R., & Navathe, S. B. (2007). Fundamentals of Database Systems (5th ed.). Boston: Addison Wesley.
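
Step 4 (loading or data conversion) is where the data quality challenges surface. The sketch below shows one way to approach it in Python with sqlite3, assuming a hypothetical fixed-width legacy record layout (EMPNO in positions 1-5, NAME in 6-20, HIREDATE as YYYYMMDD in 21-28). Records that fail conversion are set aside with the reason, rather than silently loaded, so they can be reviewed before the application conversion step.

```python
import sqlite3
from datetime import datetime

# Hypothetical fixed-width legacy layout:
#   EMPNO  positions 1-5   (zero-padded digits)
#   NAME   positions 6-20  (space-padded text)
#   HIREDT positions 21-28 (YYYYMMDD)
def make_record(empno, name, hired):
    return f"{empno:05d}{name:<15}{hired}"

LEGACY_FILE = [
    make_record(17, "John Smith", "19890412"),
    make_record(42, "Mary Jones", "19911301"),  # month 13: a data-quality defect
    make_record(99, "Ann Lee", "20000101"),
]

def convert(record):
    """Convert one legacy record into a relational row; raise ValueError if invalid."""
    empno = int(record[0:5])
    name = record[5:20].rstrip()
    hired = datetime.strptime(record[20:28], "%Y%m%d").date().isoformat()
    return empno, name, hired

def load(conn, records):
    """Load the convertible records and collect the rest for manual review."""
    loaded, rejected = 0, []
    for rec in records:
        try:
            row = convert(rec)
        except ValueError as exc:
            rejected.append((rec, str(exc)))
            continue
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        loaded += 1
    conn.commit()
    return loaded, rejected

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE employee (
    empno     INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    hire_date TEXT NOT NULL)""")
loaded, rejected = load(conn, LEGACY_FILE)
print(loaded, "loaded;", len(rejected), "rejected")
```

Keeping conversion in a pure function such as `convert` also makes it easy to test against a sample of the legacy file during the testing and validation step.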

Q4. VIEW PROCESSING IN SQL
Consider the situation where we can define materialized views over our relational database. Since we store the result of the view physically, we can use this view relation when processing queries rather than using the base relations of our database. In this question, I want you to discuss some of the issues involved with being able to rewrite an SQL query in terms of a materialized view, e.g., when is it correct to do so. In your discussion, include some examples considering the single relation CustomerSales(CustomerID, ProductID, Month, Sales), where the primary key is underlined, and the view definition:

    Create View V AS
    Select ProductID, Month, Sum(Sales), Count(*) AS NumEntries
    From CustomerSales
    Where ProductID > 500
    Group By ProductID, Month

In your discussion, you should consider the following cases:
- An SQL query that does not contain any aggregate function and a materialized view that does contain aggregate function(s) and grouping
- An SQL query that contains aggregate function(s) and grouping, and a materialized view that also contains aggregate function(s) and grouping

Materialized views are generic objects that are used to summarize, pre-compute, replicate, and distribute data. Materialized views improve query performance by pre-calculating expensive join and aggregation operations on the database prior to execution time and storing these results in the database. The query optimizer can make use of materialized views by automatically recognizing when an existing materialized view can and should be used to satisfy a request. It then transparently rewrites the request to use the materialized view, so that queries are directed to the materialized view and not to the underlying detail tables or views. Rewriting queries to use materialized views rather than detail relations results in a significant performance gain (Bauer, 1999).

According to Elmasri & Navathe, view materialization involves physically creating a temporary view table when the view is first queried and keeping that table on the assumption that other queries on the view will follow. Using the materialized view may result in a more efficient query execution plan. In the case of a view that is formed from the joins of two or more tables, it is more efficient to change the SQL to query the view, because the view simplifies the specification of the query (2007).

The view defined above has a single defining table and is not updatable, because the view attributes do not contain the entire primary key of the base table and because the view uses grouping and aggregate functions. An SQL query that does not contain any aggregate function would not benefit from a rewrite that pulls from this materialized view: the view keeps only one summary row per (ProductID, Month) group, so the individual CustomerSales rows, and the CustomerID attribute, cannot be recovered from it. It is better to use the base table in that case. An SQL query that contains aggregate function(s) and grouping, on the other hand, can be answered more efficiently and cheaply from a materialized view that also contains aggregate function(s) and grouping than from the base tables (Bauer, 1999). Such a rewrite is correct only when the query's selection condition implies the view's condition (ProductID > 500), the query's grouping attributes are a subset of the view's grouping attributes (ProductID, Month), and each aggregate the query asks for can be recomputed from the stored ones: SUM as the sum of the stored sums, COUNT as the sum of NumEntries, and AVG as their ratio.

References
Bauer, M. (1999). Oracle8i Tuning. Redwood City: Oracle Corporation.
Elmasri, R., & Navathe, S. B. (2007). Fundamentals of Database Systems (5th ed.). Boston: Addison Wesley.
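
The aggregate-on-aggregate case can be checked on a few made-up rows. SQLite (used here through Python's sqlite3 module) has no materialized views, so this sketch materializes V by hand as an ordinary table; the key (CustomerID, ProductID, Month) and the alias TotalSales are assumptions for illustration. A query grouped by ProductID alone is rewritten to re-aggregate V's stored sums and counts, and both forms return the same answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed key (CustomerID, ProductID, Month), for illustration only.
conn.execute("""CREATE TABLE CustomerSales (
    CustomerID INTEGER, ProductID INTEGER, Month INTEGER, Sales REAL,
    PRIMARY KEY (CustomerID, ProductID, Month))""")
conn.executemany("INSERT INTO CustomerSales VALUES (?, ?, ?, ?)", [
    (1, 501, 1, 10), (2, 501, 1, 20), (1, 501, 2, 5),
    (3, 600, 1, 7), (2, 600, 2, 3), (4, 400, 1, 99),
])
# SQLite has no materialized views, so V is materialized as an ordinary table.
conn.execute("""CREATE TABLE V AS
    SELECT ProductID, Month, SUM(Sales) AS TotalSales, COUNT(*) AS NumEntries
    FROM CustomerSales WHERE ProductID > 500
    GROUP BY ProductID, Month""")

# Query over the base table: same condition as V, coarser grouping than V.
base = conn.execute("""SELECT ProductID, SUM(Sales), COUNT(*)
    FROM CustomerSales WHERE ProductID > 500
    GROUP BY ProductID ORDER BY ProductID""").fetchall()

# Rewritten query: re-aggregate the stored aggregates.
# SUM of the stored sums gives SUM; SUM of NumEntries gives COUNT(*).
rewritten = conn.execute("""SELECT ProductID, SUM(TotalSales), SUM(NumEntries)
    FROM V GROUP BY ProductID ORDER BY ProductID""").fetchall()

# A non-aggregate query such as SELECT CustomerID, Sales ... cannot use V:
# neither the individual rows nor the CustomerID column are stored in it.
view_columns = [row[1] for row in conn.execute("PRAGMA table_info(V)")]
print(base == rewritten, view_columns)
```

A query with a weaker condition, for example ProductID > 400, could not be rewritten either, because V never stored the groups for products 401 to 500.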

Q5. PHYSICAL DATABASE DESIGN
Consider a relational database management system that needs to provide optimized access for applications that primarily do one of the following:
(a) Read access (e.g., data warehouses, where some of the columns or aggregate values are returned)
(b) Write access (e.g., online transaction processing, where new records are inserted)
Discuss different physical storage organizations (i.e., physical storage architecture) for a relation that may be better suited for (a), and those better suited for (b). Give some justification to support your storage organization for a relation in each of the two cases.

Most database systems are stored permanently on magnetic disk secondary storage, for the following reasons:
1. Generally, databases are too large to fit entirely in main memory.
2. The circumstances that cause permanent loss of stored data arise less frequently for disk secondary storage than for primary storage; for this reason, secondary storage is normally considered to be nonvolatile storage.
3. The cost of storage per unit of data is an order of magnitude less for disk secondary storage than for primary storage.

The data stored on disk is organized as files of records. There are several primary file organizations that determine how the file records are physically placed on the disk and how the records can be accessed, and the process of physical database design involves choosing, from among these options, the particular data organization techniques that best suit the given application requirements:
1. Heap files place the records on disk in no particular order, by appending newly inserted records at the end of the file.
2. Sorted files keep the records ordered by the value of a particular field, called the sort key.
3. Hashed files use a hash function applied to a particular field, called the hash key, to determine a record's placement on disk.
4. Tree structures, such as B-trees, can also be used for file storage.

In the case of option (a) above, where the primary access to the storage is for Read access, the sorted file organization is the most efficient, because the sorting of the records has already been done. Finding the next record from the current one in order of the ordering key requires no additional block accesses, and retrieval on the sort key can be enhanced by using a binary search technique. This same storage organization is not so efficient for a system with predominantly Write access, because inserting and deleting records are expensive operations: the file must remain in physical order. Predominantly Write access, as in option (b) above, would be better suited to a heap file, because inserting a record is very efficient when all new records are appended to the end of the file, regardless of the order of the data (Elmasri, 2007).

References
Elmasri, R., & Navathe, S. B. (2007). Fundamentals of Database Systems (5th ed.). Boston: Addison Wesley.
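
The trade-off between heap and sorted files can be put in numbers with simplified block-access estimates. This is a rough model under stated assumptions, not a measurement: an equality search on a key field, a heap insert that reads and rewrites only the last block, and a sorted insert that must read and rewrite about half of the file's blocks to keep the records in physical order. The record count and blocking factor in the usage line are made-up values.

```python
import math

def blocks(r, bfr):
    """Number of disk blocks b needed for r records at blocking factor bfr."""
    return math.ceil(r / bfr)

def heap_costs(b):
    """Approximate block accesses for a heap (unordered) file."""
    return {
        "insert": 2,               # read the last block, add the record, write it back
        "equality_search": b / 2,  # linear search: on average half of the blocks
    }

def sorted_costs(b):
    """Approximate block accesses for a file sorted on the search key."""
    log_b = math.ceil(math.log2(b))
    return {
        # binary search for the position, then read and rewrite about half
        # of the blocks (b/2 reads + b/2 writes) to shift the later records
        "insert": log_b + b,
        "equality_search": log_b,  # binary search on the sort key
    }

b = blocks(30_000, 10)  # made-up file: 30,000 records, 10 records per block
print(b, heap_costs(b), sorted_costs(b))
```

For this file, the heap wins inserts (2 block accesses against 3,012) and the sorted file wins searches (12 against 1,500 on average), which matches the write-heavy versus read-heavy recommendation.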

Q6. DATABASE TECHNOLOGIES AND APPLICATIONS
Many technologies related to the database area are affecting the overall design and implementation of database applications. In particular, answer the following:
(a) What architectural approaches are available for interoperating multiple database applications, in terms of sharing of storage, processing and mediating between DBMSs themselves?
(b) Comment on any favorite database application area that interests you. Point out any challenging issues and the solution approaches being pursued by various organizations and/or groups.

a. Distributed databases are defined as a collection of multiple logically interrelated databases distributed over a computer network. A distributed database management system (DDBMS) is a software system that manages a distributed database while making the distribution transparent to the user. The distributed database concept allows for the sharing of storage, processing and mediating between DBMSs, and we use the term interoperability to refer to the management and co-operation of multiple database systems. Examples of distributed database system architectures are the pure distributed database system, where there is no local autonomy and all databases are globally accessed through a site that is part of the DDBMS; the federated database system, where there is local autonomy on each database; and the multidatabase system, where each database also keeps full local autonomy and a global schema is constructed only as needed.

b. I am particularly interested in Geographic Information System (GIS) applications. GISs are often defined as the systematic integration of hardware and software for capturing, storing, displaying, updating, manipulating and analyzing spatial data. Problems with GIS were encountered because there was no standardization between the data used by different organizations, which made it almost impossible to share information between systems. Also, the relational database model could not handle the complex raster/vector image files associated with topologies. The Object Relational Database Management System was ushered in for use with GIS because it allowed for the handling of complex objects while allowing mapping to the attributes of the objects in the underlying relational database. Since then, various organizations have come together to standardize the building blocks of the system: the GIS community has been brought together by standards organizations such as the Open Geospatial Consortium (OGC) and the U.S. Federal Geographic Data Committee (FGDC). Also, a standard language called Geography Markup Language (GML) was introduced, which uses an XML encoding tailored for geographic information.
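
Because GML is plain XML, any system that reads XML can exchange geometries with any other, which is the interoperability point made above. The following minimal sketch uses Python's standard xml.etree.ElementTree to flatten GML geometries into rows that could be loaded into a relational table. The sample document, its coordinates and the `Features` wrapper element are made up for illustration. The namespace URI and the `gml:Point`/`gml:pos` and `gml:LineString`/`gml:posList` elements follow GML 3.x.

```python
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml"  # GML 3.x namespace URI
NS = {"gml": GML}

# Made-up sample: one point and one line, coordinates given as "lat lon" pairs.
SAMPLE = f"""
<Features xmlns:gml="{GML}">
  <gml:Point gml:id="p1" srsName="EPSG:4326">
    <gml:pos>38.98 -76.94</gml:pos>
  </gml:Point>
  <gml:LineString gml:id="l1" srsName="EPSG:4326">
    <gml:posList>38.98 -76.94 38.99 -76.93</gml:posList>
  </gml:LineString>
</Features>
"""

def coordinate_pairs(text):
    """Turn a whitespace-separated coordinate string into (x, y) pairs."""
    values = [float(v) for v in text.split()]
    return list(zip(values[0::2], values[1::2]))

def read_geometries(xml_text):
    """Flatten GML geometries into (id, geometry_type, coordinates) rows."""
    root = ET.fromstring(xml_text)
    rows = []
    for point in root.iterfind("gml:Point", NS):
        coords = coordinate_pairs(point.find("gml:pos", NS).text)
        rows.append((point.get(f"{{{GML}}}id"), "Point", coords))
    for line in root.iterfind("gml:LineString", NS):
        coords = coordinate_pairs(line.find("gml:posList", NS).text)
        rows.append((line.get(f"{{{GML}}}id"), "LineString", coords))
    return rows

for row in read_geometries(SAMPLE):
    print(row)
```

An ORDBMS with spatial types would store each coordinate list as a single geometry value instead of as plain Python tuples.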

Q7. CONCURRENCY CONTROL AND RECOVERY (1)
Two-Phase Locking (2PL) is the commonly used concurrency control method in production relational database management systems. The main variant of 2PL adopted by production systems is Strict 2PL, which releases the locks only after the commit/abort decision. Explain the advantages and disadvantages of Strict 2PL, as compared to the general 2PL, in terms of:
(a) Total amount of concurrency allowed during transaction execution.
(b) Complexity of recovery algorithms, particularly when aborts happen.

The difference between two-phase locking (2PL) and strict 2PL is that in 2PL all locking operations (read_lock and write_lock) precede the first unlock operation in the transaction, whereas in strict 2PL a transaction does not release any of its exclusive (write) locks until after it commits or aborts. Because transactions using either version acquire all their locks before they release any of them, serializability is guaranteed.

(a) Since concurrency is defined as a property of systems in which several processes are executing at the same time, 2PL allows for more concurrency than strict 2PL: records are not locked as long, and other transactions do not wait as long. Under strict 2PL, concurrency is reduced and deadlocking of records is more prevalent. The trade-off is that data integrity is greater with strict 2PL, which is good.

(b) In terms of the complexity of recovery algorithms, the greater the degree of concurrency we wish to achieve, the more time-consuming the task of recovery becomes. Under general 2PL, recovery is more complex when an abort happens: other transactions may already have read values written by the aborted transaction, so they must be rolled back as well (cascading rollback). In the event of an abort under strict 2PL, only the aborted transaction's own writes need to be undone, because no other transaction can have read or overwritten them, and there is no reason to redo the write operations of any transactions committed before the last checkpoint.
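
The two locking rules can be checked mechanically on a single transaction's sequence of operations. In this sketch the operation names `read_lock`, `write_lock`, `unlock`, `commit` and `abort` are a hypothetical encoding rather than any DBMS API. T1 shows why general 2PL complicates recovery: it releases its write lock on X before committing, so another transaction could read X and would have to be rolled back if T1 later aborted.

```python
LOCKS = ("read_lock", "write_lock")

def is_two_phase(ops):
    """Basic 2PL: no lock is acquired after the transaction's first unlock."""
    unlocked = False
    for action, _item in ops:
        if action == "unlock":
            unlocked = True
        elif action in LOCKS and unlocked:
            return False
    return True

def is_strict_two_phase(ops):
    """Strict 2PL: basic 2PL, and no write lock is released before commit/abort."""
    if not is_two_phase(ops):
        return False
    write_locked, finished = set(), False
    for action, item in ops:
        if action == "write_lock":
            write_locked.add(item)
        elif action in ("commit", "abort"):
            finished = True
        elif action == "unlock" and item in write_locked and not finished:
            return False
    return True

# T1 releases its write lock on X before committing (2PL, but not strict).
t1 = [("read_lock", "Y"), ("write_lock", "X"), ("unlock", "Y"),
      ("unlock", "X"), ("commit", None)]
# T2 holds the write lock on X until after it commits (strict 2PL).
t2 = [("read_lock", "Y"), ("write_lock", "X"), ("unlock", "Y"),
      ("commit", None), ("unlock", "X")]
# T3 acquires a lock on X after unlocking Y (not 2PL at all).
t3 = [("read_lock", "Y"), ("unlock", "Y"), ("write_lock", "X"),
      ("commit", None), ("unlock", "X")]
for name, t in (("T1", t1), ("T2", t2), ("T3", t3)):
    print(name, is_two_phase(t), is_strict_two_phase(t))
```

Note that T2 may still release its read lock on Y early, which is where strict 2PL keeps some of the concurrency of the general protocol.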

Q8. CONCURRENCY CONTROL AND RECOVERY (2)
Although the Two-Phase Commit (2PC) protocol guarantees the atomicity property in databases, 2PC has been considered too restrictive for many practical situations.
(a) One reason 2PC is considered too restrictive is the time it forces the participants to maintain locks. This is particularly the case when the participants use Strict 2PL (see question above). Use quantitative arguments (e.g., the number of messages used by 2PC and its duration) to explain this problem for short (sub-second) atomic transactions.
(b) Another reason 2PC is considered too restrictive is the atomicity property, which cannot be easily sub-divided. Explain the problem caused by atomicity using two illustrative applications that contain long-duration activities (hours, days or months).
(c) Compare the atomicity property guaranteed by 2PC (e.g., as specified by the WS-AtomicTransaction protocol) with an extensible model (e.g., as specified by the WS-Coordination protocol). Your comparison should be concrete (e.g., in terms of data structures, protocol steps, and applications they serve) and limited to two major differences.

The 2PC protocol works as a global recovery manager and is needed to maintain information needed for recovery, in addition to the local recovery managers and the information they maintain. The greatest disadvantage of the two-phase commit protocol is that it is a blocking protocol. A node will block while it is waiting for a message, and other processes competing for resource locks held by the blocked processes will have to wait for the locks to be released. A single node will continue to wait even if all other sites have failed. If the coordinator fails permanently, some cohorts will never resolve their transactions, causing resources to be tied up forever.

Atomicity requires that database modifications follow an all-or-nothing rule. A transaction is said to be atomic if, when one part of the transaction fails, the entire transaction fails and the database state is left unchanged. An example of atomicity is ordering an airline ticket, where two actions are required: payment and a seat reservation. The potential passenger must either both pay for and reserve a seat, OR neither pay for nor reserve a seat. The booking system does not consider it acceptable for a customer to pay for a ticket without securing the seat, nor to reserve the seat without payment succeeding. Also, unless there is a timeout function on the transaction, the seat or reservation can be locked indefinitely.

The atomicity property guaranteed by 2PC is a part of an extensible model such as the WS-Coordination protocol.
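
A quantitative version of the lock-holding argument can be sketched by simulating basic, unoptimized 2PC. Each participant costs four messages (PREPARE, vote, decision, ACK), so a commit across three sites costs 12 messages. Under strict 2PL a prepared participant keeps its locks for at least the vote and decision message delays, which for a sub-second transaction can exceed the work itself. The `Participant` class and the `coordinator_fails_after_votes` flag are hypothetical simulation devices, not a real transaction-manager API, and common optimizations such as presumed abort are not modeled.

```python
from dataclasses import dataclass

@dataclass
class Participant:
    name: str
    vote_yes: bool = True
    state: str = "active"     # active -> prepared -> committed / aborted
    holds_locks: bool = True  # under strict 2PL, locks are held until the outcome

def two_phase_commit(participants, coordinator_fails_after_votes=False):
    """Simulate basic 2PC and return (outcome, number_of_messages)."""
    messages = 0
    # Phase 1 (voting): PREPARE to each participant, one vote back from each.
    for p in participants:
        messages += 2
        if p.vote_yes:
            p.state = "prepared"  # must now wait for the coordinator's decision
        else:
            p.state = "aborted"   # a NO voter may abort on its own
            p.holds_locks = False
    if coordinator_fails_after_votes:
        return "blocked", messages  # prepared participants keep their locks
    outcome = "commit" if all(p.vote_yes for p in participants) else "abort"
    # Phase 2 (decision): decision to each participant, one ACK back from each.
    for p in participants:
        messages += 2
        p.state = "committed" if outcome == "commit" else "aborted"
        p.holds_locks = False     # locks are released only now
    return outcome, messages

print(two_phase_commit([Participant("A"), Participant("B"), Participant("C")]))
```

Running it with `coordinator_fails_after_votes=True` reproduces the blocking problem described above: every prepared participant is left holding its locks with no way to decide alone.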

Q9. RELATIONAL OPERATIONS
(a) Prove that division is not strictly necessary, i.e., that it can be replaced by a combination of some relational algebra operations.

Division in relational algebra is used when we want to express queries with "all"; however, the DIVISION operation is not directly implemented in most RDBMSs with SQL as the primary query language. The result of R ÷ S consists of the restrictions of tuples in R to the attribute names unique to R, i.e., those in the header of R but not in the header of S, for which it holds that all their combinations with tuples in S are present in R. For an example, see the tables Completed, DBProject and their division:

Completed              DBProject      Completed ÷ DBProject
Student   Task         Task           Student
Fred      Database1    Database1      Fred
Fred      Database2    Database2      Sara
Fred      Compiler1
Eugene    Database1
Eugene    Compiler1
Sara      Database1
Sara      Database2

The same result as a division operation can be attained using the PROJECT, CARTESIAN PRODUCT, and SUBTRACTION (set difference) operations. We assume that a1, ..., an are the attribute names unique to R and b1, ..., bm are the attribute names of S. In the first step we project R on its unique attribute names and construct all combinations with tuples in S:

$T := \pi_{a_1,\ldots,a_n}(R) \times S$

In the Completed example, T is a table in which every Student (because Student is the attribute unique to Completed) is combined with every Task in DBProject. So Eugene, for instance, would have two rows in T: Eugene -> Database1 and Eugene -> Database2.

In the next step we subtract R from this relation, creating a new relation U:

$U := T - R$

Note that U contains the combinations that "could have" been in R, but weren't (here, Eugene -> Database2). So if we now take the projection on the attribute names unique to R, we have the restrictions of the tuples in R for which not all combinations with tuples in S were present in R:

$V := \pi_{a_1,\ldots,a_n}(U)$

So what remains to be done is to take the projection of R on its unique attribute names and subtract those in V:

$W := \pi_{a_1,\ldots,a_n}(R) - V$

Here W = {Fred, Sara}, which is exactly Completed ÷ DBProject. Since W is built only from PROJECT, CARTESIAN PRODUCT, and SUBTRACTION, division is not strictly necessary.

(b) Express the natural join using the Renaming, Cartesian product, Selection, and Projection operators.

Consider E(enr, ename, dept) and D(dnr, dname), joined on dept = dnr:
Renaming: $\rho_{E(enr,\,ename,\,dept)}(E)$ and $\rho_{D(dnr,\,dname)}(D)$; in SQL, select * from E as E(enr, ename, dept), D as D(dnr, dname)
Cartesian product: $E \times D$; in SQL, select * from E, D
Selection: $\sigma_{dept = dnr}(E \times D)$; in SQL, select * from E, D where dept = dnr
Projection: $\pi_{x}(\sigma_{x = 1}(A * B))$; in SQL, select A.x from A natural join B where x = 1
Putting the steps together:

$E *_{dept = dnr} D = \pi_{enr,\,ename,\,dept,\,dname}\left(\sigma_{dept = dnr}\left(\rho_{E(enr,\,ename,\,dept)}(E) \times \rho_{D(dnr,\,dname)}(D)\right)\right)$

The final projection drops dnr, which after the selection always equals dept, so the join attribute appears only once, as a natural join requires.

(c) Consider the universal relation R = {A, B, C, D, E, F, G, H, I, J} and the set of functional dependencies F = {{A, B} -> {C}, {A} -> {D, E}, {B} -> {F}, {F} -> {G, H}, {D} -> {I, J}}. What is the key for R? Decompose R into 2NF, then 3NF relations.

Since the closure {A, B}+ = R, the key of R is {A, B}. It is minimal, because neither {A}+ nor {B}+ contains all the attributes of R.
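
The key test relies on attribute closures, which are easy to compute with the standard fixed-point algorithm: keep applying any dependency whose left-hand side is already in the closure until nothing changes. The sketch below encodes R and F from part (c), with single letters as attribute names. It confirms that {A, B}+ is all of R and gives the {A}+ and {B}+ values used for the 2NF decomposition.

```python
def closure(attrs, fds):
    """Attribute closure X+ under a list of functional dependencies (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

R = set("ABCDEFGHIJ")
F = [
    (set("AB"), set("C")),
    (set("A"), set("DE")),
    (set("B"), set("F")),
    (set("F"), set("GH")),
    (set("D"), set("IJ")),
]
print(sorted(closure("AB", F)))  # all ten attributes, so {A, B} is a superkey
print(sorted(closure("A", F)))   # ['A', 'D', 'E', 'I', 'J']
print(sorted(closure("B", F)))   # ['B', 'F', 'G', 'H']
```

Because neither single-attribute closure equals R, {A, B} is a minimal key rather than just a superkey.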

Decomposition of R into 2NF
First, identify the partial dependencies that violate 2NF. These are attributes that are functionally dependent on either part of the key, {A} or {B}, alone. We can calculate the closures {A}+ and {B}+ to determine the partially dependent attributes:
{A}+ = {A, D, E, I, J}; hence {A} -> {D, E, I, J}.
{B}+ = {B, F, G, H}; hence {B} -> {F, G, H}.
Remove the attributes that are functionally dependent on parts of the key (A or B) from R and place them in separate relations R1 and R2, along with the part of the key they depend on (A or B). The key attributes are copied into each of these relations but also remain in the original relation, which we call R3 below:
R1 = {A, D, E, I, J}, R2 = {B, F, G, H}, R3 = {A, B, C}

Decomposition of R into 3NF
Next, we look for transitive dependencies. The relation R1 has the transitive dependency {A} -> {D} -> {I, J}, so we remove the transitively dependent attributes {I, J} from R1 into a relation R11 and copy the attribute D they depend on into R11. The remaining attributes are kept in a relation R12. Hence, R1 is decomposed into R11 and R12 as follows:
R11 = {D, I, J}, R12 = {A, D, E}
The relation R2 is similarly decomposed into R21 and R22 based on the transitive dependency {B} -> {F} -> {G, H}:
R21 = {F, G, H}, R22 = {B, F}
The final set of relations in 3NF is {R11, R12, R21, R22, R3}.

(d) Find a minimal set of relational algebra operations, assuming that the set {union, intersect, difference, product, restrict, project, join, divide} is complete.

The algebra has a minimal set of operators: restrict (select), project, product, union and difference. The others can be derived from them: intersection as R - (R - S), join as a selection over a Cartesian product (with renaming and projection, as in (b)), and division as shown in (a).
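
Parts (a) and (b) can be checked by building division and natural join from the primitive operators alone. This sketch implements a tiny relational algebra in Python, where each relation is a tuple of attribute names plus a set of rows. It then applies the T, U, V, W steps to the Completed and DBProject tables and joins E with D after renaming dnr to dept. The `Relation` class, the `_s` suffix used to rename clashing attributes, and the E and D rows are illustrative assumptions.

```python
from itertools import product as cross

class Relation:
    """A relation as a tuple of attribute names plus a set of rows."""
    def __init__(self, attrs, rows):
        self.attrs = tuple(attrs)
        self.rows = {tuple(r) for r in rows}

    def project(self, attrs):
        idx = [self.attrs.index(a) for a in attrs]
        return Relation(attrs, {tuple(r[i] for i in idx) for r in self.rows})

    def select(self, pred):
        # pred receives one row as a dict {attribute: value}
        return Relation(self.attrs,
                        {r for r in self.rows if pred(dict(zip(self.attrs, r)))})

    def rename(self, mapping):
        return Relation([mapping.get(a, a) for a in self.attrs], self.rows)

    def product(self, other):
        assert not set(self.attrs) & set(other.attrs), "rename shared attributes first"
        return Relation(self.attrs + other.attrs,
                        {a + b for a, b in cross(self.rows, other.rows)})

    def difference(self, other):
        # align the other relation's columns with ours, then subtract the row sets
        return Relation(self.attrs, self.rows - other.project(self.attrs).rows)

def divide(r, s):
    """R ÷ S using only PROJECT, CARTESIAN PRODUCT and DIFFERENCE."""
    unique = [a for a in r.attrs if a not in s.attrs]
    t = r.project(unique).product(s)        # T := pi_A(R) x S
    u = t.difference(r)                     # U := T - R
    v = u.project(unique)                   # V := pi_A(U)
    return r.project(unique).difference(v)  # W := pi_A(R) - V

def natural_join(r, s):
    """R * S using RENAME, CARTESIAN PRODUCT, SELECT and PROJECT."""
    common = [a for a in r.attrs if a in s.attrs]
    s2 = s.rename({a: a + "_s" for a in common})  # assumes no "<name>_s" attribute exists
    matched = r.product(s2).select(
        lambda row: all(row[a] == row[a + "_s"] for a in common))
    return matched.project(list(r.attrs) + [a for a in s.attrs if a not in common])

completed = Relation(("Student", "Task"), [
    ("Fred", "Database1"), ("Fred", "Database2"), ("Fred", "Compiler1"),
    ("Eugene", "Database1"), ("Eugene", "Compiler1"),
    ("Sara", "Database1"), ("Sara", "Database2"),
])
dbproject = Relation(("Task",), [("Database1",), ("Database2",)])
print(sorted(divide(completed, dbproject).rows))  # [('Fred',), ('Sara',)]

# Made-up rows; D's dnr is renamed to dept so the join attribute shares a name.
E = Relation(("enr", "ename", "dept"), [(1, "Ann", 10), (2, "Bob", 20)])
D = Relation(("dnr", "dname"), [(10, "Sales"), (30, "HR")])
print(natural_join(E, D.rename({"dnr": "dept"})).rows)  # {(1, 'Ann', 10, 'Sales')}
```

Only `project`, `select`, `rename`, `product` and `difference` are implemented, so the example also illustrates part (d): the derived operators need nothing beyond the minimal set.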

Q10. QUERY OPTIMIZATION
(a) Discuss the cost components for a cost function that is used to estimate query execution cost. Which cost components are used most often as the basis for cost functions?
(b) Develop cost functions for the PROJECT, UNION, INTERSECTION, SET DIFFERENCE, and CARTESIAN PRODUCT algorithms discussed in Chapter 15 of your text.
(c) Develop cost functions for an algorithm that consists of two SELECTs, a JOIN, and a final PROJECT, in terms of the cost functions for the individual operations.
(d) What are the difficulties inherent in using the cost functions described above?

(a) There are 5 major cost components used to estimate query execution cost:
1. Access cost to secondary storage: the cost of searching for, reading, and writing data blocks that reside mainly on disk.
2. Storage cost: the cost of storing intermediate files generated by a query execution strategy.
3. Computation cost: the cost of performing in-memory operations.
4. Memory usage cost: the cost pertaining to the number of memory buffers needed during query execution.
5. Communication cost: the cost of shipping the query and its results from the database site to the site or terminal where the query originated.
For large databases, the main emphasis is on minimizing the access cost to secondary storage. In smaller databases, the cost reduction emphasis is on minimizing computation costs. Lastly, in distributed databases, the cost reduction emphasis is on minimizing communication costs.

(b) For two relations E and D occupying b_E and b_D blocks, sorting both inputs and then merging them (the sort-merge approach to UNION, INTERSECTION and SET DIFFERENCE) costs approximately:

$b_E + b_D + b_E \log_2 b_E + b_D \log_2 b_D$

(c)

$\log_2 b + \lceil s / bfr \rceil - 1 + \left(b_R + |R|\,(x_B + 1)\right) + \frac{js \cdot |R| \cdot |S|}{bfr_{RS}} + \left(b_R + b_S + \frac{js \cdot |R| \cdot |S|}{bfr_{RS}} + \log_2 b\right)$

(d) It is difficult to include all the cost components in a weighted cost function because of the difficulty of assigning suitable weights to the cost components.

<<<<<<<<<<<<<<<<<<<<<<<QUESTIONS END>>>>>>>>>>>>>>>>>>>>>>>>>>>
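
For Q10, the sort-merge set-operation cost and the weighting difficulty from (d) can both be made concrete. In this sketch `sort_merge_setop_cost` follows the b_E + b_D + b_E log2 b_E + b_D log2 b_D formula. The two candidate plans, their component estimates, and both weight settings are made-up numbers. They show that the "cheapest" plan flips depending on how computation cost is weighted against secondary-storage access cost, which is exactly why choosing suitable weights is hard.

```python
import math

def sort_merge_setop_cost(b_e, b_d):
    """Block accesses for UNION/INTERSECTION/SET DIFFERENCE by sort-merge:
    sort both inputs, then merge them in one pass."""
    sort_cost = b_e * math.log2(b_e) + b_d * math.log2(b_d)
    merge_cost = b_e + b_d
    return merge_cost + sort_cost

def weighted_cost(components, weights):
    """Combine estimated cost components into one number using chosen weights."""
    return sum(weights[name] * value for name, value in components.items())

def cheapest(plans, weights):
    """Name of the plan with the lowest weighted cost."""
    return min(plans, key=lambda name: weighted_cost(plans[name], weights))

# Made-up component estimates for two candidate plans.
plans = {
    "plan_A": {"access": 100, "computation": 10},
    "plan_B": {"access": 60, "computation": 80},
}
print(sort_merge_setop_cost(8, 4))                           # 44.0
print(cheapest(plans, {"access": 1.0, "computation": 0.5}))  # plan_B
print(cheapest(plans, {"access": 1.0, "computation": 2.0}))  # plan_A
```

Real optimizers avoid part of this problem by focusing on the dominant component (disk access for large databases, communication for distributed ones), as noted in (a).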
