SSEC\MCA\DBMS Question Bank\2010-2013 Batch
Two Marks Unit III

1. What do you mean by an index?
An index can be viewed as a collection of data entries, with an efficient way to locate all data entries with search key value k. Each such data entry, k*, contains enough information to enable us to retrieve (one or more) data records with search key value k.

2. What are the different ways in which an entry can be made in an index?
A data entry k* allows us to retrieve one or more data records with key value k. There are three main alternatives:
• A data entry k* is an actual data record (with search key value k).
• A data entry is a <k, rid> pair, where rid is the record id of a data record with search key value k.
• A data entry is a <k, rid-list> pair, where rid-list is a list of record ids of data records with search key value k.

3. Differentiate between clustered and unclustered index.
When a file is organized so that the ordering of data records is the same as or close to the ordering of data entries in some index, the index is clustered. An index that is not clustered is called an unclustered index.
4. Differentiate between dense and sparse index.
An index is said to be dense if it contains (at least) one data entry for every search key value that appears in a record in the indexed file. A sparse index contains one entry for each page of records in the data file.

5. What do you mean by fully inverted and inverted file?
A data file is said to be inverted on a field if there is a dense secondary index on this field. A fully inverted file is one in which there is a dense secondary index on each field that does not appear in the primary key.

6. Define primary and secondary indices.
An index on a set of fields that includes the primary key is called a primary index. An index that is not a primary index is called a secondary index. A primary index is guaranteed not to contain duplicates, but an index on other (collections of) fields can contain duplicates. Thus, in general, a secondary index contains duplicates.

7. What do you mean by a composite search key or concatenated keys?
The search key for an index can contain several fields; such keys are called composite search keys or concatenated keys. As an example, consider a collection of employee records, with fields name, age, and sal, stored in sorted order by name. We could build a composite index with key <age, sal> or with key <sal, age>, in contrast to a single-field index with key age or with key sal.
8. Differentiate between a range query and an equality query.
If the search key is composite, an equality query is one in which each field in the search key is bound to a constant; for example, retrieving all data entries with age = 20 and sal = 10. The hashed file organization supports only equality queries, since a hash function identifies the bucket containing desired records only if a value is specified for each field in the search key. A range query is one in which not all fields in the search key are bound to constants; an example of a range query is retrieving all data entries with age < 30 and sal > 40.

9. What is the advantage of tree structured indexes?
Tree-structured indexes are ideal for range selections, and also support equality selections quite efficiently.
10. Define ISAM. What is the disadvantage of ISAM?
ISAM is a static tree-structured index in which only leaf pages are modified by inserts and deletes. If a leaf page is full, an overflow page is added. Unless the size of the dataset and the data distribution remain approximately the same, overflow chains could become long and degrade performance.
11. Define a B+ Tree and its order
A B+ tree is a dynamic, height-balanced index structure that adapts gracefully to changing data characteristics. Each node except the root has between d and 2d entries; the number d is called the order of the tree. Each non-leaf node with m index entries has m+1 child pointers. The leaf nodes contain data entries, and the leaf pages are chained in a doubly linked list.

12. How does the B+ tree handle insertion and deletion of data?
During insertion, nodes that are full are split to avoid overflow pages; thus, an insertion might increase the height of the tree. During deletion, a node might go below the minimum occupancy threshold. In this case, either entries can be redistributed from adjacent siblings, or the node can be merged with a sibling node. A deletion might decrease the height of the tree.

13. What is the purpose of key compression in B+ Tree?
Key compression is a technique used in B+ trees in which search key values in index nodes are shortened, to ensure a high fan-out.

14. List out some of the characteristics of a B+ Tree.
• Operations (insert, delete) on the tree keep it balanced.
• A minimum occupancy of 50 percent is guaranteed for each node.
• Searching for a record requires just a traversal from the root to the appropriate leaf.
15. What do you mean by the height of the B+ Tree? Because the B+ tree is balanced, every path from the root to a leaf has the same length; this length is referred to as the height of the tree.
16. Define a B+ Tree.
The B+ tree search structure is a balanced tree in which the internal nodes direct the search and the leaf nodes contain the data entries. To retrieve all leaf pages efficiently, they are linked using page pointers; since the leaf pages are organized as a doubly linked list, they can be traversed in either direction.

17. Give the format of an index page.
An index page with m index entries has the form <P0, K1, P1, K2, P2, ..., Km, Pm>, where each Ki is a search key value and each Pi is a page pointer. Pointer Pi (for 0 < i < m) points to the data entries with key values in the range Ki <= key < Ki+1; P0 points to the data entries with key values less than K1, and Pm points to the data entries with key values greater than or equal to Km.
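To make the index-page format concrete, here is a minimal search sketch (an illustration, not from the source; the keys and pointer names are made up):

```python
from bisect import bisect_right

def choose_pointer(keys, pointers, search_key):
    """Pick the pointer to follow on an index page <P0, K1, P1, ..., Km, Pm>.

    keys     -- sorted separator keys [K1, ..., Km]
    pointers -- pointers [P0, P1, ..., Pm] (len(keys) + 1 entries)
    Keys below K1 go to P0; Ki <= key < Ki+1 goes to Pi; keys >= Km go to Pm.
    """
    # bisect_right returns how many separators are <= search_key,
    # which is exactly the index of the pointer to follow.
    return pointers[bisect_right(keys, search_key)]

keys, ptrs = [10, 20, 30], ["P0", "P1", "P2", "P3"]
print(choose_pointer(keys, ptrs, 5))    # P0 (less than K1)
print(choose_pointer(keys, ptrs, 20))   # P2 (K2 <= 20 < K3)
print(choose_pointer(keys, ptrs, 99))   # P3 (greater than Km)
```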
18. Give the format of a one level index structure.
(The source answers with a diagram.) In outline: a single level of index pages sits over the data file, and each index page holds data entries of the form <key, pointer> that direct the search to the data page containing the matching records.

19. Give the format of an ISAM Index structure.
(The source answers with a diagram.) In outline: a static tree of non-leaf pages directs searches to primary leaf pages, which are allocated sequentially when the file is created; overflow pages are chained to primary leaf pages as needed.
20. Explain the page allocation in ISAM.
When an ISAM file is created, all leaf (data) pages are allocated sequentially and sorted on the search key value; the non-leaf index pages are allocated next, and overflow pages are allocated later, as needed, from a separate area. Because the leaf pages are allocated sequentially, records can be scanned in key order without following pointers, in the absence of overflow chains.

21. Explain hash based indexes.
Hash-based indexes are designed for equality queries. A hashing function is applied to a search field value and returns a bucket number. The bucket number corresponds to a page on disk that contains all possibly relevant records.

22. What do you mean by skewed data and collision in hashing? (Or) What are the drawbacks in hashing technique?
If the data is not distributed uniformly over the available domain for the data, the data is said to be skewed. Collisions are data entries with the same hash value. With heavy collisions or skewed data, long overflow chains develop, leading to poor performance.

23. Define Extendible hashing.
Extendible Hashing is a dynamic index structure that extends Static Hashing by introducing a level of indirection in the form of a directory. Usually the size of the directory is 2^d for some d, which is called the global depth of the index. The correct directory entry is found by looking at the first d bits of the result of the hashing function, and the directory entry points to the page on disk with the actual data entries. If a page is full and a new data entry falls into that page, data entries from the full page are redistributed according to the first l bits of the hashed values, where the value l is called the local depth of the page. Insertions can therefore trigger bucket splits, and the directory itself may have to double.

24. Explain static hashing technique.
A Static Hashing index has a fixed number of primary buckets. During insertion, if the primary bucket for a data entry is full, an overflow page is allocated and linked to the primary bucket; the list of overflow pages at a bucket is called its overflow chain. Static Hashing can answer equality queries with a single disk I/O in the absence of overflow chains, but as the file grows, Static Hashing suffers from long overflow chains and performance deteriorates.

25. Define dynamic hashing.
Dynamic hashing techniques, such as Extendible Hashing and Linear Hashing, adjust the number of buckets as the file grows and shrinks, thereby avoiding the long overflow chains that degrade Static Hashing.

26. Explain the linear hashing.
Linear Hashing avoids a directory by splitting the buckets in a round-robin fashion; it proceeds in rounds, and at the beginning of each round there is an initial set of buckets. Insertions can trigger bucket splits, but buckets are split sequentially, in a predefined order. Overflow pages are required if a primary bucket fills before its turn to split, but overflow chains are unlikely to be long because each bucket will be split at some point.

27. Compare Extendible hashing and linear hashing.
Extendible and Linear Hashing are closely related. Linear Hashing avoids a directory structure by having a predefined order of buckets to split. The disadvantage of Linear Hashing relative to Extendible Hashing is that space utilization could be lower, especially for skewed data, because the bucket splits are not concentrated where the data density is highest, as they are in Extendible Hashing. A directory-based implementation of Linear Hashing can improve space occupancy, but it is still likely to be inferior to Extendible Hashing in extreme cases.
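A minimal sketch of the extendible-hashing directory lookup just described (my illustration; it assumes a 32-bit hash value and takes the first global-depth bits, as in the text):

```python
def bucket_for(key, directory, global_depth, hash_bits=32):
    """Locate the bucket page for a key in an extendible-hashing directory.

    The directory has 2**global_depth entries; the entry is chosen by the
    first (most significant) global_depth bits of the hash value.  Several
    directory entries may share one bucket when that bucket's local depth
    is smaller than the global depth.
    """
    h = hash(key) & ((1 << hash_bits) - 1)    # fixed-width hash value
    index = h >> (hash_bits - global_depth)   # first global_depth bits
    return directory[index]

# Toy directory of global depth 2: four entries, three distinct buckets
# (bucket "B" has local depth 1, so two entries point to it).  Which
# bucket is printed depends on Python's string hash.
directory = ["A", "B", "B", "C"]
print(bucket_for("employee-22", directory, global_depth=2))
```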
Unit IV

28. What is the need for sorting records? (Or) Explain the advantages of sorting records.
• Users may want answers in some order, for example, sorted by increasing age.
• Sorting records is the first step in bulk loading a tree index.
• Sorting is useful for eliminating duplicate copies in a collection of records.
• A widely used algorithm for performing a very important relational algebra operation, called join, requires a sorting step.

29. What do you mean by an external sorting algorithm?
An external sorting algorithm sorts a file of arbitrary length using only a limited amount of main memory.

30. Briefly explain the two-way merge sort algorithm.
The two-way merge sort algorithm is an external sorting algorithm that uses only three buffer pages at any time. Initially, the file is broken into small sorted files, called runs, each the size of one page. The algorithm then proceeds in passes: in each pass, runs are paired and merged into sorted runs twice the size of the input runs, and in the last pass the merge of two runs results in a sorted instance of the file. The number of passes is ⌈log2 N⌉ + 1, where N is the number of pages in the file.

31. What do you mean by a run?
Each sorted subfile is called a run in an external merge sort.

32. Briefly explain the external merge sort algorithm.
The external merge sort algorithm improves upon the two-way merge sort when there are B > 3 buffer pages available for sorting. The algorithm writes initial runs of B pages each, instead of only one page, so that the number of initial runs is N1 = ⌈N/B⌉. In addition, the algorithm merges B-1 runs, instead of two, during each merge step; the number of passes is thereby reduced to ⌈log_(B-1)(N1)⌉ + 1. The average length of the initial runs can be increased to 2*B pages (for example, by using replacement sort), reducing N1 to ⌈N/(2B)⌉.

33. Differentiate between external sorting and using a clustered B+ tree index.
If the file to be sorted has a clustered B+ tree index with a search key equal to the fields to be sorted by, then we can simply scan the sequence set and retrieve the records in sorted order; this technique is clearly superior to using an external sorting algorithm. If the index is unclustered, however, an external sorting algorithm will almost certainly be cheaper than using the index.

34. What do you mean by double buffering?
In double buffering, each buffer is duplicated: while the CPU processes tuples in one buffer, an I/O request for the other buffer is issued.

35. What is the advantage of blocked I/O?
In blocked I/O, several consecutive pages (called a buffer block) are read or written through a single request. Blocked I/O is usually much cheaper than reading or writing the same number of pages through independent I/O requests.
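The pass counts in questions 30 and 32 can be checked with a few lines of Python (a sketch, not from the source; heapq.merge stands in for the on-disk merge step):

```python
import heapq
from math import ceil

def num_passes(N, B):
    """Passes of external merge sort over N pages with B > 2 buffer pages:
    pass 0 writes ceil(N/B) initial runs, later passes do (B-1)-way merges."""
    runs = ceil(N / B)
    passes = 1
    while runs > 1:
        runs = ceil(runs / (B - 1))   # each pass merges B-1 runs at a time
        passes += 1
    return passes

print(num_passes(N=108, B=5))   # 4 passes: 22 runs -> 6 -> 2 -> 1

# In-memory stand-in for one merge step over three sorted runs:
runs = [[1, 5, 9], [2, 6], [3, 4, 8]]
print(list(heapq.merge(*runs)))  # [1, 2, 3, 4, 5, 6, 8, 9]
```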
36. Define access path.
The alternative ways to retrieve tuples from a relation are called access paths. An access path is either (1) a file scan or (2) an index plus a matching selection condition.

37. Define most selective access path of a query.
If a relation contains an index that matches a given selection, there are at least two access paths: the index and a scan of the data file. The most selective access path is the one that retrieves the fewest pages; using the most selective access path minimizes the cost of data retrieval.

38. Define selectivity of an access path.
The selectivity of an access path is the number of pages retrieved (index pages plus data pages) if this access path is used to retrieve all desired tuples.

39. Define conjunctive and disjunctive selections in a query.
General selection conditions can be expressed in conjunctive normal form, where each conjunct consists of one or more terms. The general format of a term is attr op value, where op is one of the comparison operators <, <=, =, !=, >=, or >. Conjuncts that contain ∨ (or) are called disjunctive.

40. What do you mean by an index only scan?
If an index contains all the output attributes of a query, tuples can be retrieved solely from the index, without accessing the data records; this technique is called an index-only scan. (The context here is projection with duplicate elimination: a hash-based implementation first partitions the file according to a hash function on the output attributes. Two tuples that belong to different partitions are guaranteed not to be duplicates because they have different hash values, and in a subsequent step each partition is read into main memory and within-partition duplicates are eliminated. An index-only scan avoids reading the data file at all.)

41. Write the block nested loops join algorithm.
In a nested loops join, the join condition is evaluated between each pair of tuples from R and S. A block nested loops join performs the pairing in a way that minimizes the number of disk accesses: the smaller relation, say R, is read one block at a time (a block being as many pages as the buffer pool allows, reserving one page for scanning S and one for output), and for each block of R, every page of S is scanned and the join condition is evaluated between the tuples of the block and the tuples of the page.

42. Write the index nested loops join algorithm.
If there is an index on the join column of one relation, say S, the index nested loops join makes S the inner relation: for each tuple of the outer relation R, the index is probed with the join-column value, and the join condition is evaluated against the matching S tuples retrieved through the index.

43. When does a general selection condition match an index? What is a primary term in a selection condition with respect to a given index?
An index is said to match a selection condition if the index can be used to retrieve just the tuples that satisfy the condition. For a term attr op value, an index matches the selection if the index search key is attr and either (1) the index is a tree index, or (2) the index is a hash index and op is equality. The conjuncts of a general selection condition that the given index matches are called the primary terms of the selection condition with respect to that index.
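A small in-memory sketch of the block nested loops pairing from question 41 (illustrative only; the lists stand in for pages on disk, and block_size plays the role of the buffer pages reserved for R):

```python
def block_nested_loops_join(R, S, theta, block_size):
    """Join R and S, pairing tuples one block of R at a time.

    R, S       -- iterables of tuples (stand-ins for relations on disk)
    theta      -- join predicate, e.g. lambda r, s: r[0] == s[0]
    block_size -- how many R tuples fit in the buffer at once
    """
    R = list(R)
    for start in range(0, len(R), block_size):
        block = R[start:start + block_size]   # read one block of R
        for s in S:                           # one full scan of S per block
            for r in block:
                if theta(r, s):
                    yield r + s

R = [(1, "a"), (2, "b"), (3, "c")]
S = [(1, "x"), (3, "y")]
print(list(block_nested_loops_join(R, S, lambda r, s: r[0] == s[0], 2)))
# [(1, 'a', 1, 'x'), (3, 'c', 3, 'y')]
```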
44. When do we say two algebraic expressions to be equivalent?
Two relational algebra expressions are equivalent if they produce the same output for all possible input instances. Several relational algebra equivalences allow a relational algebra expression to be modified to obtain an expression with a cheaper plan.

45. Compare and contrast nested loops join and block nested loops join operations.
In a nested loops join, the join condition is evaluated between each pair of tuples from R and S. A block nested loops join performs the pairing in a way that minimizes the number of disk accesses.

46. Write in short about the difference between sort-merge join and hash join.
A sort-merge join sorts R and S on the join attributes using an external merge sort and performs the pairing during the final merge step. A hash join first partitions R and S using a hash function on the join attributes; only partitions with the same hash values need to be joined in a subsequent step.

47. How does hybrid hash join improve upon the basic hash join algorithm?
A hybrid hash join extends the basic hash join algorithm by making more efficient use of main memory: the key idea is that we avoid writing the first partitions of R and S to disk during the partitioning phase and reading them in again during the probing phase.

48. Enumerate the steps in optimizing a relational algebra expression.
Optimizing a relational algebra expression involves two basic steps:
• Enumerating alternative plans for evaluating the expression. Of course, an optimizer considers only a subset of all possible plans, because the number of possible plans is very large.
• Estimating the cost of each enumerated plan and choosing the plan with the least estimated cost.

49. How is the cost estimated for an evaluation plan?
There are two parts to estimating the cost of an evaluation plan for a query block:
1. For each node in the tree, we must estimate the cost of performing the corresponding operation. Costs are affected significantly by whether pipelining is used or temporary relations are created to pass the output of an operator to its parent.
2. For each node in the tree, we must estimate the size of the result, and whether it is sorted.

50. Define Reduction Factor.
A reduction factor is associated with each term in the WHERE clause. It is the ratio of the (expected) result size to the input size, considering only the selection represented by the term. The actual size of the result can be estimated as the maximum size (the product of the sizes of the relations in the FROM clause) times the product of the reduction factors for the terms in the WHERE clause.

51. Define histogram and its variants.
A histogram is a data structure that approximates a data distribution by dividing the value range into buckets and maintaining summarized information about each bucket. In an equiwidth histogram, the value range is divided into subranges of equal size. In an equidepth histogram, the range is divided into subranges such that each subrange contains the same number of tuples.

52. Write in short about parametric query optimization and multiple-query optimization.
Parametric query optimization seeks to find good plans for a given query for each of several different conditions that might be encountered at run-time. Multiple-query optimization takes the concurrent execution of several queries into account.

53. Differentiate between rule based optimizers and randomized plan generators.
Rule-based optimizers use a set of rules to guide the generation of candidate plans. Randomized plan generation uses probabilistic algorithms, such as simulated annealing, to explore a large space of plans quickly, with a reasonable likelihood of finding a good plan.

54. What does the rule of cascading projections state?
The rule for cascading projections says that successively eliminating columns from a relation is equivalent to simply eliminating all but the columns retained by the final projection: πa1(πa2(R)) = πa1(R) whenever the attribute set a1 is a subset of a2.

55. How do we estimate the size of the final result of a query?
The size of the final result of a query is estimated by taking the product of the sizes of the relations in the FROM clause and the reduction factors for the terms in the WHERE clause. This estimate reflects the reasonable, but simplifying, assumption that the conditions tested by each term are statistically independent.

56. What are the problems caused by redundancy?
• Redundant storage: Some information is stored repeatedly.
• Update anomalies: If one copy of such repeated data is updated, an inconsistency is created unless all copies are similarly updated.
• Insertion anomalies: It may not be possible to store some information unless some other information is stored as well.
• Deletion anomalies: It may not be possible to delete some information without losing some other information as well.

57. Define lossless join and dependency preservation property.
The lossless-join property enables us to recover any instance of the decomposed relation from corresponding instances of the smaller relations. The dependency-preservation property enables us to enforce any constraint on the original relation by simply enforcing some constraints on each of the smaller relations.

58. Define functional dependency.
A functional dependency (FD) is a kind of integrity constraint that generalizes the concept of a key. Let R be a relation schema and let X and Y be nonempty sets of attributes in R. We say that an instance r of R satisfies the FD X → Y if the following holds for every pair of tuples t1 and t2 in r: if t1.X = t2.X, then t1.Y = t2.Y.
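The estimate in questions 50 and 55 is plain arithmetic; a sketch with made-up, illustrative numbers:

```python
def estimate_result_size(relation_sizes, reduction_factors):
    """Estimate a query's result cardinality as the product of the
    FROM-clause relation sizes and the WHERE-clause reduction factors
    (assuming, as the text notes, that the terms are independent)."""
    size = 1.0
    for n in relation_sizes:
        size *= n
    for rf in reduction_factors:
        size *= rf
    return size

# Hypothetical example: two relations of 40,000 and 100,000 tuples,
# joined on a key-foreign key term (RF ~ 1/40,000), plus a filter
# that keeps about half the tuples (RF ~ 0.5):
print(estimate_result_size([40_000, 100_000], [1 / 40_000, 0.5]))  # 50000.0
```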
59. Define super key.
If X → Y holds, where Y is the set of all attributes of the relation, then X is a superkey.

60. Define Armstrong's Axioms.
The following three rules, called Armstrong's Axioms, can be applied repeatedly to infer all FDs implied by a set F of FDs. We use X, Y, and Z to denote sets of attributes over a relation schema R:
• Reflexivity: If X ⊇ Y, then X → Y.
• Augmentation: If X → Y, then XZ → YZ for any Z.
• Transitivity: If X → Y and Y → Z, then X → Z.

61. Define Attribute Closure of an attribute set X, and write the algorithm for finding it.
The attribute closure X+ of X with respect to F (a set of FDs) is defined as the set of attributes A such that X → A can be inferred using the Armstrong Axioms. It can be computed as follows:
closure = X;
repeat until there is no change: {
if there is an FD U → V in F such that U ⊆ closure, then set closure = closure ∪ V
}

62. When do we say a relation is in First normal form?
A relation is in first normal form if every field contains only atomic values, that is, not lists or sets.

63. Define second normal form.
A relation is in second normal form (2NF) if it is in first normal form and every non-primary-key attribute is fully functionally dependent on the primary key.

64. Define fully functional dependency and partial dependency.
Full functional dependency indicates that if A and B are attributes of a relation, B is fully functionally dependent on A if B is functionally dependent on A but not on any proper subset of A. A functional dependency A → B is a partial dependency if there is some attribute that can be removed from A and the dependency still holds.

65. Define Third normal form.
Third normal form (3NF) requires that there be no functional dependencies of non-key attributes on anything other than a candidate key; a table is in 3NF if all of the non-primary-key attributes are determined only by the candidate keys.

66. Define BCNF.
A relation is in BCNF if and only if every determinant is a candidate key.

67. Differentiate between BCNF and third normal form.
The difference between 3NF and BCNF is that for a functional dependency A → B, 3NF allows this dependency in a relation if B is a primary-key attribute and A is not a candidate key, whereas BCNF insists that for this dependency to remain in a relation, A must be a candidate key.
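The closure algorithm in question 61 transcribes directly into Python (attribute names here are illustrative):

```python
def attribute_closure(X, fds):
    """Compute X+ under a set of FDs.

    X   -- iterable of attributes, e.g. {'A'}
    fds -- iterable of (lhs, rhs) pairs of attribute sets
    """
    closure = set(X)
    changed = True
    while changed:                        # repeat until no change
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)       # FD U -> V applies: add V
                changed = True
    return closure

# A -> B and B -> C over R(A, B, C, D):
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(attribute_closure({"A"}, fds))      # {'A', 'B', 'C'}
# A+ lacks D, so A is not a key of R in this example.
```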
68. Define Multi Valued Dependency.
A multi-valued dependency (MVD) represents a dependency between attributes (for example, A, B, and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C; however, the set of values for B and the set of values for C are independent of each other. A multi-valued dependency can be further defined as being trivial or nontrivial. A MVD A →→ B in relation R is defined as being trivial if B is a subset of A, or if A ∪ B = R; a MVD is defined as being nontrivial if neither of these two conditions is satisfied.

69. Define fourth normal form/Codd normal form.
A relation is in fourth normal form (Codd normal form) if it is in BCNF and contains no nontrivial multi-valued dependencies.

70. Define fifth normal form.
A relation is in fifth normal form if it has no join dependency. Fifth normal form is satisfied when all tables are broken into as many tables as possible in order to avoid redundancy; once a relation is in fifth normal form, it cannot be broken into smaller relations without changing the facts or the meaning.

71. Define DKNF.
A relation is in DKNF (domain-key normal form) when there can be no insertion or deletion anomalies in the database.

72. Define database workload.
A database workload description includes the following elements:
1. A list of queries and their frequencies, as a fraction of all queries and updates.
2. A list of updates and their frequencies.
3. Performance goals for each type of query and update.

73. What are the details to be collected for queries and updates in a database workload?
For each query in the workload:
• Which relations are accessed.
• Which attributes are retained (in the SELECT clause).
• Which attributes have selection or join conditions expressed on them (in the WHERE clause) and how selective these conditions are likely to be.
Similarly, for each update in the workload:
• The type of update (INSERT, DELETE, or UPDATE) and the updated relation.
• Which attributes have selection or join conditions expressed on them (in the WHERE clause) and how selective these conditions are likely to be.
• For UPDATE commands, the fields that are modified by the update.

74. What do you mean by physical database tuning? (Or) State the need for database tuning.
After an initial physical design, continuous database tuning is important to obtain the best possible performance. Using the observed workload over time, we can reconsider our choice of indexes and our relation schema. Other tasks include periodic reorganization of indexes and updating the statistics in the system catalogs.

75. What are the guidelines regarding indices in physical database design?
There are guidelines that help to decide whether to index, what to index, whether to create an unclustered or a clustered index, whether to use a multiple-attribute index, and whether to use a hash or a tree index. Indexes can speed up queries but can also slow down update operations.

76. Write a short note on co-clustering.
Co-clustering can speed up joins, in particular key-foreign key joins corresponding to 1:N relationships. However:
• A sequential scan of either relation becomes slower (for example, a sequential scan of all Assembly tuples is also slower).
• Inserts, deletes, and updates that alter record lengths all become slower.

77. What is a DBMS benchmark? Give examples.
A DBMS benchmark tests the performance of a class of applications or specific aspects of a DBMS to help users evaluate system performance. Well-known benchmarks include TPC-A, TPC-B, TPC-C, and TPC-D.

78. Define access control mechanism. Mention the types of the same.
An access control mechanism is a way to control the data that is accessible to a given user. The two different types are discretionary and mandatory access control.

79. What are the main objectives of DBMS Security? Explain with example.
1. Secrecy: Information should not be disclosed to unauthorized users. For example, a student should not be allowed to examine other students' grades.
2. Integrity: Only authorized users should be allowed to modify data. For example, students may be allowed to see their grades, yet not allowed (obviously!) to modify them; an instructor who wishes to change a grade should be allowed to do so.
3. Availability: Authorized users should not be denied access.

80. Define Discretionary Access Control.
Discretionary access control is based on the concept of access rights, or privileges, and mechanisms for giving users such privileges. A privilege allows a user to access some data object in a certain manner (e.g., to read or to modify).

81. Define Mandatory Access Control.
Mandatory access control is based on systemwide policies that cannot be changed by individual users. In this approach each database object is assigned a security class, each user is assigned clearance for a security class, and rules are imposed on the reading and writing of database objects by users. The DBMS determines whether a given user can read or write a given object based on certain rules that involve the security level of the object and the clearance of the user. These rules seek to ensure that sensitive data can never be 'passed on' to a user without the necessary clearance.

82. Define multilevel table and polyinstantiation.
A multilevel table is a table with the surprising property that users with different security clearances will see a different collection of rows when they access the same table. The presence of data objects that appear to have different values to users with different clearances is called polyinstantiation.

83. What are the privileges granted to the user through a GRANT Command?
• SELECT: The right to access (read) all columns of the table specified as the object, including columns added later through ALTER TABLE commands.
• INSERT (column-name): The right to insert rows with (non-null or nondefault) values in the named column of the table named as object. The privileges UPDATE (column-name) and UPDATE are similar.
• DELETE: The right to delete rows from the table named as object.
• REFERENCES (column-name): The right to define foreign keys (in other tables) that refer to the specified column of the table object. REFERENCES without a column name specified denotes this right with respect to all columns, including any that are added later.

84. Write the general format of the GRANT Command. What is the use of the GRANT OPTION?
The GRANT command gives users privileges to base tables and views. The syntax of this command is as follows:
GRANT privileges ON object TO users [ WITH GRANT OPTION ]
If a user has a privilege with the grant option, he or she can pass it to another user (with or without the grant option) by using the GRANT command. A GRANT command has no effect if the same privileges have already been granted to the same grantee by the same grantor.

85. Define Privilege Descriptor.
The privilege descriptor specifies the following: the grantor of the privilege, the grantee who receives the privilege, the granted privilege (including the name of the object involved), and whether the grant option is included. When a user creates a table or view and 'automatically' gets certain privileges, a privilege descriptor with system as the grantor is entered into this table.

86. Define Authorization Graph.
An authorization graph is a graph in which the nodes are users (technically, they are authorization ids) and the arcs indicate how privileges are passed. There is an arc from (the node for) user 1 to user 2 if user 1 executed a GRANT command giving a privilege to user 2; the arc is labeled with the descriptor for the GRANT command. Note that application programs that access the database have the same authorization id as the user executing the program.

87. Describe Covert Channel.
Even if a DBMS enforces the mandatory access control scheme discussed above, information can flow from a higher classification level to a lower classification level through indirect means, called covert channels.

88. What is the responsibility of the database administrator?
1. Creating new accounts: Each new user or group of users must be assigned an authorization id and a password.
2. Mandatory control issues: If the DBMS supports mandatory control (some customized systems for applications with very high security requirements, for example military data, provide such support), the DBA must assign security classes to each database object and assign security clearances to each authorization id in accordance with the chosen security policy.
The DBA is also responsible for maintaining the audit trail, which is essentially the log of updates with the authorization id (of the user who is executing the transaction) added to each log entry.

89. Define Statistical Databases. What is the security issue in statistical databases?
A statistical database is one that contains specific information on individuals or events but is intended to permit only statistical queries. Security in such databases poses problems because it is possible to infer protected information (such as an individual's rating) from answers to permitted statistical queries. Such inference opportunities represent covert channels that can compromise the security policy of the database.

90. Define Audit Trail.
An audit trail is essentially the log of updates with the authorization id (of the user who is executing the transaction) added to each log entry. This log is just a minor extension of the log mechanism used to recover from crashes.

Unit V

91. Define transaction.
A transaction is defined as any one execution of a user program in a DBMS, and it differs from an execution of a program outside the DBMS (e.g., a C program executing on Unix) in important ways. (Executing the same program several times will generate several transactions.)

92. State the ACID Properties. (Or) Explain ACID.
• Atomicity: Either all operations of the transaction are properly reflected in the database or none are.
• Consistency: Execution of a transaction in isolation preserves the consistency of the database.
• Isolation: Although multiple transactions may execute concurrently, each transaction must be unaware of other concurrently executing transactions, and intermediate transaction results must be hidden from other concurrently executed transactions. That is, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started, or Tj started execution after Ti finished.
• Durability: After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.

93. What are the different states of a transaction?
• Active: the initial state; the transaction stays in this state while it is executing.
• Partially committed: after the final statement has been executed.
• Failed: after the discovery that normal execution can no longer proceed.
• Aborted: after the transaction has been rolled back and the database restored to its state prior to the start of the transaction. There are two options after it has been aborted: restart the transaction (which can be done only if there is no internal logical error) or kill the transaction.
• Committed: after successful completion.
94. What is the function of the recovery management of the database?
The recovery-management component of a database system implements the support for atomicity and durability.

95. Explain the shadow-database scheme in short.
The shadow-database scheme:
a. Assume that only one transaction is active at a time.
b. A pointer called db_pointer always points to the current consistent copy of the database.
c. All updates are made on a shadow copy of the database, and db_pointer is made to point to the updated shadow copy only after the transaction reaches partial commit and all updated pages have been flushed to disk.
d. In case the transaction fails, the old consistent copy pointed to by db_pointer can be used, and the shadow copy can be deleted.

96. Explain the relationship between the different states of a transaction.
(The source answers with a state diagram.) In outline: a transaction starts in the active state; on executing its final statement it becomes partially committed; from there it either commits or, on failure, moves to the failed state; a failed transaction is rolled back to the aborted state, after which it can be restarted or killed.

97. Define Schedule.
A schedule is a list of actions (reading, writing, aborting, or committing) from a set of transactions, where the order in which two actions of a transaction T appear in the schedule must be the same as the order in which they appear in T. Intuitively, a schedule represents an actual or potential execution sequence.

98. Define complete schedule.
A schedule that contains either an abort or a commit for each transaction whose actions are listed in it is called a complete schedule. A complete schedule must contain all the actions of every transaction that appears in it.

99. Define Serial schedule.
If the actions of different transactions are not interleaved (that is, transactions are executed from start to finish, one by one), the schedule is called a serial schedule.

100. What is the advantage of concurrent execution?
Multiple transactions are allowed to run concurrently in the system. The advantages are:
• Increased processor and disk utilization, leading to better transaction throughput: one transaction can be using the CPU while another is reading from or writing to the disk.
• Reduced average response time for transactions: short transactions need not wait behind long ones.
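The transitions described in questions 93 and 96 can be tabulated directly (a sketch mirroring the states listed above):

```python
# Legal transitions between the transaction states listed above.
TRANSITIONS = {
    "active":              {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "failed":              {"aborted"},
    "aborted":             {"active"},  # restart; the other option is to kill
    "committed":           set(),       # terminal state
}

def can_move(src, dst):
    return dst in TRANSITIONS[src]

print(can_move("active", "partially committed"))  # True
print(can_move("active", "committed"))            # False: a transaction must
                                                  # pass through "partially
                                                  # committed" first
```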
101. Define Concurrency Control Schemes.
Concurrency control schemes are mechanisms to achieve isolation, that is, to control the interaction among the concurrent transactions in order to prevent them from destroying the consistency of the database.

102. What are the different types of anomalies or conflicts that can occur while interleaving transactions?
a. Reading Uncommitted Data (WR Conflicts)
b. Unrepeatable Reads (RW Conflicts)
c. Overwriting Uncommitted Data (WW Conflicts)

103. What is the meaning of a serializable schedule?
A serializable schedule over a set S of committed transactions is a schedule whose effect on any consistent database instance is guaranteed to be identical to that of some complete serial schedule over S.

104. What do you mean by a recoverable schedule? What is the advantage of the same?
A recoverable schedule is one in which transactions commit only after (and if!) all transactions whose changes they read commit. If transactions read only the changes of committed transactions, not only is the schedule recoverable, but aborting a transaction can be accomplished without cascading the abort to other transactions; such a schedule is said to avoid cascading aborts.

105. What do you mean by conflict serializable schedules?
Two schedules are said to be conflict equivalent if they involve the (same set of) actions of the same transactions and they order every pair of conflicting actions of two committed transactions in the same way. A schedule is conflict serializable if it is conflict equivalent to some serial schedule.

106. What do you mean by a precedence graph? Where is it used?
The precedence graph for a schedule S contains:
• A node for each committed transaction in S.
• An arc from Ti to Tj if an action of Ti precedes and conflicts with one of Tj's actions.
A schedule is conflict serializable if and only if its precedence graph is acyclic; Strict 2PL allows only schedules whose precedence graph is acyclic.
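The acyclicity test behind question 106 is an ordinary cycle check; a compact sketch (mine, not from the source):

```python
def is_conflict_serializable(edges, transactions):
    """A schedule is conflict serializable iff its precedence graph is
    acyclic; detect a cycle with a depth-first search.

    edges        -- set of (Ti, Tj) arcs: an action of Ti precedes and
                    conflicts with an action of Tj
    transactions -- iterable of node names
    """
    adj = {t: [] for t in transactions}
    for ti, tj in edges:
        adj[ti].append(tj)

    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in transactions}

    def has_cycle(u):
        color[u] = GRAY                    # u is on the current DFS path
        for v in adj[u]:
            if color[v] == GRAY or (color[v] == WHITE and has_cycle(v)):
                return True
        color[u] = BLACK
        return False

    return not any(color[t] == WHITE and has_cycle(t) for t in transactions)

# T1 -> T2 and T2 -> T1 form a cycle: not conflict serializable.
print(is_conflict_serializable({("T1", "T2"), ("T2", "T1")}, ["T1", "T2"]))  # False
print(is_conflict_serializable({("T1", "T2")}, ["T1", "T2"]))                # True
```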
107. What are the rules of strict 2 phase locking?
Strict 2PL rules:
(1) If a transaction T wants to read (respectively, modify) an object, it first requests a shared (respectively, exclusive) lock on the object.
(2) All locks held by a transaction are released when the transaction is completed.

108. What is the difference between strict 2 phase locking and 2 phase locking?
Strict 2PL releases locks only when the transaction is completed, whereas for 2PL the second rule is replaced by: a transaction cannot request additional locks once it releases any lock. Thus, under 2PL, every transaction has a 'growing' phase in which it acquires locks, followed by a 'shrinking' phase in which it releases locks.

109. Define View serializable schedules.
Two schedules S1 and S2 over the same set of transactions are view equivalent if:
1. If Ti reads the initial value of object A in S1, it must also read the initial value of A in S2.
2. If Ti reads a value of A written by Tj in S1, it must also read the value of A written by Tj in S2.
3. For each data object A, the transaction (if any) that performs the final write on A in S1 must also perform the final write on A in S2.
A schedule is view serializable if it is view equivalent to some serial schedule. Every conflict serializable schedule is view serializable, although the converse is not true.

110. Differentiate between lock upgradation and downgrading.
A lock upgrade request asks that a shared lock be upgraded to an exclusive lock (e.g., for an UPDATE operation). In downgrading, the transaction initially obtains exclusive locks and then downgrades them to shared locks.

111. Define latches.
Latches are short-duration locks that are set before reading or writing a page, to ensure that the physical read or write is an atomic operation; they are unset immediately after the physical read/write operation is completed.

112. Define Convoys.
A convoy is the queue of transactions that forms while waiting for a lock to be released by another transaction that has been put on hold by a preemptive OS during its process scheduling.
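A toy lock table that enforces the two Strict 2PL rules (an illustration only; a real lock manager would queue waiting transactions instead of returning False):

```python
class StrictTwoPhaseLocking:
    """Shared/exclusive locks acquired before access; everything is
    released only when the transaction completes (rule 2)."""

    def __init__(self):
        self.locks = {}   # object -> (mode, set of holding transactions)

    def request(self, txn, obj, mode):     # mode is "S" or "X"
        held = self.locks.get(obj)
        if held is None:
            self.locks[obj] = (mode, {txn})
            return True
        held_mode, holders = held
        if holders == {txn}:               # sole holder: grant, upgrading if needed
            self.locks[obj] = ("X" if "X" in (mode, held_mode) else "S", holders)
            return True
        if mode == "S" and held_mode == "S":
            holders.add(txn)               # shared locks are compatible
            return True
        return False                       # conflict: the requester must wait

    def complete(self, txn):
        # Rule (2): release all locks only at transaction completion.
        for obj in list(self.locks):
            mode, holders = self.locks[obj]
            holders.discard(txn)
            if not holders:
                del self.locks[obj]

mgr = StrictTwoPhaseLocking()
print(mgr.request("T1", "A", "X"))  # True  - T1 gets the exclusive lock
print(mgr.request("T2", "A", "S"))  # False - T2 must wait for T1
mgr.complete("T1")
print(mgr.request("T2", "A", "S"))  # True  - granted after T1 completes
```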
113. Define Deadlock. How can it be prevented?
A cycle of transactions waiting for locks to be released is called a deadlock. Deadlocks can be prevented by giving each transaction a priority: each transaction is given a timestamp when it starts up, and the lower the timestamp, the higher the transaction's priority. If a transaction Ti requests a lock and transaction Tj holds a conflicting lock, the lock manager can use one of the following policies:
• Wait-die: If Ti has higher priority, it is allowed to wait; otherwise it is aborted.
• Wound-wait: If Ti has higher priority, abort Tj; otherwise Ti waits.
In either case no deadlock cycle can develop: in the wait-die scheme, lower priority transactions can never wait for higher priority transactions, while in the wound-wait scheme, higher priority transactions never wait for lower priority transactions.

114. How is a deadlock resolved?
A deadlock is resolved by aborting a transaction that is on a cycle and releasing its locks. There are various choices for deciding which transaction has to be aborted: for example, the transaction with the fewest locks, the one that has done the least work, or the one that is farthest from its completion.

115. What do you mean by an update lock?
An update lock is compatible with shared locks but not with other update and exclusive locks. If the object need not be updated, the update lock is downgraded to a shared lock.

116. What is a waits-for graph? Give examples.
The waits-for graph is maintained by the lock manager to detect deadlock cycles. The nodes denote active transactions, and an arc from Ti to Tj denotes that Ti is waiting for Tj to release a lock. A cycle in the waits-for graph indicates a deadlock.

117. Define timeout mechanism.
If a transaction has been waiting too long for a lock, it is assumed to be in a deadlock cycle and so it is aborted.

118. What do you mean by conservative 2PL?
Conservative 2PL can also prevent deadlocks. Under Conservative 2PL, a transaction obtains all the locks that it will ever need when it begins, or blocks waiting for those locks to become available.

119. Define Intention shared and intention exclusive locks.
Intention shared (IS) and intention exclusive (IX) locks are used in multiple-granularity locking: to lock a node in S (respectively X) mode, a transaction must first lock all its ancestors in IS (respectively IX) mode. A transaction can also obtain a single SIX lock, which is logically equivalent to holding an S lock and an IX lock; it conflicts with any lock that conflicts with either S or IX.

120. Define multiple-granularity locking.
Multiple-granularity locking allows us to efficiently set locks on objects that contain other objects. The idea is to exploit the hierarchical nature of the 'contains' relationship: a database contains a set of files, each file contains a set of pages, and each page contains a set of records. This containment hierarchy can be thought of as a tree of objects, where each node contains all its children.

121. Define Lock Escalation.
Lock escalation is the approach of deciding the level of granularity of locking by obtaining fine-granularity locks (e.g., at the record level) and, after the transaction requests a certain number of locks at that granularity, starting to obtain locks at the next higher granularity (e.g., at the page level).

122. What are the rules to be followed in implementing concurrency control in B+ Trees?
1. The higher levels of the tree only serve to direct searches, and all the 'real' data is in the leaf levels (in the format of one of the three alternatives for data entries).
2. For inserts, a node must be locked (in exclusive mode, of course) only if a split can propagate up to it from the modified leaf.
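The wait-die and wound-wait policies of question 113 reduce to a timestamp comparison; a sketch (not from the source):

```python
def on_lock_conflict(requester_ts, holder_ts, policy):
    """Decide what happens when the requester wants a lock the holder has.
    Lower timestamp = older = higher priority.

    Returns "wait", "abort requester", or "abort holder".
    """
    requester_is_older = requester_ts < holder_ts
    if policy == "wait-die":
        # Older transactions may wait; younger ones die (abort).
        return "wait" if requester_is_older else "abort requester"
    if policy == "wound-wait":
        # Older transactions wound (abort) younger holders; younger ones wait.
        return "abort holder" if requester_is_older else "wait"
    raise ValueError(policy)

print(on_lock_conflict(5, 9, "wait-die"))    # wait            (older waits)
print(on_lock_conflict(9, 5, "wait-die"))    # abort requester (younger dies)
print(on_lock_conflict(5, 9, "wound-wait"))  # abort holder    (older wounds)
print(on_lock_conflict(9, 5, "wound-wait"))  # wait            (younger waits)
```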
123. What are the basic premises of optimistic concurrency control?
The basic premise is that most transactions will not conflict with other transactions, and the idea is to be as permissive as possible in allowing transactions to execute. Transactions proceed in three phases:
1. Read: The transaction executes, reading values from the database and writing to a private workspace.
2. Validation: If the transaction decides that it wants to commit, the DBMS checks whether the transaction could possibly have conflicted with any other concurrently executing transaction. If there is a possible conflict, the transaction is aborted; its private workspace is cleared and it is restarted.
3. Write: If validation determines that there are no possible conflicts, the changes to data objects made by the transaction in its private workspace are copied into the database.

124. Define timestamp based concurrency control.
Each transaction can be assigned a timestamp at startup, and it is ensured, at execution time, that if action ai of transaction Ti conflicts with action aj of transaction Tj, then ai occurs before aj if TS(Ti) < TS(Tj). If an action violates this ordering, the transaction is aborted and restarted.

125. Define Thomas write rule and justify the same.
Ignoring outdated writes is called the Thomas Write Rule. If the Thomas Write Rule is used, some serializable schedules are permitted that are not conflict serializable; if it is not used, the timestamp protocol, like 2PL, allows only conflict serializable schedules.

126. What is the purpose of multiversion concurrency control?
This protocol represents yet another way of using timestamps, assigned at startup time, to achieve serializability. The idea is to maintain several versions of each database object, each with a write timestamp, and to let transaction Ti read the most recent version whose timestamp precedes TS(Ti). The goal is to ensure that a transaction never has to wait to read a database object.
127. What are the functions of recovery manager?
The recovery manager of a DBMS is responsible for ensuring two important properties of transactions: atomicity and durability. It ensures atomicity by undoing the actions of transactions that do not commit, and durability by making sure that all actions of committed transactions survive system crashes (e.g., a core dump caused by a bus error) and media failures (e.g., a disk is corrupted).

128. Explain the various conflicts in short.
Three types of conflicting actions lead to three different anomalies:
• In a write-read (WR) conflict, one transaction could read uncommitted data from another transaction; such a read is called a dirty read.
• In a read-write (RW) conflict, a transaction could read a data object twice with different results; such a situation is called an unrepeatable read.
• In a write-write (WW) conflict, a transaction overwrites a data object written by another transaction; if the first transaction subsequently aborts, the change made by the second transaction could be lost unless a complex recovery mechanism is used.

129. Describe the responsibilities of a transaction manager.
The transaction manager controls the execution of transactions; in cooperation with the lock manager (for concurrency control) and the recovery manager (for atomicity and durability), it sees to it that the ACID properties are maintained.

130. Define ARIES recovery algorithm.
ARIES is a recovery algorithm that is designed to work with a steal, no-force approach. When the recovery manager is invoked after a crash, restart proceeds in three phases:
1. Analysis: Identifies dirty pages in the buffer pool (i.e., changes that have not been written to disk) and active transactions at the time of the crash.
2. Redo: Repeats all actions, starting from an appropriate point in the log, and restores the database state to what it was at the time of the crash.
3. Undo: Undoes the actions of transactions that did not commit, so that the database reflects only the actions of committed transactions.

131. What is the meaning of steal and force approaches?
If changes made by a transaction can be propagated to disk before the transaction has committed, a steal approach is said to be used. If all changes made by a transaction are immediately forced to disk after the transaction commits, a force approach is said to be used.
132. Define a log.
The log, sometimes called the trail or journal, is a history of actions executed by the DBMS. Physically, the log is a file of records stored in stable storage, which is assumed to survive crashes; this durability can be achieved by maintaining two or more copies of the log on different disks (perhaps in different locations), so that the chance of all copies of the log being simultaneously lost is negligibly small.

133. What do you mean by a log tail?
The most recent portion of the log, called the log tail, is kept in main memory and is periodically forced to stable storage.

134. What is the purpose of the log sequence number (LSN) of a log record?
Every log record is given a unique id called the log sequence number (LSN). As with any record id, we can fetch a log record with one disk access given its LSN. If the log is a sequential file, in principle growing indefinitely, the LSN can simply be the address of the first byte of the log record. LSNs should be assigned in monotonically increasing order; this property is required for the ARIES recovery algorithm.

135. Define the rules of Write Ahead Logging.
The Write-Ahead Logging Protocol:
1. Must force the log record for an update before the corresponding data page gets to disk.
2. Must write all log records for a transaction before commit.
Rule 1 guarantees atomicity; rule 2 guarantees durability.

136. What are the three main principles of ARIES?
There are three main principles behind the ARIES recovery algorithm:
• Write-ahead logging: Any change to a database object is first recorded in the log; the record in the log must be written to stable storage before the change to the database object is written to disk.
• Repeating history during Redo: Upon restart following a crash, ARIES retraces all actions of the DBMS before the crash and brings the system back to the exact state it was in at the time of the crash. Then it undoes the actions of transactions that were still active at the time of the crash (effectively aborting them).
• Logging changes during Undo: Changes made to the database while undoing a transaction are logged, in order to ensure that such an action is not repeated in the event of repeated (failures causing) restarts.
137. What are the contents of the update log record?
An update log record contains both before- and after-images, which can be used to undo and to redo the change, respectively. The before-image is the value of the changed bytes before the change; the after-image is the value after the change. The pageid field indicates the page id of the modified page, and the length in bytes and the offset of the change are also included.

138. Define Compensation Log Record. (Or) What is the purpose, and what are the contents, of a Compensation Log Record?
A compensation log record (CLR) is written just before the change recorded in an update log record U is undone. (Such an undo can happen during normal system execution when a transaction is aborted, or during recovery from a crash.) A compensation log record C describes the action taken to undo the actions recorded in the corresponding update log record, and it is appended to the log tail just like any other log record. The compensation log record C also contains a field called undoNextLSN, which is the LSN of the next log record that is to be undone for the transaction that wrote update record U; this field in C is set to the value of prevLSN in U.

139. Differentiate between Compensation Log Record and Update Record.
Unlike an update log record, a CLR describes an action that will never be undone; that is, we never undo an undo action. The reason is simple: an update log record describes a change made by a transaction during normal execution, and the transaction may subsequently be aborted, whereas a CLR describes an action taken to roll back a transaction for which the decision to abort has already been made. Thus, the transaction must be rolled back, and the undo action described by the CLR is definitely required.

140. What are the contents of the Transaction Table?
The transaction table contains one entry for each active transaction. In general, the entry contains the transaction id, the status, attributes related to the transaction (such as the relations it accesses and the list of locks it holds), and a field called lastLSN, which is the LSN of the most recent log record for this transaction. The status of a transaction can be that it is in progress, is committed, or is aborted.
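For concreteness, a sketch of an update log record and the redo/undo its images support (field names loosely follow the text; this is an illustration, not ARIES source code):

```python
from dataclasses import dataclass

@dataclass
class UpdateLogRecord:
    lsn: int             # log sequence number of this record
    prev_lsn: int        # previous log record of the same transaction
    txn_id: int
    page_id: int         # modified page
    offset: int          # where on the page the change starts
    length: int          # number of changed bytes
    before_image: bytes  # used to undo the change
    after_image: bytes   # used to redo the change

def redo(page: bytearray, r: UpdateLogRecord):
    page[r.offset:r.offset + r.length] = r.after_image

def undo(page: bytearray, r: UpdateLogRecord):
    page[r.offset:r.offset + r.length] = r.before_image

rec = UpdateLogRecord(lsn=101, prev_lsn=87, txn_id=7, page_id=500,
                      offset=21, length=3,
                      before_image=b"ABC", after_image=b"DEF")
page = bytearray(b"." * 30)
redo(page, rec); assert page[21:24] == b"DEF"
undo(page, rec); assert page[21:24] == b"ABC"
```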
141. What do you mean by a Dirty page table?
The dirty page table contains one entry for each dirty page in the buffer pool, that is, each page with changes that are not yet reflected on disk. The entry contains a field recLSN, which is the LSN of the first log record that caused the page to become dirty. This LSN identifies the earliest log record that might have to be redone for this page during restart from a crash.

142. What do you mean by a checkpoint? (Or) How does the ARIES recovery algorithm use checkpoints? What is the purpose of a checkpoint?
Checkpoints are snapshots of the DBMS state. Checkpointing in ARIES has three steps. First, a begin checkpoint record is written to indicate when the checkpoint starts. Second, an end checkpoint record is constructed, including in it the current contents of the transaction table and the dirty page table, and appended to the log. The third step is carried out after the end checkpoint record is written to stable storage: a special master record containing the LSN of the begin checkpoint log record is written to a known place on stable storage. While the end checkpoint record is being constructed, the DBMS continues executing transactions and writing other log records; the only guarantee we have is that the transaction table and dirty page table are accurate as of the time of the begin checkpoint record.

143. What are the steps in the analysis phase of crash recovery?
The Analysis phase performs three tasks:
1. It determines the point in the log at which to start the Redo pass.
2. It determines (a conservative superset of the) pages in the buffer pool that were dirty at the time of the crash.
3. It identifies transactions that were active at the time of the crash and must be undone.

144. What are the phases of restart in the ARIES recovery algorithm?
After a crash, restart proceeds in three phases: Analysis, Redo, and Undo.
145. What do you mean by the repeating history paradigm? (Or) How does ARIES differ from other crash recovery algorithms?
The Redo phase repeats history: it reapplies the updates of all transactions, committed or otherwise, and causes the database to be brought to the same state that it was in at the time of the crash; the Undo phase then undoes the actions of loser transactions, and the system can proceed with normal operations. This repeating history paradigm distinguishes ARIES from other proposed WAL-based recovery algorithms. Further, ARIES handles subsequent crashes during system restart by writing compensation log records (CLRs) when undoing the actions of aborted transactions; CLRs indicate which actions have already been undone and prevent undoing the same action twice.

146. What are the steps in the redo phase of crash recovery?
The Redo phase begins with the log record that has the smallest recLSN of all pages in the dirty page table constructed by the Analysis pass, because this log record identifies the oldest update that may not have been written to disk prior to the crash. Starting from this log record, Redo scans forward until the end of the log. For each redoable log record (update or CLR) encountered, Redo checks whether the logged action must be redone. The action must be redone unless one of the following conditions holds:
• The affected page is not in the dirty page table, or
• The affected page is in the dirty page table, but the recLSN for the entry is greater than the LSN of the log record being checked, or
• The pageLSN (stored on the page, which must be retrieved to check this condition) is greater than or equal to the LSN of the log record being checked.
During the Redo phase, ARIES reapplies the updates of all transactions, committed or otherwise. Further, if a transaction was aborted before the crash and its updates were undone, as indicated by CLRs, the actions described in the CLRs are also reapplied.

147. What do you mean by Loser Transactions?
Transactions that were active at the time of the crash are called loser transactions. All actions of losers must be undone, and these actions must be undone in the reverse of the order in which they appear in the log.

148. What is the goal of the Undo phase of crash recovery?
The goal of this phase is to undo the actions of all transactions that were active at the time of the crash, that is, to effectively abort them. This set of transactions is identified in the transaction table constructed by the Analysis phase.

149. What is the sequence of actions in the Undo phase of crash recovery?
The set of lastLSN values for all loser transactions is called ToUndo. Undo repeatedly chooses the largest (i.e., most recent) LSN value in this set and processes it, until ToUndo is empty. To process a log record:
1. If it is a CLR and the undoNextLSN value is not null, the undoNextLSN value is added to the set ToUndo; if the undoNextLSN is null, an end record is written for the transaction (because it is completely undone) and the CLR is discarded.
2. If it is an update record, a CLR is written and the corresponding action is undone, and the prevLSN value in the update log record is added to the set ToUndo.
When the set ToUndo is empty, the Undo phase is complete. Restart is now complete, and the system can proceed with normal operations.

150. Write in short the working sequence of the ARIES algorithm.
After a system crash, the Analysis, Redo, and Undo phases are executed in order: Analysis reconstructs the transaction table and the dirty page table, Redo repeats history by transforming the database into its state before the crash, and Undo rolls back the loser transactions.

151. How do we recover from media failure?
To be able to recover from media failure without reading the complete log, a copy of the database is taken periodically. The procedure of copying the database is similar to creating a checkpoint.
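The three redo checks in question 146 translate directly into a predicate (again a sketch of mine, not ARIES code):

```python
def must_redo(lsn, page_id, dirty_page_table, fetch_page_lsn):
    """Decide whether a logged action must be reapplied during Redo.
    The pageLSN is fetched only when the cheaper table checks do not
    settle the question, mirroring the order of the checks above."""
    if page_id not in dirty_page_table:
        return False                      # page made it to disk before the crash
    if dirty_page_table[page_id] > lsn:   # recLSN after this record
        return False
    return fetch_page_lsn(page_id) < lsn  # pageLSN >= lsn: already on the page

# Toy example: page 5 is dirty with recLSN 90; its on-disk pageLSN is 95.
dpt = {5: 90}
print(must_redo(100, 5, dpt, lambda p: 95))  # True : 95 < 100, redo it
print(must_redo(80,  5, dpt, lambda p: 95))  # False: recLSN 90 > 80
```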
Big Questions

Unit III
1. Define Index. Explain in detail the various types of indices, with examples, along with the advantages and disadvantages of each type.
2. Explain the ISAM index structure in detail, with examples illustrating insertion and deletion and with suitable diagrams.
3. Explain how data is inserted into the ISAM index structure, with the algorithm and a neat diagram.
4. Explain how the ISAM index structure handles the deletion of a data item, with the algorithm and an example.
5. Differentiate between static and dynamic index structures with suitable examples.
6. Explain in detail static hashing with an example and neat diagrams.
7. Explain the extendible hashing method with an example and neat diagrams.
8. Explain linear hashing in detail with an example and neat diagrams.

Unit IV
9. Explain the two-way merge sort algorithm with a neat diagram and an example.
10. Explain the algorithm for implementing external merge sort with a neat diagram and an example.
11. Explain the working of replacement sort with a neat diagram and an example.
12. Explain the effect of double buffering and blocked I/O on the performance of sorting algorithms.
13. Explain the nested loops join operation in detail.
14. Explain the sort-merge join in detail with cost analysis.
15. Explain hash join in detail with a neat diagram.
16. Write in detail about query optimization.
17. Write in detail about normalization and the various normal forms in DBMS.
18. What are the factors to be considered for physical database design?
19. What are the operations involved in physical database tuning?
20. Write a detailed note on DBMS benchmarks.
21. Explain in detail discretionary access control.
22. Explain in detail mandatory access control.
23. Write in detail about encryption technology and how it can be used for database security.

Unit V
24. Explain the ACID properties of a transaction and the usefulness of each.
25. Draw the state diagram of a transaction and explain it.
26. Write in detail about schedules and their significance in concurrency control.
27. Describe concurrency control based on locking.
28. Explain the concept of deadlock handling, with deadlock prevention, detection, and recovery.
29. Discuss concurrency control without locking.
30. Explain the crash recovery process in detail.
31. Write in detail about the log table and its significance in crash recovery.
32. Explain the importance of checkpoints in crash recovery.
33. Explain the sequence of actions in the redo and undo phases of crash recovery.
34. Write in detail about the working principle of the ARIES algorithm.