
Ques-1- Hashed indexing and binary tree indexing.

Ans - Hash Indexing


A hash table is one of the simplest index structures a database can implement. The major components of a hash index are the "hash function" and the "buckets". Effectively, the DBMS constructs an index for every table you create that has a PRIMARY KEY attribute, for example:

CREATE TABLE test ( id INTEGER PRIMARY KEY, name VARCHAR(100) );

The algorithm divides the space in which rows are stored into areas called buckets. If a row's primary key matches the requirements of a bucket, the row is stored in that bucket. The algorithm that decides which bucket to use is called the hash function. For our example we use a simple hash function: the bucket number is the primary key modulo the number of buckets, so for small ids the bucket number equals the primary key. The number of buckets must also be chosen when the index is created. In this example we have decided on 4.

Figure: Hash Table with no collisions

Now we can find id 3 quickly and easily by visiting bucket 3 and looking into it.

Figure: Hash Table with collisions

Here we have had to put more than one row in some of the buckets. This is called a hash collision. The more collisions there are, the longer the collision chains become and the slower lookups get. For instance, finding id 6 means visiting bucket 2, then checking id 2, then id 10, and finally reaching id 6.
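
The bucket lookup and collision chain described above can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes the hash function is the primary key modulo 4 (which is consistent with ids 2, 10 and 6 sharing bucket 2), and the row values and the names `bucket_for`, `build_index` and `lookup` are hypothetical.

```python
NUM_BUCKETS = 4  # the example uses four buckets


def bucket_for(key: int) -> int:
    """Assumed hash function: primary key modulo the bucket count."""
    return key % NUM_BUCKETS


def build_index(rows):
    """Append each (id, name) row to the chain of its bucket."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for row in rows:
        buckets[bucket_for(row[0])].append(row)
    return buckets


def lookup(buckets, key):
    """Walk the collision chain; return the row and the entries visited."""
    visited = 0
    for row in buckets[bucket_for(key)]:
        visited += 1
        if row[0] == key:
            return row, visited
    return None, visited


# Hypothetical rows; ids 2, 10 and 6 all hash to bucket 2.
rows = [(1, "a"), (2, "b"), (3, "c"), (10, "d"), (6, "e")]
index = build_index(rows)
print(lookup(index, 3))  # ((3, 'c'), 1)
print(lookup(index, 6))  # ((6, 'e'), 3)
```

Finding id 3 touches one entry, while finding id 6 walks a chain of three, which is the slowdown the text attributes to collisions.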

B-Tree Indexing
The B-Tree index is the default index for most relational database systems. The topmost level of the index is called the root. The lowest level consists of the leaf nodes. All levels in between are called branches. Both the root and the branches contain entries that point to the next level in the index. Leaf nodes consist of the index keys and pointers to the physical locations (i.e., row ids) at which the corresponding records are stored. The variant described below, in which the leaves form a linked "sequence set" beneath an "index set", is usually called a B+-tree.

Inserting a new key - This is done almost the same way as insertion of a new key into a B-tree. When a leaf node is split into two nodes, a copy of the lowest key value from the new right-hand node is promoted to be the separator key value in the parent node. The new node must also be inserted into the linked list of the sequence set.

Deleting a key - This is easier than in a B-tree. When a key value is deleted from a leaf, there is no need to delete that key from the index set of the tree. The key value can still direct searches to the proper leaves.

Searching - A search always terminates in a node in the sequence set. Even if a key in the index set matches the sought key, the search does not stop there: the appropriate pointer is followed until the correct leaf is reached. Also, not every key in the index set needs to appear in the sequence set, since deleted keys are retained in the index set.

Figure: B-Tree Indexing
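
The leaf split described above can be sketched as follows. This is a minimal sketch of only the leaf-level step, not a full tree: the class `Leaf`, the function `insert_into_leaf` and the capacity of 3 keys per leaf are assumptions made for illustration.

```python
from bisect import insort


class Leaf:
    """A leaf of the sequence set: sorted keys plus a link to the next leaf."""

    def __init__(self, keys=None):
        self.keys = keys or []
        self.next = None


def insert_into_leaf(leaf, key, capacity=3):
    """Insert key; on overflow, split the leaf.

    Returns (separator, new_leaf) when a split happened, else None.
    The separator is a copy of the smallest key in the new right-hand
    leaf, so the key itself stays in the sequence set.
    """
    insort(leaf.keys, key)
    if len(leaf.keys) <= capacity:
        return None
    mid = len(leaf.keys) // 2
    right = Leaf(leaf.keys[mid:])
    leaf.keys = leaf.keys[:mid]
    # Splice the new leaf into the linked list of leaves.
    right.next = leaf.next
    leaf.next = right
    return right.keys[0], right


leaf = Leaf([10, 20, 30])
separator, right = insert_into_leaf(leaf, 25)
print(leaf.keys, right.keys, separator)  # [10, 20] [25, 30] 25
```

The caller would then add the returned separator to the parent node. Note that 25 appears both as the separator and in the right-hand leaf, because the key is copied rather than moved.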

Ques-2- What are the applications of trees?

Ans - Applications of Trees


Unlike arrays and linked lists, which are linear data structures, a tree is a hierarchical (or non-linear) data structure.

1) One reason to use trees is that you want to store information that naturally forms a hierarchy, for example the file system on a computer, or a family tree:

Family Tree
-----------
          Grand Father
           /        \
         ..        Father
                  /      \
   Child (1st Born)    Child (2nd Born)

Many such hierarchical structures exist, such as a family tree, a business corporate structure, or a government structure.

2) If we organize keys in the form of a tree with some ordering (e.g., a BST), we can search for a given key in moderate time (quicker than a linked list and slower than arrays). Self-balancing search trees such as AVL trees guarantee an upper bound of O(log n) for search.

3) We can insert and delete keys in moderate time (quicker than arrays and slower than unordered linked lists). Self-balancing search trees such as AVL trees guarantee an upper bound of O(log n) for insertion and deletion.

4) Modularization in software development: This technique aims at dividing a complex software system into functionally independent modules, so that each can be developed independently. These modules can be divided further, thus forming a tree. This reduces the time needed to develop the software.

5) Heaps: In computer science, a heap is a specialized tree-based data structure that satisfies the heap property: if B is a child node of A, then key(A) ≥ key(B). This implies that an element with the greatest key is always in the root node, and so such a heap is sometimes called a max-heap. (Alternatively, if the comparison is reversed, the smallest element is always in the root node, which results in a min-heap.) There is no restriction on how many children each node has in a heap, although in practice each node has at most two. Heaps are crucial in several efficient graph algorithms, such as Dijkstra's algorithm, and in the sorting algorithm heap sort.

6) Binary Space Partition - Widely used in 3D video games to determine which objects need to be rendered.

7) Binary tries - Widely used in high-bandwidth routers for storing routing tables.
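
The heap property from point 5 can be checked with a short Python sketch. It uses the standard `heapq` module, which implements a binary min-heap stored in a list, so a max-heap is simulated here by negating the keys. The helper `is_max_heap` and the sample keys are hypothetical.

```python
import heapq


def is_max_heap(items):
    """Check key(parent) >= key(child) for an array-stored binary heap."""
    return all(
        items[(i - 1) // 2] >= items[i] for i in range(1, len(items))
    )


keys = [3, 9, 4, 7, 1]

# heapq builds a min-heap, so negate the keys to get max-heap behaviour.
neg = [-k for k in keys]
heapq.heapify(neg)
max_heap = [-k for k in neg]

print(max_heap[0])            # 9, the greatest key sits at the root
print(is_max_heap(max_heap))  # True

# Repeatedly removing the root yields the keys in descending order,
# which is the idea behind heap sort.
ordered = [-heapq.heappop(neg) for _ in range(len(neg))]
print(ordered)  # [9, 7, 4, 3, 1]
```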

Ques-3- Write a short note on the different types of file organisation.

Ans -

(a). Sequential File Organisation

1. A sequential file is designed for efficient processing of records in sorted order on some search key.
   - Records are chained together by pointers to permit fast retrieval in search key order.
   - Each pointer points to the next record in order.
   - Records are stored physically in search key order (or as close to this as possible). This minimizes the number of block accesses.

2. It is difficult to maintain physical sequential order as records are inserted and deleted.
   - Deletion can be managed with the pointer chains.
   - Insertion poses problems if there is no space where the new record should go. If there is space, use it; otherwise put the new record in an overflow block and adjust the pointers accordingly.
   - Problem: some records are now out of physical sequential order. If very few records are in overflow blocks, this works well.
   - If order is lost, reorganize the file. Reorganizations are expensive and are done when system load is low.

3. If insertions rarely occur, we could keep the file in physically sorted order and reorganize whenever an insertion occurs. In this case, the pointer fields are no longer required.

4. Advantages
   - Easy to handle
   - Involves no overhead
   - Can be stored on tapes as well as disks
   - Well suited for batch-oriented applications
   - Records in a sequential file can be of varying lengths

5. Disadvantages
   - Records can only be accessed in sequence
   - Does not support updating in place
   - Does not support interactive applications
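
The overflow-and-reorganize behaviour described above can be sketched in memory. This is a simplified sketch under stated assumptions: it ignores the pointer chains and block layout, and the class `SequentialFile`, its `overflow_limit` threshold and the sample records are all hypothetical.

```python
class SequentialFile:
    """Sorted main area plus an unsorted overflow area (in-memory sketch)."""

    def __init__(self, records, overflow_limit=2):
        self.main = sorted(records)   # kept in search-key order
        self.overflow = []            # records out of physical order
        self.overflow_limit = overflow_limit
        self.reorganizations = 0

    def insert(self, record):
        """New records go to the overflow area until it grows too large."""
        self.overflow.append(record)
        if len(self.overflow) > self.overflow_limit:
            self.reorganize()

    def reorganize(self):
        """Rewrite the file so every record is back in key order."""
        self.main = sorted(self.main + self.overflow)
        self.overflow = []
        self.reorganizations += 1

    def scan(self):
        """Return all records in search-key order."""
        return sorted(self.main + self.overflow)


f = SequentialFile([(10, "a"), (30, "c")])
f.insert((20, "b"))
f.insert((5, "z"))
print(f.scan())             # [(5, 'z'), (10, 'a'), (20, 'b'), (30, 'c')]
f.insert((25, "y"))         # third overflow record triggers a reorganization
print(f.reorganizations)    # 1
print(f.overflow)           # []
```

A real system would schedule the reorganization for a period of low load rather than trigger it on a fixed threshold.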

(b). Random or Relative File Organisation


1. Basic Features
   - Records are read directly from, or written directly to, the file.
   - The records are stored at known addresses.
   - There is a predictable relationship between the key and the record's location in the file.
   - The address is calculated by applying a mathematical function to the key field.
   - Records do not necessarily appear physically in sorted order by their keys.
   - A random file has to be stored on a direct access backing storage medium, e.g. magnetic disc, CD, DVD.

2. Other types of access of a random file
   - A random file can be accessed sequentially, but the keys may not be in logical sequence.
   - It can be accessed serially when it is input to a sort/merge utility.
   - Serial access can also be made by a program, for example while generating mailing labels for each customer.

3. Advantages
   - Records can be accessed out of sequence, randomly
   - Well suited for interactive (on-line) applications
   - Supports updating in place
   - Concurrent processing is possible
   - Speed of record processing is very fast

4. Disadvantages
   - Can only be stored on disks
   - Involves more overhead in the form of maintenance of indexes
   - Handling is somewhat more complex than for sequential files
   - Records can only be of fixed length
   - Does not fully use memory locations
   - More security and backup problems
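
The idea of computing a record's address from its key can be sketched with fixed-length records. Everything specific here is an assumption for illustration: the record layout (`RECORD`), the slot count (`SLOTS`), the modulo address function, and the in-memory `io.BytesIO` buffer standing in for a disk file. Collisions between keys and the distinction between an empty slot and key 0 are not handled.

```python
import io
import struct

RECORD = struct.Struct("<i16s")  # hypothetical layout: int key + 16-byte name
SLOTS = 8                        # hypothetical file size in record slots


def address(key: int) -> int:
    """Map a key to a byte offset; this function is an assumed example."""
    return (key % SLOTS) * RECORD.size


def write_record(f, key, name):
    """Write (or update in place) the record for key at its computed address."""
    f.seek(address(key))
    f.write(RECORD.pack(key, name.encode()))


def read_record(f, key):
    """Read the slot for key directly, without scanning other records."""
    f.seek(address(key))
    stored_key, raw = RECORD.unpack(f.read(RECORD.size))
    if stored_key != key:
        return None
    return raw.rstrip(b"\x00").decode()


f = io.BytesIO(bytes(SLOTS * RECORD.size))  # stands in for a disk file
write_record(f, 13, "Asha")
write_record(f, 2, "Ben")
print(read_record(f, 13))  # Asha
print(read_record(f, 2))   # Ben
```

The fixed record size is what makes the offset computable, which matches the fixed-length restriction listed among the disadvantages.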

(c) Indexed or Linked File Organisation


1. Basic Features
   - Each record of a file has a key field which uniquely identifies that record.
   - An index consists of keys and addresses.
   - An indexed sequential file is a sequential file (i.e. sorted into order of a key field) which has an index.
   - A full index to a file is one in which there is an entry for every record.
   - Indexed sequential files are important for applications where data needs to be accessed both sequentially and randomly using the index.
   - An indexed sequential file can only be stored on a random access device, e.g. magnetic disc, CD.

2. Advantages
   - Records can be accessed sequentially or randomly
   - Supports interactive as well as batch-oriented applications
   - Supports updating in place
   - Faster than sequential

3. Disadvantages
   - Extra storage space required
   - Can only be stored on disks
   - Involves more overhead in the form of maintenance of indexes
   - Handling is somewhat more complex than for sequential files
   - Records can only be of fixed length
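
The two access paths of an indexed sequential file can be sketched as below. This is an in-memory sketch with hypothetical data: the `records` list stands in for the sorted data file, and list positions stand in for record addresses.

```python
from bisect import bisect_left

# Hypothetical data file: records sorted by key, as in a sequential file.
records = [(101, "pen"), (205, "ink"), (310, "pad"), (422, "cap")]

# Full index: one entry per record; the list position doubles as the address.
index_keys = [key for key, _ in records]


def read_random(key):
    """Random access: binary-search the index, then fetch one record."""
    pos = bisect_left(index_keys, key)
    if pos < len(index_keys) and index_keys[pos] == key:
        return records[pos]
    return None


def read_sequential(start_key):
    """Sequential access: locate the start, then read on in key order."""
    pos = bisect_left(index_keys, start_key)
    return records[pos:]


print(read_random(310))      # (310, 'pad')
print(read_random(300))      # None
print(read_sequential(200))  # [(205, 'ink'), (310, 'pad'), (422, 'cap')]
```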

(d) Inverted File Organisation


In inverted file organisation, a file is indexed on many of the attributes of the data itself. The inverted list method has a single index for each key type. The records are not necessarily stored in sequence. They are placed in the data storage area, and the indexes are updated with each record's keys and location. For example, in a company file, one index could be maintained for all products, and another for product types. It is then faster to search the indexes than to search every record. These types of file are also known as "inverted indexes."

Inverted list files use more space, and storage devices fill up quickly with this type of organisation. The benefit is apparent immediately, because searching is fast. However, updating is much slower.

Content-based queries in text retrieval systems use inverted indexes as their preferred mechanism. Data items in these systems are usually stored compressed, which would normally slow the retrieval process, but the compression algorithm is chosen to support this technique. In certain circumstances a query is designed to be modal, which means that rules are set that require different information to be held in the index.
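
The company-file example above can be sketched with one index per attribute. The sample records and the function name `build_inverted_index` are hypothetical, and list positions stand in for storage locations.

```python
from collections import defaultdict

# Hypothetical company file; records are stored in no particular order.
records = [
    {"product": "pen", "type": "stationery"},
    {"product": "drill", "type": "tools"},
    {"product": "pad", "type": "stationery"},
]


def build_inverted_index(rows, attribute):
    """Map each attribute value to the positions of the matching records."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[attribute]].append(position)
    return index


by_product = build_inverted_index(records, "product")
by_type = build_inverted_index(records, "type")

print(by_type["stationery"])  # [0, 2]
print([records[i]["product"] for i in by_type["stationery"]])  # ['pen', 'pad']
```

Searching consults only the index, but every inserted or changed record must update each index as well, which is why updates are slower.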

Ques-4- Write a short note on records.

Ans - Records:


A record is an instance of a product of primitive data types, also called a tuple. In C it is the compound data held in a structure. Records are among the simplest data structures. A record is a value that contains other values, typically in fixed number and sequence and typically indexed by names. The elements of records are usually called fields or members. For example, a date may be stored as a record containing a numeric year field, a month field represented as a string, and a numeric day-of-month field.

Records are distinguished from arrays by the fact that their number of fields is typically fixed, each field has a name, and each field may have a different type. A record type is a data type that describes such values and variables. Most modern computer languages allow the programmer to define new record types. The definition includes specifying the data type of each field and an identifier (name or label) by which it can be accessed.

Records can exist in any storage medium, including main memory and mass storage devices such as magnetic tapes or hard disks. Records are a fundamental component of most data structures, especially linked data structures. Many computer files are organized as arrays of logical records, often grouped into larger physical records or blocks for efficiency.

The parameters of a function or procedure can often be viewed as the fields of a record variable, and the arguments passed to that function can be viewed as a record value that gets assigned to that variable at the time of the call. Also, in the call stack that is often used to implement procedure calls, each entry is an activation record or call frame, containing the procedure parameters and local variables, the return address, and other internal fields.

An object in an object-oriented language is essentially a record that contains procedures specialized to handle that record, and object data types (often called object classes) are an elaboration of record types. Indeed, in most object-oriented languages, records are just special cases of objects.

A record can be viewed as the computer analogue of a mathematical tuple. In the same vein, a record type can be viewed as the computer language analogue of the Cartesian product of two or more mathematical sets, or the implementation of an abstract product type in a specific language.
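
The date record mentioned above can be sketched in Python with a dataclass. The type name `Date` and the sample values are hypothetical. The field types follow the description: a numeric year, a month stored as a string, and a numeric day of the month.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class Date:
    """A record type: a fixed set of named fields, each with its own type."""
    year: int
    month: str
    day: int


d = Date(year=2024, month="March", day=5)
print(d.month)     # March
print(astuple(d))  # (2024, 'March', 5)
```

The `astuple` call shows the correspondence with a mathematical tuple: the record is one element of the product int × str × int, with names attached to the positions.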
