<K(3) = (Alexander), P(3) = address of block 3>

[Figure: Internal hashing data structures. (a) Array of positions for use in internal hashing. (b) Collision resolution by chaining of records.]

Figure 6 illustrates this primary index. The total number of entries in the index will be the same as the number of disk blocks in the ordered data file. The first record in each block of the data file is called the anchor record of the block, or simply the block anchor. (A scheme similar to the one described here can be used, with the last record in each block, rather than the first, as the block anchor.) A primary index is an example of what is called a nondense index, because it includes an entry for each disk block of the data file rather than for every record in the data file. A dense index, on the other hand, contains an entry for every record in the file.

The index file for a primary index needs substantially fewer blocks than the data file, for two reasons. First, there are fewer index entries than there are records in the data file, because an entry exists for each whole block of the data file rather than for each record. Second, each index entry is typically smaller in size than a data record because it has only two fields, so more index entries than data records will fit in one block. A binary search on the index file will hence require fewer block accesses than a binary search on the data file.

[Figure 6: Primary index on the ordering key field of the file.]

A record whose primary key value is K will be in the block whose address is P(i), where $K(i) \le K < K(i+1)$. The ith block in the data file contains all such records because of the physical ordering of the file records on the primary key field. To retrieve a record given its key value K, we do a binary search on the index file to find the appropriate index entry i, and then retrieve the data file block whose address is P(i).
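The rule $K(i) \le K < K(i+1)$ amounts to a binary search for the last block anchor that does not exceed K. The following is a minimal in-memory sketch, assuming the index is held as two parallel sorted lists; the function name `find_block` and the anchors other than Alexander are hypothetical, and a real DBMS would binary-search index blocks on disk rather than a Python list.

```python
from bisect import bisect_right

def find_block(anchor_keys, block_addresses, key):
    """Return P(i) for the index entry i with K(i) <= key < K(i+1).

    anchor_keys: sorted block-anchor keys K(1..n) (hypothetical sample data)
    block_addresses: parallel list of block pointers P(1..n)
    Returns None if key is smaller than the first anchor.
    """
    # bisect_right counts the anchors <= key; the wanted entry is the last of them
    i = bisect_right(anchor_keys, key) - 1
    if i < 0:
        return None
    return block_addresses[i]

anchors = ["Aaron", "Adams", "Alexander"]     # hypothetical anchor records
pointers = ["block 1", "block 2", "block 3"]
print(find_block(anchors, pointers, "Akers"))  # block 2
print(find_block(anchors, pointers, "Allen"))  # block 3
```

The returned block still has to be read and searched for the record itself; if the key is not in the file, that final search simply fails.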
Notice that the above formula would not be correct if the data file were ordered on a nonkey field that allows multiple records to have the same ordering field value. In that case, the same index value as that in the block anchor could be repeated in the last records of the previous block. Example 1 illustrates the saving in block accesses when using an index to search for a record.

Example 1: Suppose we have an ordered file with $r = 30{,}000$ records stored on a disk with block size $B = 1024$ bytes. File records are of fixed size and unspanned, with record length $R = 100$ bytes. The blocking factor for the file would be $bfr = \lfloor B/R \rfloor = \lfloor 1024/100 \rfloor = 10$ records per block. The number of blocks needed for the file is $b = \lceil r/bfr \rceil = \lceil 30{,}000/10 \rceil = 3000$ blocks. A binary search on the data file would need approximately $\lceil \log_2 b \rceil = \lceil \log_2 3000 \rceil = 12$ block accesses.

Now suppose the ordering key field of the file is $V = 9$ bytes long, a block pointer is $P = 6$ bytes long, and we construct a primary index for the file. The size of each index entry is $R_i = (9 + 6) = 15$ bytes, so the blocking factor for the index is $bfr_i = \lfloor B/R_i \rfloor = \lfloor 1024/15 \rfloor = 68$ entries per block. The total number of index entries $r_i$ is equal to the number of blocks in the data file, which is 3000. The number of index blocks is hence $b_i = \lceil r_i/bfr_i \rceil = \lceil 3000/68 \rceil = 45$ blocks. A binary search on the index file would need $\lceil \log_2 b_i \rceil = \lceil \log_2 45 \rceil = 6$ block accesses. To search for a record using the index, we need one additional block access to the data file, for a total of 6 + 1 = 7 block accesses: an improvement over binary search on the data file, which required 12 block accesses.

A major problem with a primary index, as with any ordered file, is insertion and deletion of records.
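The arithmetic of Example 1 can be checked with a few lines of Python. This is a sketch under the example's own assumptions (fixed-length, unspanned records; one block access per binary-search probe); the helper name `index_search_cost` is hypothetical.

```python
import math

def index_search_cost(num_entries, B, V, P):
    """Blocks in a single-level index and block accesses to find one record."""
    bfri = B // (V + P)                        # index entries per block
    bi = math.ceil(num_entries / bfri)         # blocks in the index file
    accesses = math.ceil(math.log2(bi)) + 1    # binary search on index + 1 data block
    return bfri, bi, accesses

# Example 1 parameters (sizes in bytes)
r, B, R, V, P = 30_000, 1024, 100, 9, 6
bfr = B // R                                   # records per data block
b = math.ceil(r / bfr)                         # data blocks
print(bfr, b, math.ceil(math.log2(b)))         # 10 3000 12
print(index_search_cost(b, B, V, P))           # (68, 45, 7)
```

Calling `index_search_cost(r, B, V, P)` with one entry per record instead of one per block, as for the dense secondary index of Example 2, returns (68, 442, 10).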
With a primary index, the problem is compounded because, if we attempt to insert a record in its correct position in the data file, we not only have to move records to make space for the new record but also have to change some index entries, because moving records will change the anchor records of some blocks. We can use an unordered overflow file. Another possibility is to use a linked list of overflow records for each block in the data file. We can keep the records within each block and its overflow linked list sorted to improve retrieval time. Record deletion can be handled using deletion markers.

Clustering Indexes

If the records of a file are physically ordered on a nonkey field that does not have a distinct value for each record, that field is called the clustering field of the file. We can create a different type of index, called a clustering index, to speed up retrieval of records that have the same value for the clustering field. This differs from a primary index, which requires that the ordering field of the data file have a distinct value for each record.

A clustering index is also an ordered file with two fields: the first field is of the same type as the clustering field of the data file, and the second field is a block pointer. There is one entry in the clustering index for each distinct value of the clustering field, containing that value and a pointer to the first block in the data file that has a record with that value for its clustering field. Figure 7 shows an example of a data file with a clustering index. Note that record insertion and record deletion still cause considerable problems because the data records are physically ordered.

[Figure 7: A clustering index on the DEPTNUMBER ordering field of an EMPLOYEE file.]

[Figure 8: Clustering index with separate blocks for each group of records with the same value for the clustering field.]
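A minimal Python sketch of a clustering index, assuming a small hypothetical EMPLOYEE file held as a list of blocks of (DEPTNUMBER, NAME) records ordered on DEPTNUMBER. The index keeps one entry per distinct value, pointing to the first block that contains it; the lookup then reads forward through the ordered data file. All names and data are hypothetical.

```python
from bisect import bisect_left

def build_clustering_index(blocks):
    """One (value, first_block) entry per distinct clustering-field value.

    `blocks` is the data file: a list of blocks, each a list of
    (deptnumber, name) records, physically ordered on deptnumber.
    """
    values, first_block = [], []
    for block_no, block in enumerate(blocks):
        for dept, _name in block:
            if not values or values[-1] != dept:   # first time we meet this value
                values.append(dept)
                first_block.append(block_no)
    return values, first_block

def lookup_cluster(blocks, index, dept):
    """Return every record whose clustering field equals `dept`."""
    values, first_block = index
    i = bisect_left(values, dept)                  # binary search on the index
    if i == len(values) or values[i] != dept:
        return []
    result = []
    # Read forward from the first block holding `dept`; because the file is
    # ordered on deptnumber, we can stop at the first larger value.
    for block in blocks[first_block[i]:]:
        for rec in block:
            if rec[0] == dept:
                result.append(rec)
            elif rec[0] > dept:
                return result
    return result

employee = [
    [(1, "Ann"), (1, "Bob"), (2, "Cal")],
    [(2, "Dee"), (2, "Eve"), (3, "Fay")],
    [(3, "Gus"), (5, "Hal")],
]
index = build_clustering_index(employee)
print(index)                               # ([1, 2, 3, 5], [0, 0, 1, 2])
print(lookup_cluster(employee, index, 2))  # [(2, 'Cal'), (2, 'Dee'), (2, 'Eve')]
```

Four index entries cover eight records here, which is why a clustering index is nondense.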
To alleviate the problem of insertion, it is common to reserve a whole block for each value of the clustering field; all records with that value are placed in the block. If more than one block is needed to store the records for a particular value, additional blocks are allocated and linked together. This makes insertion and deletion relatively straightforward. Figure 8 shows this scheme.

A clustering index is another example of a nondense index, because it has an entry for every distinct value of the indexing field rather than for every record in the file.

Secondary Indexes

A secondary index also is an ordered file with two fields, and, as in the other indexes, the second field is a pointer to a disk block. The first field is of the same data type as some nonordering field of the data file. The field on which the secondary index is constructed is called an indexing field of the file, whether its values are distinct for every record or not. There can be many secondary indexes, and hence indexing fields, for the same file.

We first consider a secondary index on a key field, that is, a field having a distinct value for every record in the data file. Such a field is sometimes called a secondary key for the file. In this case there is one index entry for each record in the data file, which has the value of the secondary key for the record and a pointer to the block in which the record is stored. A secondary index on a key field is a dense index because it contains one entry for each record in the data file.

We again refer to the two field values of index entry i as K(i), P(i). The entries are ordered by value of K(i), so we can use binary search on this index. Because the records of the data file are not physically ordered by values of the secondary key field, we cannot use block anchors. That is why an index entry is created for each record in the data file rather than for each block, as in the case of a primary index.

[Figure 9: A dense secondary index on a nonordering key field of a data file.]
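A minimal Python sketch of a dense secondary index on a key field, assuming a hypothetical data file whose records are (SSN, NAME) pairs stored in no particular SSN order. There is one entry per record, P(i) is a block number rather than a record pointer, and the record is found by searching inside the fetched block. Function names and data are hypothetical.

```python
from bisect import bisect_left

def build_secondary_index(blocks, key_of):
    """Dense index: one (K(i), P(i)) entry per record, sorted on K(i).

    P(i) is a block number (a block pointer), not a record pointer.
    """
    entries = sorted((key_of(rec), block_no)
                     for block_no, block in enumerate(blocks)
                     for rec in block)
    keys = [k for k, _ in entries]
    pointers = [p for _, p in entries]
    return keys, pointers

def search(blocks, index, key, key_of):
    keys, pointers = index
    i = bisect_left(keys, key)            # binary search on the index
    if i == len(keys) or keys[i] != key:
        return None
    block = blocks[pointers[i]]           # one access to the data file
    for rec in block:                     # search within the fetched block
        if key_of(rec) == key:
            return rec
    return None

def ssn(rec):
    return rec[0]

# Hypothetical data file, NOT ordered on the secondary key (ssn)
data = [[(512, "Ann"), (107, "Bob")],
        [(330, "Cal"), (901, "Dee")],
        [(215, "Eve")]]
idx = build_secondary_index(data, ssn)
print(idx)                          # ([107, 215, 330, 512, 901], [0, 2, 1, 0, 1])
print(search(data, idx, 330, ssn))  # (330, 'Cal')
```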
Figure 9 illustrates a secondary index on a key attribute of a data file. Notice that in Figure 9 the pointers P(i) in the index entries are block pointers, not record pointers. Once the appropriate block is transferred to main memory, a search for the desired record within the block can be carried out.

A secondary index will usually need substantially more storage space than a primary index because of its larger number of entries. However, the improvement in search time for an arbitrary record is much greater for a secondary index than it is for a primary index, because we would have to do a linear search on the data file if the secondary index did not exist. For a primary index, we could still use binary search on the main file even if the index did not exist, because the records are physically ordered by the primary key field. Example 2 illustrates the improvement in number of blocks accessed when using a secondary index to search for a record.

Example 2: Consider the file of Example 1 with $r = 30{,}000$ fixed-length records of size $R = 100$ bytes stored on a disk with block size $B = 1024$ bytes. The file has $b = 3000$ blocks, as calculated in Example 1. To do a linear search on the file, we would require $b/2 = 3000/2 = 1500$ block accesses on the average.

Suppose we construct a secondary index on a nonordering key field of the file that is $V = 9$ bytes long. As in Example 1, a block pointer is $P = 6$ bytes long, so each index entry is $R_i = (9 + 6) = 15$ bytes, and the blocking factor for the index is $bfr_i = \lfloor B/R_i \rfloor = \lfloor 1024/15 \rfloor = 68$ entries per block. In a dense secondary index such as this, the total number of index entries $r_i$ is equal to the number of records in the data file, which is 30,000. The number of blocks needed for the index is hence $b_i = \lceil r_i/bfr_i \rceil = \lceil 30{,}000/68 \rceil = 442$ blocks. Compare this to the 45 blocks needed by the nondense primary index in Example 1.

A binary search on this secondary index needs $\lceil \log_2 b_i \rceil = \lceil \log_2 442 \rceil = 9$ block accesses.
To search for a record using the index, we need an additional block access to the data file, for a total of 9 + 1 = 10 block accesses: a vast improvement over the 1500 block accesses needed on the average for a linear search.

[Figure 10: A secondary index on a nonkey field, implemented using one level of indirection so that index entries are of fixed length and have unique field values.]

We can also create a secondary index on a nonkey field of a file. In this case numerous records in the data file can have the same value for the indexing field. There are several options for implementing such an index:

• Option 1 is to include several index entries with the same K(i) value, one for each record. This would be a dense index.
• Option 2 is to have variable-length records for the index entries, with a repeating field for the pointer. We keep a list of pointers <P(i,1), ..., P(i,k)> in the index entry for K(i), one pointer to each block that contains a record whose indexing field value equals K(i). In either option 1 or option 2, the binary search algorithm on the index must be modified appropriately.
• Option 3, which is used commonly, is to keep the index entries themselves at a fixed length and have a single entry for each index field value, but create an extra level of indirection to handle the multiple pointers. In this scheme, which is nondense, the pointer P(i) in index entry <K(i), P(i)> points to a block of record pointers; each record pointer in that block points to one of the data file records with value K(i) for the indexing field. If some value K(i) occurs in too many records, so that their record pointers cannot fit in a single disk block, a linked list of blocks can be used.
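Option 3 can be sketched in Python as follows, assuming a hypothetical EMPLOYEE file not ordered on DEPTNUMBER, record pointers represented as (block number, slot) pairs, and a deliberately tiny capacity of two record pointers per pointer block so that the linked-list case shows up. The class, function and constant names are hypothetical.

```python
from bisect import bisect_left

POINTERS_PER_BLOCK = 2   # hypothetical: record pointers that fit in one block

class PointerBlock:
    """A block of record pointers; `next` links an overflow block."""
    def __init__(self):
        self.pointers = []   # each record pointer is (data_block_no, slot)
        self.next = None

def build_indirect_index(blocks, field_of):
    """One fixed-length entry <K(i), P(i)> per distinct value; P(i) points
    to a chain of pointer blocks rather than to the data file."""
    groups = {}
    for block_no, block in enumerate(blocks):
        for slot, rec in enumerate(block):
            groups.setdefault(field_of(rec), []).append((block_no, slot))
    keys, heads = [], []
    for value in sorted(groups):
        head = tail = PointerBlock()
        for ptr in groups[value]:
            if len(tail.pointers) == POINTERS_PER_BLOCK:
                tail.next = PointerBlock()   # block full: link another one
                tail = tail.next
            tail.pointers.append(ptr)
        keys.append(value)
        heads.append(head)
    return keys, heads

def fetch_all(blocks, index, value):
    keys, heads = index
    i = bisect_left(keys, value)             # binary search on fixed-length entries
    if i == len(keys) or keys[i] != value:
        return []
    records, pb = [], heads[i]
    while pb is not None:                    # follow the linked pointer blocks
        for block_no, slot in pb.pointers:
            records.append(blocks[block_no][slot])
        pb = pb.next
    return records

# Hypothetical EMPLOYEE file, not ordered on DEPTNUMBER
emp = [[(3, "Ann"), (1, "Bob")],
       [(3, "Cal"), (2, "Dee")],
       [(3, "Eve")]]
idx = build_indirect_index(emp, lambda rec: rec[0])
print(idx[0])                  # [1, 2, 3]
print(fetch_all(emp, idx, 3))  # [(3, 'Ann'), (3, 'Cal'), (3, 'Eve')]
```

Reading the chain of pointer blocks is the extra level of access that this scheme pays for its fixed-length index entries.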
This technique is illustrated in Figure 10. Retrieval via the index requires an additional block access because of the extra level, but the algorithms for searching the index and, more important, for inserting new records in the data file are straightforward. In addition, retrievals on complex selection conditions may be handled by referring to the pointers, without having to retrieve many unnecessary file records.

Notice that a secondary index provides a logical ordering on the records by the indexing field. If we access the records in order of the entries in the secondary index, we get them in order of the indexing field.

Multilevel Indexing Schemes: Basic Technique

In a full indexing scheme, the address of every record is maintained in the index. For a small file, this index would be small and could be processed very efficiently in main memory. For a large file, the index's size would pose problems. It is possible to create a hierarchy of indexes, with the lowest level index pointing to the records, while the higher level indexes point to the indexes below them (Figure 11). The higher level indices are small and can be moved to main memory, allowing the search to be localised to one of the larger lower level indices.

[Figure 11: Hierarchy of indexes]

The lowest level index consists of the (key, address) pair for each record in the file; this is costly in terms of space. Updates of records require changes to the index file as well as the data file.
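A two-level version of this hierarchy can be sketched in Python, assuming hypothetical (key, address) pairs and a hypothetical block size of three entries. The upper level holds only the first key of each lowest-level block, so a lookup searches the small upper level and then reads a single lower block. Applying the same construction to the upper level would add further levels.

```python
from bisect import bisect_right

def build_two_level(pairs, per_block):
    """pairs: sorted (key, address) for every record (the lowest level).

    The lowest level is cut into blocks of `per_block` entries; the higher
    level keeps the first key of each lower-level block and is small
    enough to hold in main memory.
    """
    lower = [pairs[i:i + per_block] for i in range(0, len(pairs), per_block)]
    upper = [blk[0][0] for blk in lower]
    return upper, lower

def lookup(index, key):
    upper, lower = index
    j = bisect_right(upper, key) - 1   # search the in-memory upper level
    if j < 0:
        return None
    for k, addr in lower[j]:           # then read a single lower-level block
        if k == key:
            return addr
    return None

pairs = [(k, f"addr-{k}") for k in range(10, 100, 10)]   # hypothetical keys 10..90
idx = build_two_level(pairs, per_block=3)
print(idx[0])            # [10, 40, 70]
print(lookup(idx, 50))   # addr-50
print(lookup(idx, 55))   # None
```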
Insertion of a record requires that its (key, address) pair be inserted in the index at the correct point, while deletion of a record requires that the …

[Figure 12: Overflow of records]

Multiple records belonging to the same logical area may be chained to maintain logical sequencing. When records are forced into the overflow areas as a result of insertion, the insertion process is simplified, but the search time is increased. Deletion of records from index-sequential files creates logical gaps; the records are not physically removed but only flagged as having been deleted. If there are a number of deletions, we may have a great amount of unused space.

An index-sequential file is therefore made up of the following components:

1. A primary data storage area. In certain systems this area may have unused spaces embedded within it to permit the addition of records. It may also include records that have been marked as having been deleted.
2. Overflow area(s). These permit the addition of records to the file. A number of schemes exist for the incorporation of records in these areas into the expected logical sequence.
3. A hierarchy of indices. In a random enquiry or update, the physical location of the desired record is obtained by accessing these indices.

The primary data area contains the records written by the users' programs. The records are written in data blocks in ascending key sequence. These data blocks are in turn stored in ascending sequence in the primary data area. The data blocks are sequenced by the highest key of the logical records contained in them.

There are several approaches to structuring both the index and the sequential data portion of an indexed sequential file. The most common approach is to build the index as a tree of key values. The tree is typically a variation on the B-tree, which we will discuss later. The other common approach is to build the index based on the physical layout of the data in storage.
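The three components listed above can be illustrated with a toy Python model. The simplifications are our own assumptions, not a description of any particular system: the primary area is a list of fixed blocks, the index holds only the highest key of each block, each block has its own overflow chain kept in key order, and deletion only sets a flag. The class and method names are hypothetical.

```python
from bisect import bisect_left

class IndexSequentialFile:
    """Toy index-sequential file (hypothetical model, not a real ISAM API)."""

    def __init__(self, sorted_records, per_block):
        # Primary data area: blocks of [key, data, deleted] in ascending key order
        self.blocks = [[[k, d, False] for k, d in sorted_records[i:i + per_block]]
                       for i in range(0, len(sorted_records), per_block)]
        # Index: the highest key in each primary block
        self.high_keys = [blk[-1][0] for blk in self.blocks]
        # Overflow area: one chain per primary block
        self.overflow = [[] for _ in self.blocks]

    def _block_for(self, key):
        i = bisect_left(self.high_keys, key)   # first block whose high key >= key
        return min(i, len(self.blocks) - 1)    # larger keys go to the last block

    def insert(self, key, data):
        # The primary area is left untouched; the record goes to overflow,
        # placed so that the chain stays in key order.
        chain = self.overflow[self._block_for(key)]
        pos = bisect_left([rec[0] for rec in chain], key)
        chain.insert(pos, [key, data, False])

    def _find(self, key):
        b = self._block_for(key)
        for rec in self.blocks[b] + self.overflow[b]:
            if rec[0] == key and not rec[2]:
                return rec
        return None

    def search(self, key):
        rec = self._find(key)
        return None if rec is None else rec[1]

    def delete(self, key):
        rec = self._find(key)
        if rec is not None:
            rec[2] = True    # only flagged: the space stays allocated

    def scan(self):
        """Yield live records in logical key order."""
        for b in range(len(self.blocks)):
            merged = sorted(self.blocks[b] + self.overflow[b], key=lambda r: r[0])
            for key, data, deleted in merged:
                if not deleted:
                    yield key, data

f = IndexSequentialFile([(10, "a"), (20, "b"), (30, "c"), (40, "d")], per_block=2)
f.insert(25, "x")
f.delete(20)
print(f.search(25))    # x
print(list(f.scan()))  # [(10, 'a'), (25, 'x'), (30, 'c'), (40, 'd')]
```

Insertion never moves primary records, but each overflow record lengthens the search of its block, and flagged records keep their space: the trade-offs described in the text.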
The important technique for building an index based on the physical layout of the data in storage is ISAM (indexed sequential access method), which we will discuss.

Physical Data Organisation Under ISAM

When a record is stored by ISAM, its record key must be one of the fields in the record. The records themselves are first sorted by record key into ascending order before they are stored on one of the disk drives. ISAM will always maintain the records in this sorted order. Each record is stored on one of the tracks of a disk. Those records that follow it in sorted sequence are placed directly after it on the same track or, if room does not permit, are spilled over onto the next track in the same cylinder. In other words, they are dropped down to the next platter surface. The arm does not move; looking downward, the next head is selected electronically. Since the tracks in a cylinder are labelled 0, 1, 2, ..., n, the records that follow those on track 1 are placed on track 2. The cylinders are also labelled 0, 1, 2, ...

Figure 13 shows two cylinders of records, but only their keys are shown. Note that the keys are in ascending sequence throughout their storage on both cylinders. We have not shown track 0 on either cylinder, as this is used by ISAM for control. Of course, the number of tracks on each cylinder is a function of the size of the disk pack.

When ISAM retrieves a record, it needs to know the cylinder, the track address, and the record key. These are the components that must make up the directory entry for the ISAM file. In ISAM, a directory is called an index. For example, if a directory entry for record 1500 gave cylinder 9 and track 3, then ISAM would select cylinder 9. The read head associated with track 3 would then be activated.
The bottom side of the top platter is usually track 0, because the top side, being exposed, is subject to damage; therefore, the read head selected would be that for the top side of the third platter, as shown in Figure 14. Of course, the required record might be one of the many records stored on track 3. Rotation of the drive would eventually bring the required record under the read head. The desired record is identified by its record key.

Because the records in an ISAM file are kept in sorted order by record key, it is not necessary to have a directory entry for every single record. It is sufficient to know the largest record key on every track of a file. For example, suppose the largest key on track 3 is 100 and the largest on track 4 is 200. A record with key 175, if it exists in the file at all, must be on track 4. It cannot be on track 3, as the largest key on that track is 100.

The most obvious place to keep the directory for each cylinder of the file is, of course, on the cylinder itself, and it is on track 0 of each cylinder that ISAM keeps its directory. This directory is known as a track index; it contains the largest key on every track and the hardware address of that track. Figure 15 shows a typical track index for one cylinder of a file. In this cylinder, for example, 400 is shown to be the largest key on track 3 and 700 the largest key on the cylinder. Later we will see that this directory is slightly more complicated. This will be clarified when we discuss how ISAM keeps track of records that are added to

[Figure 13: Record storage]