Database I

Methodology Physical Design


Physical Database Design  

Throughout the processes of conceptual and logical database designs and the normalization, the primary objective has been the storage efficiency and the consistency of the database In the physical database design, however, the focus shifts from storage efficiency to the efficiency in execution

Physical Database Design (Cont.) 

The physical DB design involves:  

Transforms logical DB design into technical specifications for storing and retrieving data Does not include practically implementing the design however tool specific decisions are involved 

The Physical design requires the following input 

Normalized relations Definitions of each attribute (means the purpose or objective of the attributes)


Physical Database Design (Cont.) 

Descriptions of data usage (how and by whom data will be used) Requirements for response time, data security, backup etc. Tool to be used 

Decisions that are made during this process are: 

Choosing data types Deciding file organizations Selecting structures Preparing strategies for efficient access



De-normalization is a technique to move from higher to lower normal forms of database modeling in order to speed up database access De-normalization process is applied for deriving a physical data model from a logical design In logical design we group things logically related through same primary key In physical database design fields are grouped, as they are stored physically and accessed by DBMS

De-normalization (Cont.)  

We should be aware that each new RDBMS release usually bring enhanced performance and improved access options that may reduce the need for De-normalization A fully normalized database schema can fail to provide adequate system response time due to excessive table join operations


De-normalization (Cont.) 

De-normalization Situation 1:   

Merge two Entity types into one with one to one relationship Even if one of the entity type is optional, so joining can lead to wastage of storage, however if two accessed together very frequently their merging might be a wise decision So those two relations must be merged for better performance, which have one to one relationship

De-normalization (Cont.) 

De-normalization Situation 2:   

Many to many binary relationships mapped to three relations Queries needing data from two participating relations need joining of three relations that is expensive Join is an expensive operation from execution point of view 

Consider the many to many relationship b/w EMP, PROJ and WORK
EMP (empID, eName,pjId,Sal) PROJ (pjId,pjName) WORK (empId.pjId,dtHired,Sal)


De-normalization (Cont.)   

So now if we by de-normalizing these relations and merge the WORK relation with PROJ relation But in this case it is violating 2NF and anomalies of 2NF would be there But there would be only one join operation involved by joining two tables, which increases the efficiency EMP (empID, eName,pjId,Sal) PROJ (pjId,pjName, empId,dtHired,Sal)

De-normalization (Cont.) 

De-normalization Situation 3:  


In 1:M situation when the ET on side does not participate in any other relationship, then many side ET is appended with reference data rather than the foreign key In this case the reference table should be merged with the main table Consider STUDENT and HOBBY relations One student can have one hobby and one hobby can be adopted by many students Here hobby can be merged with the student relation Thus redundancy of data would be there, but there would not be any joining of two relations, which will have a better performance 10


Partitioning splits same relation into two Aims of data partitioning in database are to  

Reduce workload (e.g. data access, communication costs, search space) Balance workload Speed up the rate of useful work (e.g. frequently accessed objects in main memory) Horizontal Partitioning Vertical Partitioning 

There are two types of partitioning: 


Partitioning (Cont.) 

Horizontal Partitioning   

Table is split on the basis of rows, which means a larger table is split into smaller tables The advantage of this is that time in accessing the records of a larger table is much more than a smaller table Range Partitioning  

In this type of partitioning range is imposed on any particular attribute For Example for those students whose ID is from 11000 are in partition 1 and so on


Partitioning (Cont.) 

Hash Partitioning  

A particular algorithm is applied and DBMS knows that algorithm So hash partitioning reduces the chances of unbalanced partitions to a large extent In this type of partitioning the values are specified for every partition So there is a specified list for all the partitions 

List Partitioning  


Partitioning (Cont.) 

Vertical Partitioning    

Vertical partitioning is done on the basis of attributes Same table is split into different physical records depending on the nature of accesses Primary key is repeated in all vertical partitions of a table to get the original table Consider the Student relation STD (stId, sName, sAdr, sPhone, cgpa, prName, school, mtMrks, mtSubs, clgName, intMarks, intSubs, dClg, bMarks, bSubs)


Partitioning (Cont.) 

We can partition this relation vertically as under STD (stId, sName, sAdr, sPhone, cgpa, prName) STDACD (sId, school, mtMrks, mtSubs, clgName, intMarks, intSubs, dClg, bMarks,bSubs)


Data Storage Concepts 

Physical Storage Media Storage media are classified according to following characteristics: 

Speed of access Cost per unit of data Reliability Many disk that look as a single disk to OS but have better performance and better reliability RAID disk drives are used frequently on servers RAID have the property that the data are distributed over the drives to allow parallel operations

RAID ± Redundant Array of Inexpensive Disks  

Data Storage Concepts (Cont.)   

Fundamental to RAID is "striping", a method of concatenating multiple drives into one logical storage unit Striping involves partitioning each drive's storage space into stripes which may be as small as one sector (512 bytes) or as large as several megabytes The type of application environment, I/O or data intensive, determines whether large or small stripes should be used

Data Storage Concepts (Cont.) 


Simple Striping Virtual single disk is divided up into strips of k sectors each Since no redundant information is stored, performance is very good, but the failure of any disk in the array results in data loss


Data Storage Concepts (Cont.)

1 5 9

2 6 10

3 7 11

4 8 12

Note: This example is a basic virtual drive where each element depicted as a disk is a physical disk

Data Storage Concepts (Cont.) 


RAID Level 1 provides redundancy by writing all data to two or more drives The performance of a level 1 array tends to be faster on reads and slower on writes compared to a single drive, but if either drive fails, no data is lost This level is commonly referred to as mirroring


Data Storage Concepts (Cont.)
1 2 3 1¶ 2¶ 3¶ 1¶¶ 2¶¶ 3¶¶


Data Storage Concepts (Cont.) 


For reliability simple parity check code is used Parity bit is stored on separate disk RAID Level 4 stripes data at a block level across several drives, with parity stored on one drive The performance of a level 4 array is very good for reads (the same as level 0) Writes, however, require that parity data be updated each time


Data Storage Concepts (Cont.) 


RAID Level 5 is similar to level 4, but distributes parity among the drives This can speed small writes in multiprocessing systems, since the parity disk does not become a bottleneck 


RAID-0 is the fastest and most efficient array type but offers no fault-tolerance RAID-1 is the array of choice for performancecritical, fault-tolerant environments RAID-2 is seldom used today since ECC is embedded in almost all modern disk drives

Data Storage Concepts (Cont.)   

RAID-3 can be used in data intensive or single-user environments which access long sequential records to speed up data transfer. However, RAID-3 does not allow multiple I/O operations to be overlapped RAID-4 offers no advantages over RAID-5 and does not support multiple simultaneous write operations RAID-5 is the best choices in multi-use environments which are not write performance sensitive. However, at least three and more typically five drives are required for RAID-5 arrays

Sign up to vote on this title
UsefulNot useful