A database management system (DBMS) is a software package with computer programs that control the creation, maintenance, and use of a database. It allows database administrators (DBAs) and other specialists in organizations to conveniently develop databases for various applications. A database is an integrated collection of data records, files, and other database objects. A DBMS allows different user application programs to concurrently access the same database. DBMSs may use a variety of database models, such as the relational model or the object model, to conveniently describe and support applications. A DBMS typically supports query languages, which are in fact high-level programming languages: dedicated database languages that considerably simplify writing database application programs. Database languages also simplify the organization of the database, as well as retrieving and presenting information from it. A DBMS provides facilities for controlling data access, enforcing data integrity, managing concurrency control, recovering the database after failures and restoring it from backup files, as well as maintaining database security.
Features of DBMS
Support for Large Amounts of Data
Each DBMS is designed to support large amounts of data. It provides special ways and means to store and manipulate large amounts of data. Companies are trying to store more and more data, and some of this data will have to be online (available at all times). In most cases, the amount of data that can be stored is not actually constrained by the DBMS but by the available hardware. For example, Oracle can store terabytes of data.
Data Sharing, Concurrency and Locking
A DBMS also allows data to be shared by two or more users. The same data can be accessed by multiple users at the same time; this is data concurrency. However, when the same data is being manipulated at the same time by multiple users, certain problems arise. To avoid these problems, the DBMS locks data that is being manipulated so that two users cannot modify the same data at the same time.
The locking mechanism is transparent and automatic. We neither have to inform the DBMS about locking nor need to know how and when the DBMS locks the data. However, as programmers, if we know the intricacies of the locking mechanism used by the DBMS, we will be better programmers.
Data Security
While a DBMS allows data to be shared, it also ensures that data is only accessed by authorized users. A DBMS provides the features needed to implement security at the enterprise level. By default, the data of a user cannot be accessed by other users unless the owner explicitly grants them permission to do so.
Data Integrity
Maintaining the integrity of the data is an important process. If data loses integrity, it becomes unusable garbage. A DBMS provides means to implement rules that maintain the integrity of the data. Once we specify which rules are to be implemented, the DBMS makes sure that these rules are always enforced. Three integrity rules (discussed later in this chapter), domain, entity and referential integrity, are always supported by a DBMS.
Fault Tolerance and Recovery
A DBMS provides a great deal of fault tolerance. It continues to run in spite of errors, if possible, allowing users to rectify the mistake in the meantime. A DBMS also allows recovery in the event of failure. For instance, even if the data on a disk is completely lost due to disk failure, the data can be recovered to the point of failure, provided proper backups of the data (and, typically, its transaction logs) are available.
Support for Languages
A DBMS supports a data access and manipulation language. The most widely used data access language for RDBMSs (relational database management systems) is SQL. We will discuss RDBMS and SQL further later in this chapter. A DBMS implementation of SQL generally follows the SQL standards set by ANSI and ISO, often with vendor-specific extensions. Apart from supporting a non-procedural language like SQL to access and manipulate data, DBMSs nowadays also provide a procedural language for data processing. Oracle supports PL/SQL and SQL Server provides T-SQL.
Entity and Attribute
An entity is any object about which data is stored in the database. Each entity is associated with a collection of attributes. For example, in the database of a training institute, a student is an entity because we store information about each student in the database. Each student is associated with certain values such as roll number, name, course, etc., which are called the attributes of the entity.
Database characteristics:
- supports multiple, concurrent users
- one copy of data eliminates redundancy and avoids inconsistency
- confidentiality, privacy, and security can be promoted
- standards can be enforced and data quality ensured
When not to use a DBMS
The main costs of using a DBMS include the following:
- high initial cost
- (possibly) cost of extra hardware
- cost of entering data
- cost of training people to use the DBMS
- cost of maintaining the DBMS
A DBMS may be unnecessary:
- if access to the data by multiple users is not required
- if the database and application are simple, well-defined, and not expected to change
External, Conceptual and Physical Level
- External level: an individual user's view of the organisation of the data.
- Conceptual level: how the community views the organisation of the data. There is only one data model, the conceptual schema.
- Internal level: the view of the actual physical storage of the data, including storage allocation (e.g. B-trees, hashing) and access paths (e.g. specification of primary and secondary keys, indexes and pointers).
Data Independence - The management of data at each level is independent. It is an objective, not a feature:
- Changes at the physical level do not impact the conceptual level (physical data independence).
- Changes at the conceptual level do not impact the external level (logical data independence).
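Physical data independence can be seen in a small experiment. The sketch below (Python's sqlite3 module, hypothetical STUDENTS data) runs the same query before and after adding an index, a purely physical-level change: the results are identical, and only the access path reported by SQLite's EXPLAIN QUERY PLAN changes. The exact plan wording varies between SQLite versions, so the comments describe it rather than quote it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (rollno INTEGER PRIMARY KEY, name TEXT, course TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Anil", "DBMS"), (2, "Beena", "Java"), (3, "Chitra", "DBMS")],
)

# The conceptual-level request; its text never changes below.
query = "SELECT rollno, course FROM students WHERE name = ?"

def run(q):
    """Return (result rows, access-path descriptions) for name = 'Beena'."""
    rows = conn.execute(q, ("Beena",)).fetchall()
    # The last column of each EXPLAIN QUERY PLAN row is a text description.
    plan = [r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + q, ("Beena",))]
    return rows, plan

rows_before, plan_before = run(query)

# Physical-level change: add a B-tree index, i.e. a new access path.
conn.execute("CREATE INDEX idx_students_name ON students(name)")
rows_after, plan_after = run(query)

print(rows_before == rows_after)  # True: same answer either way
print(plan_before)  # a full scan of students
print(plan_after)   # a search using idx_students_name
```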
The DDL (data definition language) specifies how the data is related, i.e., the schema. The schema is a description of the data. In terms of a DBMS architecture, the DDL involves the following components:
- System catalog: the schema is stored here.
- DDL compiler: translates the DDL into actions.
- Privileged commands: actions that only the DBA can perform.
The DML (data manipulation language) is a language to manipulate the data; it is just a different kind of query language.
Advantages of DBMS
- Providing backup and recovery services.
- Providing multiple interfaces to users.
- Representing complex relationships among data.
- Enforcing integrity constraints on the database.
- Drawing inferences and actions using rules.
Data abstraction levels of a database system
Physical level: The lowest level of abstraction describes how the data is actually stored. The physical level describes complex low-level data structures in detail.
Logical level: The next higher level of abstraction describes what data are stored in the database, and what relationships exist among those data. The logical level thus describes an entire database in terms of a small number of relatively simple structures. Although implementation of the simple structures at the logical level may involve complex physical-level structures, the user of the logical level does not need to be aware of this complexity. Database administrators, who must decide what information to keep in a database, use the logical level of abstraction.
View level: The highest level of abstraction describes only part of the entire database. Even though the logical level uses simpler structures, complexity remains because of the variety of information stored in a large database. Many users of a database system do not need all this information; instead, they need to access only a part of the database. The view level of abstraction exists to simplify their interaction with the system. The system may provide many views for the same database.
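The sketch below, again assuming SQLite through Python's sqlite3 module, separates the pieces just described: DDL statements define a table and a view, DML statements change and query the data, and the schema itself can be read back from SQLite's system catalog table, `sqlite_master`. The view plays the role of the view level by exposing only two of the table's columns. Table and view names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the schema.
conn.execute("""CREATE TABLE students (
    rollno INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    course TEXT,
    fee    INTEGER)""")
# A view shows users only part of the database (view level of abstraction).
conn.execute("CREATE VIEW student_courses AS SELECT name, course FROM students")

# DML: manipulate and query the data.
conn.execute("INSERT INTO students VALUES (1, 'Anil', 'DBMS', 5000)")
conn.execute("UPDATE students SET fee = 4500 WHERE rollno = 1")
print(conn.execute("SELECT * FROM student_courses").fetchall())  # [('Anil', 'DBMS')]

# System catalog: SQLite stores the schema itself in the sqlite_master table.
catalog = conn.execute("SELECT type, name FROM sqlite_master ORDER BY name").fetchall()
print(catalog)  # [('view', 'student_courses'), ('table', 'students')]
```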
Three Tier Architecture
Three tier client-server architecture is also known as multi-tier architecture and signals the introduction of a middle tier to mediate between clients and servers. The middle tier exists between the user interface on the client side and the database management system (DBMS) on the server side. This third layer executes process management, which includes the implementation of business logic and rules. The three tier model can accommodate hundreds of users. It hides the complexity of process distribution from the user, while being able to complete complex tasks through message queuing, application implementation, and data staging (the storage of data before it is uploaded to the data warehouse).
As in two tier architectures, the top level is the user system interface (client) and the bottom level performs database management. The database management level ensures data consistency by using features like data locking and replication. Data locking is also referred to as file or record locking. This is a first-come, first-served DBMS feature used to manage data and updates in a multi-user environment. The first user to access a file or record locks it, denying access to any other user. It becomes accessible to other users again once the update is complete.
The middle tier is also called the application server. It contains centralized processing logic, which facilitates management and administration. Localizing system functionality in the middle tier makes it possible for processing changes and updates to be made once and distributed throughout the network, available to both clients and servers. Sometimes the middle tier is divided into two or more units with different functions, which makes it a multi-layer model. For example, in web applications, the client side is usually written in HTML, while the application servers are usually written in C++ or Java.
By using a scripting language embedded in HTML, web servers act as translation layers that allow communication between the client and server layers. This layer receives requests from clients and generates HTML responses after requesting the data from database servers. Popular scripting languages include JavaScript, ASP (Active Server Pages), JSP (JavaServer Pages), PHP (Hypertext Preprocessor), Perl (Practical Extraction and Report Language), and Python. One of the major benefits of three tier architecture is the ability to partition software and drag and drop modules onto different computers in a network.
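As a rough illustration of the separation of concerns in a three tier design, the following sketch puts the three tiers in one Python process as three classes. The class names and the quantity rule are hypothetical; in a real deployment each tier would usually run on a different machine and communicate over a network, which is omitted here.

```python
import sqlite3

class DataTier:
    """Bottom tier: owns the DBMS connection and nothing else."""
    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, qty INTEGER)")

    def insert_order(self, item, qty):
        with self.conn:  # commits on success, rolls back on error
            cur = self.conn.execute("INSERT INTO orders (item, qty) VALUES (?, ?)", (item, qty))
        return cur.lastrowid

    def all_orders(self):
        return self.conn.execute("SELECT id, item, qty FROM orders").fetchall()

class BusinessTier:
    """Middle tier (application server): business rules live here, in one place."""
    MAX_QTY = 10  # hypothetical business rule

    def __init__(self, data):
        self.data = data

    def place_order(self, item, qty):
        if not 1 <= qty <= self.MAX_QTY:
            raise ValueError(f"quantity must be between 1 and {self.MAX_QTY}")
        return self.data.insert_order(item, qty)

class PresentationTier:
    """Top tier (client): formats requests and responses only."""
    def __init__(self, logic):
        self.logic = logic

    def submit(self, item, qty):
        try:
            return f"Order {self.logic.place_order(item, qty)} accepted"
        except ValueError as err:
            return f"Rejected: {err}"

ui = PresentationTier(BusinessTier(DataTier()))
print(ui.submit("pen", 3))    # Order 1 accepted
print(ui.submit("book", 50))  # Rejected: quantity must be between 1 and 10
```

Because the rule sits only in `BusinessTier`, changing it is a single edit that every client picks up, which is the "made once and distributed" benefit described above.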
Data modeling is a process used to define and analyze the data requirements needed to support the business processes within the scope of corresponding information systems in organizations. Therefore, the process of data modeling involves professional data modelers working closely with business stakeholders, as well as potential users of the information system. There are three different types of data models produced while progressing from requirements to the actual database to be used for the information system [2]. The data requirements are initially recorded as a conceptual data model, which is essentially a set of technology-independent specifications about the data and is used to discuss initial requirements with the business stakeholders. The conceptual model is then translated into a logical data model, which documents structures of the data that can be implemented in databases. Implementing one conceptual data model may require multiple logical data models.
Hierarchical Model
The hierarchical data model organizes data in a tree structure. There is a hierarchy of parent and child data segments. This structure implies that a record can have repeating information, generally in the child data segments. Data is stored in a series of records, each of which has a set of field values attached to it. The model collects all the instances of a specific record together as a record type. These record types are the equivalent of tables in the relational model, with the individual records being the equivalent of rows. To create links between these record types, the hierarchical model uses parent-child relationships. These are a 1:N mapping between record types. This is done by using trees, just as the relational model uses set theory, both "borrowed" from mathematics. For example, an organization might store information about an employee, such as name, employee number, department and salary.
The organization might also store information about an employee's children, such as name and date of birth.
Network Model
Data is stored along with pointers, which specify the relationships between entities. This model was used in Honeywell's Integrated Data Store (IDS). The model is complex: it is difficult to understand both the way data is stored and the way data is manipulated. It is capable of supporting many-to-many relationships between entities, which the hierarchical model does not.
Relational Model
This model stores data in the form of a table. A table is a collection of rows and columns. We will discuss the relational model further in the next section.
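The two older models can be sketched with plain Python objects (all names and data are hypothetical). In the hierarchical part, each child record hangs under exactly one employee, a 1:N tree. In the network part, students and courses hold pointers to each other, which is how that model represents a many-to-many relationship.

```python
from dataclasses import dataclass, field

# --- Hierarchical model: a tree of parent and child segments (1:N) ---
@dataclass
class Child:
    name: str
    date_of_birth: str

@dataclass
class Employee:
    emp_no: int
    name: str
    department: str
    salary: int
    children: list = field(default_factory=list)  # child segments under this parent

emp = Employee(101, "Ravi", "Sales", 40000)
emp.children.append(Child("Asha", "2012-05-14"))
emp.children.append(Child("Kiran", "2015-09-02"))
# Access starts at the root (the employee) and walks down to the children.
print([c.name for c in emp.children])  # ['Asha', 'Kiran']

# --- Network model: records linked by pointers, allowing M:N ---
@dataclass
class Student:
    name: str
    courses: list = field(default_factory=list)  # pointers to Course records

@dataclass
class Course:
    title: str
    students: list = field(default_factory=list)  # pointers back to Student records

def enroll(student, course):
    """Link the two records in both directions."""
    student.courses.append(course)
    course.students.append(student)

anil, beena = Student("Anil"), Student("Beena")
dbms, java = Course("DBMS"), Course("Java")
enroll(anil, dbms)
enroll(anil, java)
enroll(beena, dbms)
print([s.name for s in dbms.students])  # ['Anil', 'Beena']
print([c.title for c in anil.courses])  # ['DBMS', 'Java']
```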
Module
Data mining is the process of sorting through large amounts of data and picking out relevant information. It is usually used by business intelligence organizations and financial analysts, but it is increasingly used in the sciences to extract information from the enormous data sets generated by modern experimental and observational methods. It has been described as "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data" and "the science of extracting useful information from large data sets or databases".
Data mining is simply the task of extracting useful information from recorded data. The captured data needs to be converted into information and knowledge to become useful. Data mining is the entire process of applying computer-based methodology, including new techniques for knowledge discovery, to extract knowledge from data. Data mining identifies trends within data that go beyond simple analysis.
Weakness of Data Mining
Data mining relies on the use of real-world data. This data is extremely vulnerable to collinearity precisely because data from the real world may have unknown interrelations. An unavoidable weakness of data mining is that the critical data that may explain the relationships is never observed. Alternative, experiment-based approaches, such as Choice Modelling for human-generated data, may be used instead. In these, inherent correlations are either controlled for or removed altogether through the construction of an experimental design.
Uses of data mining
For example, a database of prescription drugs taken by a group of people could be used to find combinations of drugs exhibiting harmful interactions. Since any particular combination may occur in only 1 out of 1000 people, a great deal of data would need to be examined to discover such an interaction. A project involving pharmacies could reduce the number of drug reactions and potentially save lives. Unfortunately, there is also a huge potential for abuse of such a database.
Essentially, data mining gives information that would not be available otherwise. It must be properly interpreted to be useful. When the data collected involve individual people, there are many questions concerning privacy, legality, and ethics.
Data mining is most frequently used for CRM (customer relationship management). Common goals are to predict which people are most likely to:
a) be acquired
b) be cross-sold or up-sold
c) leave (churn)
d) be retained, saved, or won back
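The drug-interaction example relies on counting how often combinations of items occur together. A minimal sketch of that counting step (the support-counting idea behind frequent itemset mining) is shown below on hypothetical prescription data. A frequent pair only flags a candidate for investigation; it says nothing by itself about a harmful interaction, which is why the results must be interpreted.

```python
from collections import Counter
from itertools import combinations

# Hypothetical prescription records: the set of drugs each patient takes.
prescriptions = [
    {"aspirin", "warfarin"},
    {"aspirin", "ibuprofen"},
    {"warfarin", "aspirin", "omeprazole"},
    {"ibuprofen"},
    {"omeprazole", "warfarin", "aspirin"},
]

# Count how many patients take each pair of drugs together (its "support").
pair_counts = Counter()
for drugs in prescriptions:
    for pair in combinations(sorted(drugs), 2):  # sorted: one canonical order per pair
        pair_counts[pair] += 1

# Keep only pairs seen in at least min_support patients.
min_support = 2
frequent = {pair: n for pair, n in pair_counts.items() if n >= min_support}
print(frequent)
# {('aspirin', 'warfarin'): 3, ('aspirin', 'omeprazole'): 2, ('omeprazole', 'warfarin'): 2}
```

On real data the number of possible pairs grows quickly, which is why the text stresses that a great deal of data must be examined; algorithms such as Apriori prune candidates that cannot reach the support threshold.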
Superkey: It is an attribute or set of attributes that uniquely identifies rows within a table; in other words, two distinct rows are always guaranteed to have distinct values for the superkey. {Employee ID, Employee Address, Skill} would be a superkey for the "Employees' Skills" table; {Employee ID, Skill} would also be a superkey.
Candidate key: It is a minimal superkey, that is, a superkey for which no proper subset is also a superkey. {Employee ID, Skill} would be a candidate key for the "Employees' Skills" table.
Non-prime attribute: A non-prime attribute is an attribute that does not occur in any candidate key. Employee Address would be a non-prime attribute in the "Employees' Skills" table.
Primary key: Most DBMSs require a table to be defined as having a single unique key, rather than a number of possible unique keys. A primary key is a candidate key which the database designer has designated for this purpose.
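These definitions can be checked mechanically against sample rows. The sketch below uses hypothetical "Employees' Skills" data: `is_superkey` tests whether an attribute set has no duplicate values, and `candidate_keys` keeps only the minimal superkeys. Keep in mind that a table instance can only disprove a key; real keys come from the meaning of the data and are declared by the designer.

```python
from itertools import combinations

# Hypothetical rows of the "Employees' Skills" table.
columns = ("Employee ID", "Employee Address", "Skill")
rows = [
    (1, "73 Industrial Way", "Typing"),
    (1, "73 Industrial Way", "Shorthand"),
    (2, "1 Main St", "Typing"),
    (3, "9 Lake Rd", "Light Cleaning"),
    (4, "73 Industrial Way", "Typing"),
]

def is_superkey(attrs):
    """True if no two rows agree on every attribute in attrs."""
    idx = [columns.index(a) for a in attrs]
    projected = [tuple(row[i] for i in idx) for row in rows]
    return len(set(projected)) == len(rows)

def candidate_keys():
    """Superkeys that have no proper subset which is also a superkey."""
    keys = []
    for size in range(1, len(columns) + 1):  # smallest sets first
        for attrs in combinations(columns, size):
            if is_superkey(attrs) and not any(set(k) < set(attrs) for k in keys):
                keys.append(attrs)
    return keys

print(is_superkey(columns))                   # True
print(is_superkey(("Employee ID", "Skill")))  # True
print(candidate_keys())                       # [('Employee ID', 'Skill')]
```

Here the designer would pick {Employee ID, Skill}, the only candidate key, as the primary key, and Employee Address is non-prime.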
Relational Database Management System (RDBMS)
A DBMS that is based on the relational model is called an RDBMS. The relational model is the most successful of all three models. Designed by E.F. Codd, the relational model is based on the mathematical theory of sets and relations. The relational model represents data in the form of a table. A table is a two-dimensional array containing rows and columns. Each row contains data related to an entity such as a student. Each column contains the data related to a single attribute of the entity, such as student name. One of the reasons behind the success of the relational model is its simplicity: the data is easy to understand and easy to manipulate. Another important advantage of the relational model, compared with the other two models, is that it does not bind data to the relationships between data items. Instead, it allows dynamic relationships between entities using the values of the columns. Almost all database systems sold in the market nowadays have either a complete or a partial implementation of the relational model. Figure 1 shows how data is represented in the relational model and the terms used to refer to the various components of a table. The following are the terms used in the relational model.
Tuple / Row: A single row in the table is called a tuple. Each row represents the data of a single entity.
Attribute / Column: A column stores an attribute of the entity. For example, if details of students are stored, then student name is an attribute, course is another attribute, and so on.
Column Name: Each column in the table is given a name. This name is used to refer to the values in the column.
Table Name: Each table is given a name, which is used to refer to the table. The name depicts the content of the table.
The following are two other terms, primary key and foreign key, that are very important in the relational model.
Primary Key: A table contains the data related to entities. If you take the STUDENTS table, it contains data related to students.
For each student there will be one row in the table. Each student's data in the table must be uniquely identified. In order to identify each entity uniquely in the table, we use a column in the table. That column, which is used to uniquely identify entities (students) in the table, is called the primary key. In the case of the STUDENTS table (see Figure 1), we can use ROLLNO as the primary key as it is not duplicated. So a primary key can be defined as a set of columns used to uniquely identify rows of a table. Some other examples of primary keys are the account number in a bank, the product code of a product, and the employee number of an employee.
Composite Primary Key: In some tables a single column cannot be used to uniquely identify entities (rows). In that case we have to use two or more columns to uniquely identify rows of the table. When a primary key contains two or more columns, it is called a composite primary key.
Foreign Key: In the relational model, we often store data in different tables and put them together to get complete information. For example, the PAYMENTS table stores only the ROLLNO of the student. To get the remaining information about the student we have to use the STUDENTS table: ROLLNO in the PAYMENTS table is a foreign key that refers to ROLLNO, the primary key of the STUDENTS table, and can be used to obtain the remaining information about the student.
Integrity Rules
Data integrity is to be maintained at any cost. If data loses integrity it becomes garbage, so every effort is to be made to ensure that data integrity is maintained. The following are the main integrity rules that are to be followed.
Domain integrity: Data is said to have domain integrity when the value of a column is drawn from its domain. A domain is the collection of potential values. For example, the column date of joining must be a valid date. All valid dates form one domain. If the value of date of joining is an invalid date, then it is said to violate domain integrity.
Entity integrity: This specifies that all values in the primary key must be not null and unique.
Each entity that is stored in the table must be uniquely identified. Every table must contain a primary key, and the primary key must be not null and unique.
Referential integrity: This specifies that a foreign key must be either null or have a value that is derived from the corresponding parent key. For example, if we have a table called BATCHES, then the ROLLNO column of that table will reference the ROLLNO column of the STUDENTS table. All the values of the ROLLNO column of the BATCHES table must be derived from the ROLLNO column of the STUDENTS table. This is because no student who is not part of the STUDENTS table can join a batch.
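The following sketch, assuming SQLite via Python's sqlite3 module, declares the keys and the three integrity rules on cut-down versions of the STUDENTS and BATCHES tables (column names other than ROLLNO are hypothetical) and shows the DBMS rejecting rows that break them. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, and the date CHECK is one way to approximate a date domain in SQLite, which has no strict date type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE students (
    rollno    INTEGER PRIMARY KEY,                          -- entity integrity
    name      TEXT NOT NULL,
    joined_on TEXT CHECK (date(joined_on) IS NOT NULL)      -- domain integrity
);
CREATE TABLE batches (
    batchcode TEXT NOT NULL,
    rollno    INTEGER NOT NULL REFERENCES students(rollno), -- referential integrity
    PRIMARY KEY (batchcode, rollno)                         -- composite primary key
);
""")
conn.execute("INSERT INTO students VALUES (1, 'Anil', '2024-01-15')")

def violates(sql):
    """Return True if the DBMS rejects the statement as an integrity violation."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

checks = {
    "duplicate primary key": violates("INSERT INTO students VALUES (1, 'Beena', '2024-02-01')"),
    "invalid date (domain)": violates("INSERT INTO students VALUES (2, 'Chitra', 'not a date')"),
    "unknown student (referential)": violates("INSERT INTO batches VALUES ('B1', 99)"),
    "valid batch row": violates("INSERT INTO batches VALUES ('B1', 1)"),
    "duplicate composite key": violates("INSERT INTO batches VALUES ('B1', 1)"),
}
for rule, rejected in checks.items():
    print(f"{rule}: {'rejected' if rejected else 'accepted'}")
```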
Module 3
Normalization is a technique for designing relational database tables to minimize duplication of information and, in so doing, to safeguard the database against certain types of logical or structural problems, namely data anomalies. For example, when multiple instances of a given piece of information occur in a table, the possibility exists that these instances will not be kept consistent when the data within the table is updated, leading to a loss of data integrity. A table that is sufficiently normalized is less vulnerable to problems of this kind, because its structure reflects the basic assumptions for when multiple instances of the same information should be represented by a single instance only. Normalization typically involves decomposing an unnormalized table into two or more tables that, were they to be combined (joined), would convey exactly the same information as the original table.
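A minimal sketch of such a decomposition, using Python sets as tables and hypothetical "Employees' Skills" rows: the repeated address is moved into its own table, and a natural join on Employee ID reproduces the original rows exactly. The join is lossless here because Employee ID determines Employee Address; an arbitrary split would not guarantee this.

```python
# Hypothetical unnormalized rows (Employee ID, Employee Address, Skill):
# the address is repeated once per skill, so copies can drift apart on update.
employees_skills = {
    (1, "73 Industrial Way", "Typing"),
    (1, "73 Industrial Way", "Shorthand"),
    (2, "1 Main St", "Typing"),
}

# Decompose into two tables; each address is now stored only once.
employees = {(emp_id, addr) for emp_id, addr, _ in employees_skills}
skills = {(emp_id, skill) for emp_id, _, skill in employees_skills}

# Natural join on Employee ID to put the information back together.
rejoined = {
    (e_id, addr, skill)
    for e_id, addr in employees
    for s_id, skill in skills
    if e_id == s_id
}

print(len(employees_skills), len(employees), len(skills))  # 3 2 3
print(rejoined == employees_skills)  # True: no information lost
```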
Problems addressed by normalization
A table that is not sufficiently normalized can suffer from logical inconsistencies of various types, and from anomalies involving data operations. In such a table:
- An update anomaly can occur: for example, employee 519 is shown as having different addresses on different records.
- An insertion anomaly can occur: for example, until a new faculty member is assigned to teach at least one course, his details cannot be recorded.
- A deletion anomaly can occur: for example, all information about Dr. Giddens is lost when he temporarily ceases to be assigned to any courses.
A table is in first normal form (1NF) if and only if it faithfully represents a relation. Given that database tables embody a relation-like form, the defining characteristic of one in first normal form is that it does not allow duplicate rows or nulls. Simply put, a table with a unique key (which, by definition, prevents duplicate rows) and without any nullable columns is in 1NF.
The criteria for second normal form (2NF) are:
- The table must be in 1NF.
- None of the non-prime attributes of the table is functionally dependent on a part (proper subset) of a candidate key; in other words, all functional dependencies of non-prime attributes on candidate keys are full functional dependencies.
For example, in an "Employees' Skills" table whose attributes are Employee ID, Employee Address, and Skill, the combination of Employee ID and Skill uniquely identifies records within the table. Given that Employee Address depends on only one of those attributes, namely Employee ID, the table is not in 2NF. Note that if none of a 1NF table's candidate keys is composite, i.e. every candidate key consists of just one attribute, then we can say immediately that the table is in 2NF.
The criteria for third normal form (3NF) are:
- The table must be in 2NF.
- Every non-prime attribute of the table must be non-transitively dependent on each candidate key. [7]
A violation of 3NF would mean that at least one non-prime attribute is only indirectly dependent (transitively dependent) on a candidate key. For example, consider a "Departments" table whose attributes are Department ID, Department Name, Manager ID, and Manager Hire Date, and suppose that each manager can manage one or more departments. {Department ID} is a candidate key. Although Manager Hire Date is functionally dependent on the candidate key {Department ID}, this is only because Manager Hire Date depends on Manager ID, which in turn depends on Department ID. This transitive dependency means the table is not in 3NF.
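The dependencies that 2NF and 3NF refer to can be tested on sample data with a small helper. `fd_holds` below (a hypothetical name) checks whether equal left-hand values always come with equal right-hand values; as with keys, sample rows can refute a dependency but never prove it. The rows are invented to match the "Employees' Skills" and "Departments" examples.

```python
def fd_holds(rows, lhs, rhs):
    """True if, in these rows, equal values of lhs always come with equal rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:  # same lhs, different rhs
            return False
    return True

# 2NF: Employee Address depends on Employee ID alone, a proper subset
# of the candidate key {Employee ID, Skill} (a partial dependency).
skills_rows = [
    {"Employee ID": 1, "Employee Address": "73 Industrial Way", "Skill": "Typing"},
    {"Employee ID": 1, "Employee Address": "73 Industrial Way", "Skill": "Shorthand"},
    {"Employee ID": 2, "Employee Address": "1 Main St", "Skill": "Typing"},
]
print(fd_holds(skills_rows, ["Employee ID"], ["Employee Address"]))  # True

# 3NF: Department ID -> Manager ID -> Manager Hire Date (transitive chain).
departments = [
    {"Department ID": 10, "Department Name": "Sales", "Manager ID": 7, "Manager Hire Date": "2015-03-01"},
    {"Department ID": 20, "Department Name": "Audit", "Manager ID": 7, "Manager Hire Date": "2015-03-01"},
    {"Department ID": 30, "Department Name": "Legal", "Manager ID": 9, "Manager Hire Date": "2018-07-15"},
]
print(fd_holds(departments, ["Department ID"], ["Manager ID"]))      # True
print(fd_holds(departments, ["Manager ID"], ["Manager Hire Date"]))  # True
print(fd_holds(departments, ["Manager ID"], ["Department ID"]))      # False: not a key

# 3NF decomposition: the manager facts move to their own table.
depts = {(r["Department ID"], r["Department Name"], r["Manager ID"]) for r in departments}
managers = {(r["Manager ID"], r["Manager Hire Date"]) for r in departments}
print(len(managers))  # 2: each hire date is now stored once
```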
Database recovery
Numerous occurrences could lead to data loss: data or database elements could be accidentally deleted; data could become corrupted by the addition of bad data; hardware, such as a disk or server, could fail; or disasters, such as flooding, could destroy your server and storage media. Since much time, effort, and money are usually invested in an organization's data, its loss is unlikely to be trivial. For this reason, it is critical that you have a tested recovery plan in place for your geodatabase. Note that the plan should be tested before it is implemented; you can back up all the data you want, but if you can't recover it, the backup is useless.
Backup and recovery strategy needs vary in accordance with your specific situation. For ArcSDE geodatabases for SQL Server Express, only simple backup and recovery are performed. A simple backup is a full backup. Since ArcSDE geodatabases for SQL Server Express are comparatively small and accessed by fewer users than ArcSDE geodatabases on the other supported database management systems (DBMS), full backup files take less time to create and can be made more frequently. To learn more about this type of recovery model, see Simple backup and recovery.
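The idea of a simple (full) backup, and of testing the restore rather than assuming it works, can be illustrated with SQLite's online backup API, which Python exposes as `Connection.backup`. This is only an analogy under that assumption: it is not how ArcSDE or SQL Server Express backups are taken, and the table and data are hypothetical.

```python
import sqlite3

live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE parcels (id INTEGER PRIMARY KEY, owner TEXT)")
live.executemany("INSERT INTO parcels VALUES (?, ?)", [(1, "Rao"), (2, "Singh")])
live.commit()

# Simple (full) backup: copy the entire database to another database.
backup = sqlite3.connect(":memory:")
live.backup(backup)

# Accident: data is deleted from the live database.
live.execute("DELETE FROM parcels")
live.commit()

# Recovery: copy the full backup back over the damaged database.
backup.backup(live)

# Test the restore instead of assuming it worked.
restored = live.execute("SELECT id, owner FROM parcels ORDER BY id").fetchall()
print(restored)  # [(1, 'Rao'), (2, 'Singh')]
```

A full backup like this can only bring the data back to the moment the copy was taken; changes made after it are lost, which is one reason more frequent full backups are practical only for small databases.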