Introduction to Databases

Name : Akanksha Sharma

Database Concepts
Data?
Data refers to a collection of natural phenomena descriptors, including the results of experience, observation or experiment, or a set of premises. This may consist of numbers, words, or images, particularly as measurements or observations of a set of variables.

Information?
Information is a quality of a message from a sender to one or more receivers. Information is always about something, such as the size of a parameter or the occurrence of an event. Information does not have to be accurate: it may be a truth or a lie, or just the sound of a falling tree.

Knowledge?
Knowledge is the confident understanding of a subject, together with the ability to use it for a specific purpose where appropriate.

DBMS
A DBMS is a set of software programs that control the organization, storage, management, and retrieval of data in a database. A DBMS includes:
• A modeling language to define the schema of each database hosted in the DBMS, according to the DBMS data model.
• Data structures (fields, records, files and objects) optimized to deal with very large amounts of data stored on a permanent data storage device.
• A database query language and report writer to allow users to interactively interrogate the database, analyze its data and update it according to the users' privileges on data.
• A transaction mechanism that guarantees the ACID properties, in order to ensure data integrity despite concurrent user accesses (concurrency control) and faults (fault tolerance).

Typical Database Applications
• Traditional (Employee, student, product database)
• Online Shopping
• Search Engines
• Data Warehousing (OLAP)
• Data Mining
• Geographical Information Systems

Data-Level Models

Flat File Structure
A database with a single table is called a flat file structure. A flat-file structure is good only for extremely simple databases and not practical for most business applications. Many spreadsheets include some database features like sorting entries and counting or summarizing entries that meet certain criteria.

Hierarchical Data Model
The hierarchical data model is set up like a "forest" or collection of tree structures. The hierarchical data model is a special case of the network data model. This data model is very efficient for certain kinds of applications where the data to be modeled is also like a tree. The best-known hierarchical database management system is IBM's IMS.

Network Data Model
The network data model is similar to the entity-relationship model with all relationships restricted to be binary, many-one relationships. This restriction allows a simple directed graph model to be used. The network data model is fast, but it is difficult to conceptualize complex data structures using this model. An example of a network database management system is IDMS.

Relational Data Model
The relational model is based on predicate logic and set theory. You have sets of statements of fact, and the underlying system can determine new sets of facts. The real power comes from your complete control over determining new facts. All relationships between facts are explicit in the database, and the command language can use and manipulate them. The mathematics behind the model make this manipulation feasible.

RDBMS

Vendor          Global Revenue
Oracle          7,312
IBM             3,483
Microsoft       3,052
Sybase          524
NCR Teradata    457

• Relation: Two dimensional table
  The relation itself corresponds to our familiar notion of a table: a relation is a collection of tuples, each of which contains values for a fixed number of attributes. Relations are sometimes referred to as flat files, because of their resemblance to an unstructured sequence of records. Each tuple in a relation must be unique; that is, there can be no duplicates.
• Tuple: Table row
  A tuple is an instance of an entity or relationship or whatever is represented by the relation.
• Attribute: Table column
  Other commonly used terms for attribute are 'property' and 'field.' The set of permissible values for each attribute is called the domain for that attribute.
• Key: A single attribute or combination of attributes whose values uniquely identify the tuples of the relation. That is, each row has a different value for the key attribute(s). The relational model requires that every relation have a key and that, for any tuple in the relation, the key fields have non-null values: no two tuples may have the same key value and every tuple must have a value for the key attribute.
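The two key rules (unique and non-null) can be seen in action with a minimal sketch. It uses Python's built-in sqlite3 module instead of a commercial RDBMS, and the `employee` table, its `emp_code` key and the sample rows are hypothetical names chosen only for illustration.

```python
import sqlite3

def try_insert(conn, emp_code, name):
    """Insert one tuple; return True on success, False if a key rule is violated."""
    try:
        conn.execute("INSERT INTO employee (emp_code, name) VALUES (?, ?)",
                     (emp_code, name))
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
# emp_code is the key: PRIMARY KEY enforces uniqueness, NOT NULL forbids a missing value.
conn.execute("CREATE TABLE employee (emp_code TEXT NOT NULL PRIMARY KEY, name TEXT)")

first = try_insert(conn, "E61", "Sue Smith")        # accepted
duplicate = try_insert(conn, "E61", "David Jones")  # rejected: key value already used
missing = try_insert(conn, None, "Troy Parker")     # rejected: key value is null
row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

Only the first tuple is stored; the other two are refused by the database itself rather than by application code.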

Case Study: Oracle

Oracle Database Fundamentals
Oracle stores each data item in its own field. In Oracle, the fields relating to a particular person, thing, or event are bundled together to form a single, complete unit of data, called a record. Each record is made up of a number of fields. No two fields in a record can have the same field name. Oracle stores records relating to each other in a table. Different tables are created for the various groups of information. A table consists of a number of records. Each field occupies one column and each record occupies one row. Every table should have a field or a combination of fields (the primary key) that uniquely identifies each record in the table. When a field in one table matches the primary key of another table, the field is referred to as a foreign key. When a foreign key exists in a table, the table it refers to is sometimes called a lookup table.

• Creating Database Tables
    create table tablename (columnname type, columnname type, ...);
    describe department;
    alter table employee add ("Joining Date" date);
    alter table employee modify (Phone number);
    alter table tablename drop column columnname;
    drop table tablename;
• Inserting Data
    insert into tablename (columnname, columnname, ...) values (somevalue, somevalue, ...);
• Selecting Data
    select columnname, columnname, ... from tablename;
    select "First Name"||' '||"Last Name" "Full Name" from employee where deptid=1 and salary>5000;
• Updating Data
    update tablename set columnname=somevalue where conditions;
    delete from tablename where conditions;

Case Study: Microsoft

RDBMS Concepts

Using Oracle PL/SQL
• Basic Structure of PL/SQL
• Variables and Types
• Simple PL/SQL Programs
• Control Flow in PL/SQL

Basic Structure
PL/SQL stands for Procedural Language/SQL. It extends SQL by adding constructs found in procedural languages, resulting in a structural language that is more powerful than SQL. The basic unit in PL/SQL is a block. All PL/SQL programs are made up of blocks, which can be nested within each other, and each block performs a logical action in the program. A block has the following structure:

    DECLARE
      /* Declarative section: variables, types, and local subprograms. */
    BEGIN
      /* Executable section: procedural and SQL statements go here. */
      /* This is the only section of the block that is required. */
    EXCEPTION
      /* Exception handling section: error handling statements go here. */
    END;

Only the executable section is required. The other sections are optional. The only SQL statements allowed in a PL/SQL program are SELECT, INSERT, UPDATE, DELETE and several other data manipulation statements plus some transaction control.

Variables
• The DECLARE section defines and (optionally) initialises variables. If not initialised specifically they default to NULL.
• The symbol := is the assignment operator, used to store a value in a variable.
• The major datatypes in PL/SQL include NUMBER, INTEGER, CHAR, VARCHAR2, DATE, TIMESTAMP and BOOLEAN.

    DECLARE
      number1 NUMBER(2);
      number2 NUMBER(2) := 17;
      text1 VARCHAR2(12) := 'Hello world';
      text2 DATE := SYSDATE; -- current date and time
    BEGIN
      SELECT street_number INTO number1 FROM address WHERE name = 'Smith';
    END;

Simple Program in PL/SQL

    CREATE TABLE T1( e INTEGER, f INTEGER );
    DELETE FROM T1;
    INSERT INTO T1 VALUES(1, 3);
    INSERT INTO T1 VALUES(2, 4);
    /* Above is plain SQL; below is the PL/SQL program. */
    DECLARE
      a NUMBER;
      b NUMBER;
    BEGIN
      SELECT e,f INTO a,b FROM T1 WHERE e>1;
      INSERT INTO T1 VALUES(b,a);
    END;
    .
    run;
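To see what T1 holds after this program runs, the same steps can be traced outside Oracle. The sketch below is an approximation in Python's built-in sqlite3 module, not PL/SQL: the variables `a` and `b` mirror the PL/SQL ones, and the explicit one-row check stands in for the rule that SELECT ... INTO must match exactly one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (e INTEGER, f INTEGER)")
conn.execute("INSERT INTO T1 VALUES (1, 3)")
conn.execute("INSERT INTO T1 VALUES (2, 4)")

# Equivalent of SELECT e,f INTO a,b: exactly one row must qualify.
rows = conn.execute("SELECT e, f FROM T1 WHERE e > 1").fetchall()
if len(rows) != 1:
    raise RuntimeError("SELECT INTO needs exactly one matching row")
a, b = rows[0]

# Equivalent of INSERT INTO T1 VALUES(b,a): the pair is stored swapped.
conn.execute("INSERT INTO T1 VALUES (?, ?)", (b, a))
final_rows = conn.execute("SELECT e, f FROM T1 ORDER BY e").fetchall()
```

Only the row (2, 4) satisfies e > 1, so the program adds (4, 2) and T1 ends with three rows.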

Control Flow in PL/SQL
An IF statement looks like:

    IF <condition> THEN <statement_list> ELSE <statement_list> END IF;

The ELSE part is optional. If you want a multiway branch, use:

    IF <condition_1> THEN ...
    ELSIF <condition_2> THEN ...
    ... ...
    ELSIF <condition_n> THEN ...
    ELSE ...
    END IF;

Loops are created with the following:

    LOOP
      <loop_body> /* A list of statements. */
    END LOOP;

At least one of the statements in <loop_body> should be an EXIT statement of the form

    EXIT WHEN <condition>;

The loop breaks if <condition> is true.

Examples

CONDITIONAL:

    DECLARE
      a NUMBER;
      b NUMBER;
    BEGIN
      SELECT e,f INTO a,b FROM T1 WHERE e>1;
      IF b=1 THEN
        INSERT INTO T1 VALUES(b,a);
      ELSE
        INSERT INTO T1 VALUES(b+10,a+10);
      END IF;
    END;
    .
    run;

LOOPING:

    DECLARE
      i NUMBER := 1;
    BEGIN
      LOOP
        INSERT INTO T1 VALUES(i,i);
        i := i+1;
        EXIT WHEN i>100;
      END LOOP;
    END;
    .
    run;

Joins
• CROSS JOIN (Cartesian product) is the simplest join.
• INNER JOIN (sometimes called the "EQUI-JOIN"), where tables are combined based on a common column.
• OUTER JOIN, which involves combining all rows of one table with only matching rows from the other table.
• SELF JOIN, which is a table joined to itself.

Cross Join
A cross join returns the cartesian product of the sets of records from the two joined tables. If A and B are two sets, then cross join = A × B.
Examples:
    Explicit: SELECT * FROM employee CROSS JOIN department
    Implicit: SELECT * FROM employee, department;
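A quick way to check the A × B claim is to count rows. The sketch below runs both forms of the query in Python's built-in sqlite3 module; the columns other than DepartmentID and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (name TEXT, DepartmentID INTEGER);
CREATE TABLE department (DepartmentID INTEGER, DepartmentName TEXT);
INSERT INTO employee VALUES ('Smith', 1), ('Jones', 2), ('Parker', NULL);
INSERT INTO department VALUES (1, 'Sales'), (2, 'Engineering');
""")

# Explicit and implicit cross joins: every employee row paired with every department row.
explicit = conn.execute("SELECT * FROM employee CROSS JOIN department").fetchall()
implicit = conn.execute("SELECT * FROM employee, department").fetchall()
```

With 3 employees and 2 departments, each query returns 3 × 2 = 6 rows, and both return the same set of rows.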

Inner Joins
• Equi-join
  An equi-join, also known as an equijoin, is a specific type of comparator-based join, or theta join, that uses only equality comparisons in the join-predicate. Using other comparison operators (such as <) disqualifies a join as an equi-join.
  Example: SELECT * FROM employee INNER JOIN department ON employee.DepartmentID = department.DepartmentID
• Natural join
  A natural join is an equi-join that implicitly compares all columns that have the same name in both tables.

Outer Joins
• Left outer join
  The result of a left outer join for tables A and B always contains all records of the "left" table (A), even if the join-condition does not find any matching record in the "right" table (B). This means that a left outer join returns all the values from the left table, plus matched values from the right table (or NULL in case of no matching join predicate).
  Example: SELECT * FROM employee LEFT OUTER JOIN department ON employee.DepartmentID = department.DepartmentID

• Right outer join
  Every record from the "right" table (B) will appear in the joined table at least once. If no matching row from the "left" table (A) exists, NULL will appear in the columns from A for those records of B that have no match in A. A right outer join returns all the values from the right table and matched values from the left table (NULL in case of no matching join predicate).
  Example: SELECT * FROM employee RIGHT OUTER JOIN department ON employee.DepartmentID = department.DepartmentID
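The difference between inner, left outer and right outer joins is easiest to see on a small data set. The following sketch uses Python's built-in sqlite3 module with hypothetical rows. Because older SQLite versions have no RIGHT JOIN, the right outer join is written here as a left outer join with the two tables swapped, which returns the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (name TEXT, DepartmentID INTEGER);
CREATE TABLE department (DepartmentID INTEGER, DepartmentName TEXT);
INSERT INTO employee VALUES ('Smith', 1), ('Jones', 2), ('Parker', NULL);
INSERT INTO department VALUES (1, 'Sales'), (2, 'Engineering'), (3, 'Marketing');
""")

inner = conn.execute("""
    SELECT e.name, d.DepartmentName
    FROM employee e INNER JOIN department d ON e.DepartmentID = d.DepartmentID
    ORDER BY e.name""").fetchall()

# Left outer join: every employee appears, with NULL for a missing department.
left = conn.execute("""
    SELECT e.name, d.DepartmentName
    FROM employee e LEFT OUTER JOIN department d ON e.DepartmentID = d.DepartmentID
    ORDER BY e.name""").fetchall()

# Right outer join of employee with department, emulated by swapping the tables.
right = conn.execute("""
    SELECT e.name, d.DepartmentName
    FROM department d LEFT OUTER JOIN employee e ON e.DepartmentID = d.DepartmentID
    ORDER BY d.DepartmentName""").fetchall()
```

Parker (no department) shows up only in the left outer join, and Marketing (no employees) only in the right outer join; NULL is returned to Python as None.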

Self Join
A self-join is simply a normal SQL join that joins one table to itself. This is accomplished by using table name aliases to give each "instance" of the table a separate name.

Employees
EmployeeID   EmployeeName         ManagerID
61           Sue Smith            (null)
62           David Jones          61
63           Troy Parker          61
64           Claire Smith-Jones   63
65           Grover Rivers        63

Example: SELECT E1.EmployeeName AS Employee, E2.EmployeeName AS Manager FROM Employees AS E1 INNER JOIN Employees AS E2 ON E1.ManagerID = E2.EmployeeID
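The self-join can be run against the Employees rows shown on this slide. The sketch below loads them into an in-memory database with Python's built-in sqlite3 module; the only change to the query is an added ORDER BY, an assumption made so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (EmployeeID INTEGER, EmployeeName TEXT, ManagerID INTEGER)")
conn.executemany("INSERT INTO Employees VALUES (?, ?, ?)", [
    (61, "Sue Smith", None),
    (62, "David Jones", 61),
    (63, "Troy Parker", 61),
    (64, "Claire Smith-Jones", 63),
    (65, "Grover Rivers", 63),
])

# E1 plays the role of "employee", E2 the role of "manager", on the same table.
pairs = conn.execute("""
    SELECT E1.EmployeeName AS Employee, E2.EmployeeName AS Manager
    FROM Employees AS E1 INNER JOIN Employees AS E2
      ON E1.ManagerID = E2.EmployeeID
    ORDER BY E1.EmployeeID""").fetchall()
```

Sue Smith has no manager (her ManagerID is null), so the inner join leaves her out of the Employee column while she still appears as a Manager.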

Normalization
Introduction
• Entity: The word ‘entity’ is the general name for the information that is to be stored within a single table. Information about the entities is known as attributes.
• Primary key: A primary key uniquely identifies a row of data found within a table. When multiple attributes are used to derive a primary key, this key is known as a concatenated primary key.
• Foreign key: A foreign key forms the basis of a 1:M relationship between two tables. The foreign key can be found within the M table, and maps to the primary key found in the 1 table.
• Relationship:
  - one-to-one (1:1): A one-to-one relationship signifies that each instance of a given entity relates to exactly one instance of another entity.
  - one-to-many (1:M): A one-to-many relationship signifies that each instance of a given entity relates to one or more instances of another entity.
  - many-to-many (M:N): A many-to-many relationship signifies that many instances of a given entity relate to many instances of another entity.

The Three Normal Forms
• First Normal Form
  A table is in first normal form (1NF) if there are no repeating groups.
  How to Normalize?
  - Remove the repeating group of attributes to form a new entity
  - Add to it the original key

• Second Normal Form
  A table is in Second Normal Form (2NF) if it is in 1NF and each non-key field is functionally dependent on the entire primary key.
  How to Normalize?
  - Examine tables with a composite key (a key made up of two parts)
  - For each non-key attribute, determine whether it depends on the first part of the key, on the second part, or on both parts
  - Remove the partial key and its dependents to form a new table

• Third Normal Form
  A table is in Third Normal Form (3NF) if it is in 2NF and there are no transitive dependencies.
  How to Normalize?
  - Identify any dependencies between non-key attributes within each table
  - Remove them to form a new table
  - Promote one of the attributes to be the key of the new table
  - This becomes the Foreign Key link in the original table (shown with a *).

Example
Department: ( DepartmentName, SupervisorNumber ) - SupervisorNumber is a foreign key
Supervisor: ( SupervisorNumber, SupervisorName )
EmployeeDepartment: ( DepartmentName, EmployeeNumber, StartDate )
Employee: ( EmployeeNumber, EmployeeName )
EmployeeProject: ( EmployeeNumber, ProjectNumber, StartDate )
Project: ( ProjectNumber, ProjectName )

To check whether these tables are in normal form, answer the following questions:
1. Does the table contain any repeating groups? If not, and the table has a primary key, then it is in First Normal Form (1NF).
2. Does the table contain any partial dependencies? If not, and it is in 1NF, then it is in 2NF.
3. Does the table contain any transitive dependencies or derived attributes? If not, and it is in 2NF, then it is in 3NF.
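Questions 2 and 3 both come down to spotting functional dependencies. As a rough aid, the sketch below tests whether a dependency holds in a set of sample rows. The function name `holds` and the sample data are hypothetical, and a dependency that holds in sample rows is only evidence, since a real dependency is a rule about all valid data.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        value = tuple(row[attr] for attr in rhs)
        # The same lhs value must always map to the same rhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True

# An unnormalized Department table that still carries the supervisor's name.
rows = [
    {"DepartmentName": "Sales", "SupervisorNumber": 10, "SupervisorName": "Rao"},
    {"DepartmentName": "Support", "SupervisorNumber": 10, "SupervisorName": "Rao"},
    {"DepartmentName": "Research", "SupervisorNumber": 20, "SupervisorName": "Mehta"},
]

key_to_number = holds(rows, ["DepartmentName"], ["SupervisorNumber"])
number_to_name = holds(rows, ["SupervisorNumber"], ["SupervisorName"])
number_to_department = holds(rows, ["SupervisorNumber"], ["DepartmentName"])
```

Here DepartmentName determines SupervisorNumber, which in turn determines SupervisorName: a transitive dependency, which is why the example keeps SupervisorName in a separate Supervisor table.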

Indexing
• Types of Single-level Ordered Indexes
  - Primary Indexes
  - Clustering Indexes
  - Secondary Indexes
• Multilevel Indexes

Types of Single-Level Indexes
Primary Index
• Defined on an ordered data file
• The data file is ordered on a key field
• Includes one index entry for each block in the data file; the index entry has the key field value for the first record in the block, which is called the block anchor
• A similar scheme can use the last record in a block.
• A primary index is a nondense (sparse) index, since it includes an entry for each disk block of the data file and the keys of its anchor record rather than for every search value.

Figure: Primary index on the ordering key field.
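A sparse primary index can be sketched in a few lines: one entry per block, holding the block anchor, and a binary search over the anchors to find the single block worth reading. The block contents, the `lookup` function and the in-memory lists are hypothetical stand-ins for disk blocks.

```python
from bisect import bisect_right

# Data file: three blocks of records, ordered on the key field.
blocks = [[2, 5, 8], [12, 15, 19], [23, 27, 31]]

# Sparse primary index: one (anchor key, block number) entry per block.
index = [(block[0], number) for number, block in enumerate(blocks)]
anchors = [anchor for anchor, _ in index]

def lookup(key):
    """Return the number of the block that holds key, or None if it is absent."""
    position = bisect_right(anchors, key) - 1  # last anchor that is <= key
    if position < 0:
        return None  # key is smaller than every anchor
    block_number = index[position][1]
    return block_number if key in blocks[block_number] else None
```

For example, `lookup(15)` searches three index entries and then reads only block 1, while `lookup(9)` reads block 0 and reports that the key is absent. The index has 3 entries for 9 records, which is what makes it nondense.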

Clustering Index
• Defined on an ordered data file
• The data file is ordered on a non-key field, unlike a primary index, which requires that the ordering field of the data file have a distinct value for each record.
• Includes one index entry for each distinct value of the field; the index entry points to the first data block that contains records with that field value.
• It is another example of a nondense index. Insertion and deletion are relatively straightforward when a separate block cluster is reserved for each value of the clustering field.

Figure: A clustering index on the DEPTNUMBER ordering nonkey field of an EMPLOYEE file.

Figure: Clustering index with a separate block cluster for each group of records that share the same value for the clustering field.

Secondary Index
• A secondary index provides a secondary means of accessing a file for which some primary access already exists.
• The secondary index may be on a field which is a candidate key and has a unique value in every record, or a nonkey with duplicate values.
• The index is an ordered file with two fields.
  - The first field is of the same data type as some nonordering field of the data file that is an indexing field.
  - The second field is either a block pointer or a record pointer.
• There can be many secondary indexes (and hence, indexing fields) for the same file.
• Includes one entry for each record in the data file; hence, it is a dense index.

Figure: A dense secondary index (with block pointers) on a nonordering key field of a file.

Figure: A secondary index (with record pointers) on a nonkey field implemented using one level of indirection so that index entries are of fixed length and have unique field values.

Multi-Level Indexes
• Because a single-level index is an ordered file, we can create a primary index to the index itself; in this case, the original index file is called the first-level index and the index to the index is called the second-level index.
• We can repeat the process, creating a third, fourth, ..., top level until all entries of the top level fit in one disk block.
• A multi-level index can be created for any type of first-level index (primary, secondary, clustering) as long as the first-level index consists of more than one disk block.

Figure: A two-level primary index resembling ISAM (Indexed Sequential Access Method) organization.
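The "repeat until the top level fits in one block" rule translates directly into a loop. The sketch below counts the levels for a given number of first-level entries and a given fan-out (index entries per block); the function name and the numbers in the usage note are hypothetical.

```python
import math

def index_levels(first_level_entries, fan_out):
    """Count index levels needed until the top level fits in one disk block."""
    if fan_out < 2:
        raise ValueError("fan_out must be at least 2")
    levels = 1
    blocks = math.ceil(first_level_entries / fan_out)  # blocks in the first level
    while blocks > 1:
        # Index the current level: one entry per block of the level below.
        blocks = math.ceil(blocks / fan_out)
        levels += 1
    return levels
```

With 30,000 first-level entries and 68 entries per block, the first level needs 442 blocks, the second level 7 blocks and the third level 1 block, so `index_levels(30000, 68)` returns 3. A search then reads three index blocks plus one data block, instead of a binary search over 442 blocks.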

Data Warehousing & Business Intelligence

Introduction
A Data Warehouse (DW) is a subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management's decisions.

Business intelligence (BI):
• BI systems provide managers with
  - Actionable information and knowledge
  - At the right time
  - At the right location
  - In the right form
• The knowledge derived from analyzing an organization's information
• Technologies for gathering, storing, analyzing and providing access to data to help enterprise users make better business decisions

Characteristics of a DW
Figure: Operational versus Data Warehouse view (diagram labels: Leads, Inventory, Customers, Products, Quotes, Orders, Regions, Time).
Focus is on Subject Areas rather than Applications.

On-Line Transaction Processing (OLTP)
• Database management systems are typically used for on-line transaction processing.
• OLTP applications normally automate clerical data processing tasks of an organization, like data entry and enquiry, transaction handling, etc. (access, read, update)
• Database is current, and consistency and recoverability are critical. Records are accessed one at a time.
• OLTP Operations:
  - are structured and repetitive
  - require detailed and up-to-date data
  - are short, atomic and isolated transactions

On-Line Analytical Processing (OLAP)
• On-line analytical processing is essential for decision support.
• OLAP is supported by data warehouses.
• A data warehouse consolidates operational databases.
• Owing to the hierarchical nature of the dimensions, OLAP operations view the data flexibly from different perspectives (different levels of abstraction).
• OLAP operations:
  - Roll-up: Increase the level of abstraction
  - Drill-down: Decrease the level of abstraction
  - Slice and dice: Selection and projection
  - Pivot: Re-orient the multi-dimensional view
  - Drill-through: Links to the raw data
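Roll-up, drill-down and slice can be approximated with plain GROUP BY and WHERE clauses. The sketch below does this in Python's built-in sqlite3 module on a tiny hypothetical `sales` table with a region > city hierarchy; a real OLAP engine would work on a multi-dimensional model rather than a single table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, city TEXT, product TEXT, amount INTEGER);
INSERT INTO sales VALUES
  ('North', 'Delhi', 'Laptop', 100),
  ('North', 'Delhi', 'Phone', 50),
  ('North', 'Chandigarh', 'Laptop', 30),
  ('South', 'Chennai', 'Phone', 70);
""")

# Drill-down: lower level of abstraction (totals per city within each region).
drill_down = conn.execute("""
    SELECT region, city, SUM(amount) FROM sales
    GROUP BY region, city ORDER BY region, city""").fetchall()

# Roll-up: higher level of abstraction (totals per region only).
roll_up = conn.execute("""
    SELECT region, SUM(amount) FROM sales
    GROUP BY region ORDER BY region""").fetchall()

# Slice: select one value of the product dimension, then aggregate.
laptop_slice = conn.execute("""
    SELECT region, SUM(amount) FROM sales
    WHERE product = 'Laptop' GROUP BY region ORDER BY region""").fetchall()
```

Moving from `drill_down` to `roll_up` simply drops the city level of the hierarchy from the grouping, which is the sense in which roll-up increases the level of abstraction.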

DW - Benefits
• Increase customer profitability
• Cost effective decision making
• Manage customer and business partner relationships
• Manage risk, assets and liabilities
• Integrate inventory, operations and manufacturing
• Reduction in time to locate, access, and analyze information (Link multiple locations and geographies)
• Identify developing trends and reduce time to market
• Strategic advantage over competitors

Warehouse Architecture
Figure: Warehouse architecture (diagram labels: Operational Systems/Data; Data Preparation with Select, Extract, Transform, Integrate, Maintain; Enterprise Data Warehouse; Data Warehouse; Metadata; Middleware/API; EIS/DSS; Query Tools; OLAP/ROLAP; Web Browsers).

DW Architecture Components
Figure: DW architecture components (diagram labels: Source Databases; ETL Tool; Data Cleansing Tools; Data Modeling Tool; Central Metadata; Local meta data; Central Warehouse (RDBMS); Architected Datamarts; RDBMS; MDDB; ROLAP Engine; Warehouse Databases; Warehouse Admin Tool; Data Access and Analysis Tools: Managed Query, Desktop OLAP, ROLAP, MOLAP, Data Mining).
Data Warehouse Is Not Just About Data... But Tools Too

DW/BI Tools
ETL Tools
Extract, Transform, and Load (ETL) is a process in data warehousing that involves
1. Extracting data from outside sources,
2. Transforming it to fit business needs (which can include quality levels), and ultimately
3. Loading it into the end target, i.e. the data warehouse.
ETL is a key process to bring heterogeneous and asynchronous source extracts to a homogeneous environment. Data warehouses are typically fed asynchronously by a variety of sources which all serve a different purpose, resulting in, e.g., different reference data. Typically the known ETL tools are intended for use in batch mode, pulling large volumes from different platforms and systems at scheduled times and transforming and integrating the data until it fits the format to be loaded into a (corporate) multi-dimensional data warehouse.
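The three ETL steps can be illustrated with a toy pipeline. This is a sketch only, using the Python standard library in place of a commercial ETL tool: the two source extracts, the `customer` target table and the country-code mapping are all hypothetical, chosen to show heterogeneous sources with different reference data being brought into one format.

```python
import csv
import io
import sqlite3

# Two hypothetical source extracts: different delimiters and country spellings.
SOURCE_A = "id,name,country\n1, alice ,IN\n2,Bob,US\n"
SOURCE_B = "id;name;country\n3;carol;India\n"
COUNTRY_CODES = {"india": "IN", "in": "IN", "us": "US"}  # shared reference data

def extract(text, delimiter):
    """Step 1: read raw rows from one source extract."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

def transform(rows):
    """Step 2: clean names and map countries onto one set of codes."""
    return [(int(row["id"]),
             row["name"].strip().title(),
             COUNTRY_CODES[row["country"].strip().lower()])
            for row in rows]

def load(conn, rows):
    """Step 3: write the conformed rows into the target table."""
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
for text, delimiter in ((SOURCE_A, ","), (SOURCE_B, ";")):
    load(warehouse, transform(extract(text, delimiter)))

loaded = warehouse.execute("SELECT id, name, country FROM customer ORDER BY id").fetchall()
```

After the batch run, all three customers sit in one table with consistent name formatting and a single country code scheme, regardless of which source they came from.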

• Data mart - a subset of the data warehouse in which only a focused portion of the data warehouse information is kept
• Other technical components of business intelligence include tools such as
  - Data mining
  - Automatic exception detection with proactive alerting and automatic recipient determination
  - Automatic learning
• Data-mining tool - a software tool you use to query information in a data warehouse

BO-Designer
• It lets you create the semantic layer, i.e. the UNIVERSE, which isolates end users from the technical issues of the database structure
• It is used to create, manage and distribute universes for a particular group of BO and WebIntelligence users.
The Building Blocks of a Designer are
• Classes: Logical grouping of objects
• Objects: Most refined component of the Universe. An Object maps to Data or a derivation of data in the database
  - Dimension: Parameters for the analysis (Ex. City)
  - Detail: Description of a dimension (Ex. Phone #)
  - Measure: Numeric information by which a Dimension object can be measured (Ex. Sales Revenue)

ODS Development Case Study
End-to-End Process Diagram
Figure: End-to-end process (diagram labels: Source Systems: Oracle Apps, Siebel, Excel Files, Flat Files; ETL Process; Intermediate Tables; ODS; EDW; Teradata; Target DW).

Thanks