
Data Source

A data source name (DSN) is the name given to a connection set up from a server to a database. The name is commonly used when creating a query to the database. The DSN does not have to be the same as the filename of the database. For example, a database file named friends.mdb could be set up with a DSN of school; the DSN school would then be used to refer to the database when performing a query.

A DataSource object is the representation of a data source in the Java programming language. In basic terms, a data source is a facility for storing data. It can be as sophisticated as a complex database for a large corporation or as simple as a file with rows and columns. A data source can reside on a remote server, or it can be on a local desktop machine. Applications access a data source using a connection, and a DataSource object can be thought of as a factory for connections to the particular data source that the DataSource instance represents. The DataSource interface provides two methods for establishing a connection with a data source: getConnection() and getConnection(String username, String password).
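The connection-factory idea can be sketched with Python's standard sqlite3 module. This is an illustration, not the Java javax.sql.DataSource API: the DataSource class, its get_connection method and the registry dictionary are hypothetical names, and an SQLite file called friends.db stands in for friends.mdb.

```python
import os
import sqlite3
import tempfile


class DataSource:
    """Hypothetical connection factory: maps a data source name (DSN) to a database file."""

    def __init__(self, registry):
        # registry: {DSN: path to the database file}
        self._registry = dict(registry)

    def get_connection(self, dsn):
        # Callers ask for a connection by name; only the factory knows the filename.
        # An unregistered DSN raises KeyError.
        return sqlite3.connect(self._registry[dsn])


# The file is named friends.db (an SQLite file standing in for friends.mdb),
# but application code refers to it only through the DSN "school".
friends_path = os.path.join(tempfile.gettempdir(), "friends.db")
ds = DataSource({"school": friends_path})

conn = ds.get_connection("school")
print(conn.execute("SELECT 1").fetchone())  # (1,)
conn.close()
```

Because callers only know the name school, the file could be moved or renamed by editing the registry alone, which is the point of a DSN.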

Source of Data
From another database
From the web / network
From users / groups

Data File Environment

Q8/I (A) 2006 Q5 (A)
A data file environment, also called a file system (often written as filesystem), is a method of storing and organizing computer files and their data. Each file in this system is isolated and has little or no connection with any other. In a data file environment, files are produced using various tools and applications, so file integrity is much lower. Files can be stored on different hard disk partitions according to user requirements, and more files can be added to the same partition until the disk is full.

Security is generally low in a data file environment, and the integrity of shared data is also low.

Database Environment

Q8/II (A) 2006 Q5 (A)
In a database environment, data is logically stored in tabular form, and tables often have relations and connections with other tables. In a database environment, all files (databases) are created, opened, edited and deleted using the same tool (the DBMS software), so file integrity is very high. Databases are broken down into smaller data files, which are stored at scattered locations on the related server. Such data files are logically connected but physically scattered on the server's storage device. Different usage and access rights are granted to different levels of users, which keeps the database environment very secure. It is also highly sharable, since the core language of relational database software is the same (SQL).

Data model (Database Models)


A data model in software engineering is an abstract model that describes how data are represented and accessed. Data models formally define data elements and relationships among data elements for a domain of interest. According to Hoberman (2009), "A data model is a way finding tool for both business and IT professionals, which uses a set of symbols and text to precisely explain a subset of real information to improve communication within the organization and thereby lead to a more flexible and stable application environment." A data model explicitly determines the structure of data or structured data. Typical applications of data models include database models, design of information systems, and enabling exchange of data. Usually data models are specified in a data modeling language. A database model is a theory or specification describing how a database is structured and used. Several such models have been suggested. Common models include:

Flat model: This may not strictly qualify as a data model. The flat (or table) model consists of a single, two-dimensional array of data elements, where all members of a given column are assumed to be similar values, and all members of a row are assumed to be related to one another.

Hierarchical model: In this model data is organized into a tree-like structure, implying a single upward link in each record to describe the nesting, and a sort field to keep the records in a particular order in each same-level list.

Network model: This model organizes data using two fundamental constructs, called records and sets. Records contain fields, and sets define one-to-many relationships between records: one owner, many members.

Relational model: This is a database model based on first-order predicate logic. Its core idea is to describe a database as a collection of predicates over a finite set of predicate variables, describing constraints on the possible values and combinations of values.

Object-relational model: Similar to a relational database model, but objects, classes and inheritance are directly supported in database schemas and in the query language.

Concept Oriented Model: This is the conceptual structuring of a database. The actual structure may differ from this conceptual structure, because it depends heavily on the system or database designer, who may conceive of a problem differently from the way it is finally implemented.

Star schema is the simplest style of data warehouse schema. The star schema consists of a few "fact tables" (possibly only one, justifying the name) referencing any number of "dimension tables". The star schema is considered an important special case of the snowflake schema.
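A minimal star schema can be sketched in SQLite through Python's sqlite3 module: one fact table whose foreign keys point to two dimension tables. The table names and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables hold descriptive attributes.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);

    -- The fact table holds numeric measures plus a foreign key to each dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id   INTEGER REFERENCES dim_store(store_id),
        units      INTEGER
    );

    INSERT INTO dim_product VALUES (1, 'Pen'), (2, 'Notebook');
    INSERT INTO dim_store   VALUES (10, 'Delhi'), (20, 'Paris');
    INSERT INTO fact_sales  VALUES (1, 10, 5), (2, 10, 3), (1, 20, 7);
""")

# A typical star query: join the fact table to its dimensions, then aggregate.
rows = conn.execute("""
    SELECT p.name, s.city, SUM(f.units)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_store   AS s ON s.store_id   = f.store_id
    GROUP BY p.name, s.city
    ORDER BY p.name, s.city
""").fetchall()
print(rows)  # [('Notebook', 'Delhi', 3), ('Pen', 'Delhi', 5), ('Pen', 'Paris', 7)]
```

Every query follows the same shape, one fact table in the middle joined outward to its dimensions, which is where the "star" name comes from.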

Properties of Databases (ACID)


Atomicity
Atomicity requires that database modifications follow an all-or-nothing rule. A transaction is atomic if, when one part of the transaction fails, the entire transaction fails and the database state is left unchanged. It is critical that the database management system maintains the atomic nature of transactions in spite of any application, DBMS, operating system or hardware failure. An atomic transaction cannot be subdivided and must be processed in its entirety or not at all. Atomicity means that users do not have to worry about the effect of incomplete transactions. Transactions can fail for several kinds of reasons:
Hardware failure: a disk drive fails, preventing some of the transaction's database changes from taking effect.
System failure: the user loses their connection to the application before providing all necessary information.
Database failure: e.g., the database runs out of room to hold additional data.
Application failure: the application attempts to post data that violates a rule that the database itself enforces, such as attempting to create a new account without supplying an account number.
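The all-or-nothing rule can be seen with Python's sqlite3 module, where using the connection as a context manager commits on success and rolls back if an exception escapes. The account table and the transfer helper are hypothetical; a missing target account plays the role of an application failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO account VALUES ('A1', 100), ('A2', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Debit src and credit dst as one all-or-nothing transaction."""
    with conn:  # sqlite3: commit if the block finishes, roll back if it raises
        conn.execute("UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                     (amount, src))
        cur = conn.execute("UPDATE account SET balance = balance + ? WHERE acc_no = ?",
                           (amount, dst))
        if cur.rowcount != 1:
            # Simulated application failure: the target account does not exist.
            raise ValueError(f"unknown account {dst!r}")


def balances(conn):
    return dict(conn.execute("SELECT acc_no, balance FROM account ORDER BY acc_no"))


transfer(conn, "A1", "A2", 30)
print(balances(conn))  # {'A1': 70, 'A2': 80}

try:
    transfer(conn, "A1", "A9", 30)  # the debit runs, then the credit fails
except ValueError as exc:
    print("rolled back:", exc)
print(balances(conn))  # {'A1': 70, 'A2': 80}: the debit was undone as well
```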

Consistency
The consistency property ensures that the database remains in a consistent state. More precisely, it says that any transaction will take the database from one consistent state to another consistent state. The consistency rule applies only to integrity rules that are within its scope. Thus, if a DBMS allows fields of a record to act as references to another record, then consistency implies the DBMS must enforce referential integrity: by the time any transaction ends, each and every reference in the database must be valid. If a transaction consisted of an attempt to delete a record referenced by another, each of the following mechanisms would maintain consistency:
Abort the transaction, rolling back to the consistent, prior state.
Delete all records that reference the deleted record (this is known as cascade delete).
Nullify the relevant fields in all records that point to the deleted record.
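The three mechanisms correspond to the SQL foreign key actions ON DELETE RESTRICT, ON DELETE CASCADE and ON DELETE SET NULL. The sketch below runs each one in SQLite through Python's sqlite3; the dept and emp tables are hypothetical, and SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3


def demo(action):
    """Delete a referenced department under the given ON DELETE action."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute("CREATE TABLE dept (dept TEXT PRIMARY KEY)")
    # action comes from a fixed list below, so formatting it into the DDL is safe here.
    conn.execute(f"""CREATE TABLE emp (
        ecode TEXT PRIMARY KEY,
        dept  TEXT REFERENCES dept(dept) ON DELETE {action})""")
    conn.execute("INSERT INTO dept VALUES ('Sales')")
    conn.execute("INSERT INTO emp VALUES ('E305', 'Sales')")
    conn.commit()
    try:
        with conn:
            conn.execute("DELETE FROM dept WHERE dept = 'Sales'")
    except sqlite3.IntegrityError:
        pass  # RESTRICT: the delete is refused and the transaction rolled back
    dept_count = conn.execute("SELECT COUNT(*) FROM dept").fetchone()[0]
    emp_rows = conn.execute("SELECT ecode, dept FROM emp").fetchall()
    return dept_count, emp_rows


for action in ("RESTRICT", "CASCADE", "SET NULL"):
    print(action, demo(action))
# RESTRICT (1, [('E305', 'Sales')])
# CASCADE (0, [])
# SET NULL (0, [('E305', None)])
```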

Isolation
Isolation refers to the requirement that other operations cannot access or see data that has been modified during a transaction that has not yet completed. Each transaction must remain unaware of other concurrently executing transactions, except that one transaction may be forced to wait for the completion of another transaction that has modified data that the waiting transaction requires.

Durability
Durability is the DBMS's guarantee that once the user has been notified of a transaction's success, the transaction will not be lost. The transaction's data changes will survive a system failure, and all integrity constraints have been satisfied, so the DBMS will not need to reverse the transaction. Many DBMSs implement durability by writing transactions into a transaction log that can be reprocessed to recreate the system state right before any later failure. A transaction is deemed committed only after it is entered in the log.
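The log-then-replay idea can be reduced to a toy key-value store. This is a hypothetical sketch, not how any particular DBMS works: a Python list stands in for an append-only log file on durable storage (a real system would also force the log to disk before reporting success), and recovery simply redoes every logged transaction in order.

```python
import json


class ToyStore:
    """Hypothetical key-value store illustrating a redo log in miniature."""

    def __init__(self):
        self.log = []    # stands in for an append-only log file on disk
        self.state = {}  # in-memory state, lost on a crash

    def commit(self, changes):
        self.log.append(json.dumps(changes))  # 1. enter the transaction in the log
        self.state.update(changes)            # 2. only then apply it in memory

    def crash(self):
        self.state = {}  # simulate losing volatile memory

    def recover(self):
        self.state = {}
        for record in self.log:  # redo every committed transaction, oldest first
            self.state.update(json.loads(record))


store = ToyStore()
store.commit({"A1": 70, "A2": 80})
store.commit({"A1": 40})
store.crash()
store.recover()
print(store.state)  # {'A1': 40, 'A2': 80}
```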

Deeper into Database Models


Hierarchical model
o A hierarchy can link entities either directly or indirectly, and either vertically or horizontally. The only direct links in a hierarchy, in so far as they are hierarchical, are to one's immediate superior or to one of one's subordinates, although a system that is largely hierarchical can also incorporate alternative hierarchies. Indirect hierarchical links can extend "vertically" upwards or downwards via multiple links in the same direction, following a path.
Degree of branching
Degree of branching refers to the number of direct subordinates or children an object has (equivalent to the number of child nodes directly below a node in a tree). Hierarchies can be categorized based on the "maximum degree", the highest degree present in the system as a whole. Categorization in this way yields two broad classes: linear and branching. In a linear hierarchy, the maximum degree is 1. In other words, all of the objects can be visualized in a lineup, and each object (excluding the top and bottom ones) has exactly one direct subordinate and one direct superior. Note that this refers to the objects and not the levels; every hierarchy has this property with respect to levels, but normally each level can have any number of objects. An example of a linear hierarchy is the hierarchy of life. In a branching hierarchy, one or more objects have a degree of 2 or more (and therefore the maximum degree is 2 or higher). For many people, the word "hierarchy" automatically evokes an image of a branching hierarchy. Branching hierarchies are present within numerous systems, including organizations and classification schemes. The broad category of branching hierarchies can be further subdivided based on the degree. A flat hierarchy is a branching hierarchy in which the maximum degree approaches infinity, i.e., it has a wide span. Most often, systems intuitively regarded as hierarchical have at most a moderate span. Therefore, a flat hierarchy is often not viewed as a hierarchy at all at first blush. For example, diamond and graphite are flat hierarchies of numerous carbon atoms, which can be further decomposed into subatomic particles.
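The maximum degree can be computed directly from the single upward link that each record carries in the hierarchical model. In this sketch a hierarchy is a hypothetical child-to-parent dictionary (the root maps to None), and classify is an illustrative name.

```python
from collections import Counter


def classify(parent_of):
    """Return (maximum degree, class) for a child -> parent mapping."""
    # Degree of branching = number of direct children of each node.
    degree = Counter(p for p in parent_of.values() if p is not None)
    max_degree = max(degree.values(), default=0)
    # A simple chain (or a lone root) never exceeds degree 1.
    return max_degree, ("linear" if max_degree <= 1 else "branching")


chain = {"A": None, "B": "A", "C": "B", "D": "C"}
org = {"CEO": None, "CTO": "CEO", "CFO": "CEO", "Dev1": "CTO", "Dev2": "CTO"}
print(classify(chain))  # (1, 'linear')
print(classify(org))    # (2, 'branching')
```

A dictionary keyed by child can hold only one parent per object, which mirrors the strict tree of the hierarchical model.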

Q2 (C) 2007 Q4 (B)

An overlapping hierarchy is a branching hierarchy in which at least one object has two parent objects. For example, a graduate student can have two co-supervisors to whom they report directly and equally, and who have the same level of authority within the university hierarchy (i.e., they have the same position or tenure status).
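Because a child-to-parent mapping allows only one parent, an overlapping hierarchy needs a list of (parent, child) edges instead. The sketch below finds every object with more than one direct parent; the names are hypothetical and follow the co-supervisor example.

```python
def overlapping_nodes(edges):
    """Return the nodes that have more than one direct parent.

    edges: iterable of (parent, child) pairs.
    """
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    return {child for child, ps in parents.items() if len(ps) > 1}


edges = [
    ("Dean", "Prof. X"), ("Dean", "Prof. Y"),
    ("Prof. X", "Student"), ("Prof. Y", "Student"),  # two co-supervisors
]
print(overlapping_nodes(edges))  # {'Student'}
```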

Network model
o The network model is a database model conceived as a flexible way of representing objects and their relationships. Its distinguishing feature is that the schema, viewed as a graph in which object types are nodes and relationship types are arcs, is not restricted to being a hierarchy or lattice.
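Records and sets can be modelled with two small Python dataclasses. This is an illustrative sketch rather than the API of a real network (CODASYL) DBMS; the record types, set names and the owners_of helper are hypothetical, with data borrowed from the employee tables in these notes.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A record: a record type name plus its fields."""
    rtype: str
    fields: dict


@dataclass
class SetType:
    """A set: one owner record and many member records (a one-to-many link)."""
    name: str
    owner: Record
    members: list = field(default_factory=list)


def owners_of(record, sets):
    """Follow set links from a member record up to every owner that includes it."""
    return [(s.name, s.owner.fields) for s in sets if record in s.members]


sales = Record("DEPT", {"dept": "Sales"})
p27 = Record("PROJ", {"projcode": "P27"})
e305 = Record("EMP", {"ecode": "E305"})

# E305 is a member of sets owned by two different record types, so the schema
# is a graph rather than a tree (a strict hierarchy allows only one parent).
sets = [
    SetType("WORKS_IN", owner=sales, members=[e305]),
    SetType("ASSIGNED_TO", owner=p27, members=[e305]),
]
print(owners_of(e305, sets))
# [('WORKS_IN', {'dept': 'Sales'}), ('ASSIGNED_TO', {'projcode': 'P27'})]
```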

Object model
o A collection of objects or classes through which a program can examine and manipulate some specific parts of its world. In other words, it is the object-oriented interface to some service or system. Such an interface is said to be the object model of the represented service or system.

Relational model
o Its central idea is to describe a database as a collection of predicates over a finite set of predicate variables, describing constraints on the possible values and combinations of values. The content of the database at any given time is a finite (logical) model of the database, i.e. a set of relations, one per predicate variable, such that all predicates are satisfied. A request for information from the database (a database query) is also a predicate. The purpose of the relational model is to provide a declarative method for specifying data and queries: we directly state what information the database contains and what information we want from it, and let the database management system software take care of describing data structures for storing the data and retrieval procedures for getting queries answered.
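The idea that a query is itself a predicate can be shown side by side: a relation as a Python set of tuples filtered by a condition, and the same request stated declaratively in SQL through sqlite3. The EMP rows are taken from the 3NF example in these notes; the code is only an illustration.

```python
import sqlite3

# A relation as a set of tuples: EMP(ecode, dept)
emp = {("E101", "Systems"), ("E305", "Sales"), ("E104", "Systems")}

# A query is a predicate: "which ecode values have dept = 'Systems'?"
answer = {ecode for (ecode, dept) in emp if dept == "Systems"}

# The same request stated declaratively in SQL; the DBMS decides how to find it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ecode TEXT PRIMARY KEY, dept TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", sorted(emp))
sql_answer = {row[0] for row in conn.execute("SELECT ecode FROM emp WHERE dept = 'Systems'")}

print(sorted(answer), answer == sql_answer)  # ['E101', 'E104'] True
```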

Inverted lists and other methods are also used. A given database management system may provide one or more of the four models. The optimal structure depends on the natural organization of the application's data, and on the application's requirements (which include transaction rate (speed), reliability, maintainability, scalability, and cost).

The dominant model in use today is the ad hoc one embedded in SQL, despite the objections of purists who believe this model is a corruption of the relational model, since it violates several of its fundamental principles for the sake of practicality and performance. Many DBMSs also support the Open Database Connectivity (ODBC) API, which provides a standard way for programmers to access the DBMS.

DBMS Concepts
A relation is a table in which data are inserted and maintained. One or more such tables may be linked using different types of keys to form a database. Such a link helps in relational integrity (all related areas are updated when a common field is updated) and data sufficiency (low redundancy, so errors are not multiplied across copies). A relation is logically divided into rows and columns. The columns represent the different attributes of the table, one of which is generally a primary key (used to identify each row uniquely and so avoid duplicate records). The rows, frequently referred to as tuples in database terminology, each hold the complete information about a single item recorded in the relation.

Keys in DBMS
Primary key: The attribute or combination of attributes that uniquely identifies a row or record.
Foreign key: An attribute or combination of attributes in a table whose value matches a primary key in another table.
Composite key: A primary key that consists of two or more attributes.
Candidate key: A column, or a minimal set of columns, in a table that has the ability to become the primary key.
Alternate key: Any candidate key that is not selected to be the primary key.

Super key: A super key is defined in the relational model as a set of attributes of a relation variable for which it holds that, in all relations assigned to that variable, there are no two distinct tuples (rows) that have the same values for the attributes in this set. Equivalently, a super key can be defined as a set of attributes of a relation variable upon which all attributes of the relation are functionally dependent.
Secondary key: An alternate of the primary key.
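Super keys and candidate keys can be tested mechanically on a table instance. The helpers below (hypothetical names) check uniqueness and search for minimal super keys. An instance can only disprove a key, since a real super key must hold for every legal state of the relation. The rows are the ECODE, NAME and PROJCODE columns of the BCNF example in these notes.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two (assumed distinct) rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, attributes):
    """Minimal super keys found in this instance, smallest first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            # Supersets of a key are super keys but not minimal, so skip them.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if is_superkey(rows, combo):
                keys.append(combo)
    return keys


cols = ("ECODE", "NAME", "PROJCODE")
data = [("E1", "Veronica", "P2"), ("E2", "Anthony", "P5"), ("E3", "Mac", "P6"),
        ("E4", "Susan", "P2"), ("E4", "Susan", "P5"), ("E1", "Veronica", "P5")]
rows = [dict(zip(cols, r)) for r in data]
print(candidate_keys(rows, cols))  # [('ECODE', 'PROJCODE'), ('NAME', 'PROJCODE')]
```

Here (ECODE, PROJCODE) would be a natural composite primary key, leaving (NAME, PROJCODE) as the alternate key.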

DBMS Terminologies
Q2 (A) 2007
Database management system (DBMS): Software for establishing, updating, and querying (i.e., managing) a database.
Database: A collection of files organized into related units that are then viewed as a single store. The data in a database are generally made available to a wide range of users by sharing them and assigning different rights and roles to different classes of users.
SQL (Structured Query Language): This is the core language of relational databases and the common platform through which different database engines interact.

Q5 (B) 2007
Data warehouse: This is a physical repository where relational data are organized to provide clean, enterprise-wide data in a standardized format. A data warehouse is a huge database that stores current and historical data of potential interest to decision makers throughout the company. These data originate in different transaction processing systems (TPS) and through other external entry methods.
Data marts: These are subsets of a data warehouse in which a summarized and highly focused portion of the organization's data is placed in a separate database for a specific set of users. Companies often build enterprise-wide warehouses, where a central data warehouse serves the entire organization, or they create small decentralized warehouses called data marts.
Entity: An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified. Entities carry attributes by which they can be uniquely identified.
Relationship: Two different entities that have a logical association are connected using a relationship. Relationships may also have attributes attached to them.
Attributes: These are the features or identifying characteristics of an element (an entity or a relationship).

Relevance of relational design in DSS

Q1 (A) 2005 Q2 (A) 2006
Q2 (B) 2007 Q5 (A)

Multidimensional problem solving: in decision support system (DSS) architecture, problem solving requires evaluating the problem in multiple ways and collecting the requisite information for each evaluation.
Critical queries: DBMS and relational DBMS (RDBMS) software can handle complex queries and information searches, which is very useful in DSS.
Referentially integrated inputs: an RDBMS and relational structuring of data help connect the related fields and information of a single item or object.
Data warehousing support: an RDBMS can remotely connect to different servers to fetch data and span boundaries to create a centralized data access medium, which eventually gives rise to data warehouses.
Data mart support: an RDBMS, through its access rights and different views of the same data, can create data marts for high-involvement decision making.
Sharability and scalability of information: since a database accepts concurrent access, multiple users can work with the same data from different geographical locations or at different decision points. Information stored in the database is highly scalable, offering flexibility at the information searcher's end.

Database Normalization

Q8/I (B) 2006 Q8/II (A)
Normalization is a systematic method of breaking down complex table structures into simple table structures using certain rules. It is used to reduce redundancy in tables and to eliminate the problems of inconsistency and wasted disk space. Normalization theory is based on the fundamental notion of functional dependency: given a relation (table) R, attribute A is functionally dependent on attribute B if each value of B in R is associated with precisely one value of A.

E.g., >>

Code  Name    City
E1    Mac     Delhi
E2    Sandra  CA
E3    Henry   Paris

Here, Name and City are each functionally dependent on Code.
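The definition can be checked on a concrete table: for each value of the determinant, remember the first dependent value seen and fail if a different one turns up. fd_holds is a hypothetical helper, and, as with keys, one instance can refute a dependency but not prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Check the functional dependency lhs -> rhs on one table instance."""
    mapping = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # The first value seen for this key is stored; any different one breaks the FD.
        if mapping.setdefault(key, val) != val:
            return False
    return True


rows = [
    {"Code": "E1", "Name": "Mac", "City": "Delhi"},
    {"Code": "E2", "Name": "Sandra", "City": "CA"},
    {"Code": "E3", "Name": "Henry", "City": "Paris"},
]
print(fd_holds(rows, ["Code"], ["Name", "City"]))  # True

# A second row for E1 with a different city breaks Code -> City.
bad = rows + [{"Code": "E1", "Name": "Mac", "City": "Paris"}]
print(fd_holds(bad, ["Code"], ["City"]))  # False
```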

Not Normalized Form
The relation is kept without any normalization rules and guidelines. E.g., >>

ECODE  DEPT     DEPTHEAD  PROJCODE       HOURS
E101   Systems  E901      P27, P51, P20  90, 101, 60
E305   Sales    E906      P27, P22       109, 98
E508   Admin    E908      P51, P27       NULL, 72

First Normal Form (1NF)
A table is said to be in 1NF if each cell of the table contains precisely one value. E.g., >>

ECODE  DEPT     DEPTHEAD  PROJCODE  HOURS
E101   Systems  E901      P27       90
E101   Systems  E901      P51       101
E101   Systems  E901      P20       60
E305   Sales    E906      P27       109
E305   Sales    E906      P22       98
E508   Admin    E908      P51       NULL
E508   Admin    E908      P27       72
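Converting the unnormalized table to 1NF amounts to repeating the single-valued columns once for each project an employee works on. The to_1nf function is an illustrative sketch that assumes each employee's PROJCODE and HOURS lists line up position by position.

```python
# Unnormalized rows: the PROJCODE and HOURS cells each hold several values.
unnormalized = [
    ("E101", "Systems", "E901", ["P27", "P51", "P20"], [90, 101, 60]),
    ("E305", "Sales", "E906", ["P27", "P22"], [109, 98]),
    ("E508", "Admin", "E908", ["P51", "P27"], [None, 72]),  # None stands for NULL
]


def to_1nf(rows):
    """Emit one row per (project, hours) pair so that every cell holds one value."""
    flat = []
    for ecode, dept, head, projects, hours in rows:
        if len(projects) != len(hours):
            raise ValueError(f"{ecode}: PROJCODE and HOURS lists differ in length")
        for proj, hrs in zip(projects, hours):
            # The single-valued columns are repeated on every new row.
            flat.append((ecode, dept, head, proj, hrs))
    return flat


first_nf = to_1nf(unnormalized)
for row in first_nf:
    print(row)
```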

Second Normal Form (2NF)
A table is said to be in 2NF when it is in 1NF and every non-key attribute is functionally dependent on the whole primary key, not just on part of it. Guidelines to convert a table to 2NF:
Find and remove attributes that are functionally dependent on only a part of the key and not on the whole key.
Place them in a different table.
Group the remaining attributes.
E.g., >>

ECODE  DEPT     DEPTHEAD
E101   Systems  E901
E305   Sales    E906
E508   Admin    E908

ECODE  PROJCODE  HOURS
E101   P27       90
E101   P51       101
E101   P20       60
E305   P27       109
E305   P22       98
E508   P51       NULL
E508   P27       72

Third Normal Form (3NF)
A table is said to be in 3NF when it is in 2NF and every non-key attribute is functionally dependent only on the primary key. Guidelines to convert a table to 3NF:
Find and remove non-key attributes that are functionally dependent on attributes that are not the primary key.
Place them in a different table, together with the attribute they depend on.
Group the remaining attributes.
E.g., >>

ECODE  DEPT
E101   Systems
E305   Sales
E402   Finance
E508   Admin
E607   Finance
E608   Finance
E104   Systems

DEPT     DEPTHEAD
Systems  E901
Sales    E906
Admin    E908
Finance  E909
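Both decompositions are projections of a table onto smaller sets of columns, with duplicate rows removed. The sketch below (hypothetical helper project) assumes the key of the 1NF table is (ECODE, PROJCODE) and uses the 2NF employee table, rather than the larger 3NF example, as the input to the 3NF step.

```python
def project(rows, cols, attrs):
    """Relational projection onto attrs, dropping duplicate rows (first-seen order)."""
    idx = [cols.index(a) for a in attrs]
    seen, out = set(), []
    for row in rows:
        t = tuple(row[i] for i in idx)
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out


COLS = ("ECODE", "DEPT", "DEPTHEAD", "PROJCODE", "HOURS")
first_nf = [
    ("E101", "Systems", "E901", "P27", 90),
    ("E101", "Systems", "E901", "P51", 101),
    ("E101", "Systems", "E901", "P20", 60),
    ("E305", "Sales", "E906", "P27", 109),
    ("E305", "Sales", "E906", "P22", 98),
    ("E508", "Admin", "E908", "P51", None),  # None stands for NULL
    ("E508", "Admin", "E908", "P27", 72),
]

# 2NF: DEPT and DEPTHEAD depend on ECODE alone (only part of the key),
# so they move out; HOURS depends on the whole key and stays with it.
employee = project(first_nf, COLS, ("ECODE", "DEPT", "DEPTHEAD"))
assignment = project(first_nf, COLS, ("ECODE", "PROJCODE", "HOURS"))

# 3NF: DEPTHEAD depends on DEPT, a non-key attribute (ECODE -> DEPT -> DEPTHEAD),
# so DEPT and DEPTHEAD form their own table.
EMP_COLS = ("ECODE", "DEPT", "DEPTHEAD")
emp_dept = project(employee, EMP_COLS, ("ECODE", "DEPT"))
department = project(employee, EMP_COLS, ("DEPT", "DEPTHEAD"))

# Joining the 2NF tables back on ECODE recovers the 1NF rows (a lossless split).
rejoined = {(e, d, h, p, hrs)
            for (e, d, h) in employee
            for (e2, p, hrs) in assignment if e == e2}
print(department)                 # [('Systems', 'E901'), ('Sales', 'E906'), ('Admin', 'E908')]
print(rejoined == set(first_nf))  # True
```

The final equality check confirms that, for this data, splitting the table loses no information.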

Boyce-Codd Normal Form (BCNF)
A relation is in BCNF if and only if every determinant is a candidate key. Guidelines to convert a table to BCNF:
Find and remove the overlapping candidate keys.
Place the part of the candidate key, together with the attribute it is functionally dependent on, in another table.
Group the remaining items into a table.
E.g., >>

ECODE  NAME      PROJCODE  HOURS
E1     Veronica  P2        48
E2     Anthony   P5        100
E3     Mac       P6        15
E4     Susan     P2        250
E4     Susan     P5        75
E1     Veronica  P5        40

ECODE  PROJCODE  HOURS
E1     P2        48
E2     P5        100
E3     P6        15
E4     P2        250
E4     P5        75
E1     P5        40
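The BCNF violation and its repair can be checked on the same data. The sketch assumes every employee has a unique name, so ECODE and NAME determine each other, making (ECODE, PROJCODE) and (NAME, PROJCODE) overlapping candidate keys; determines, unique and project are hypothetical helper names, and the checks only cover this table instance.

```python
COLS = ("ECODE", "NAME", "PROJCODE", "HOURS")
DATA = [
    ("E1", "Veronica", "P2", 48), ("E2", "Anthony", "P5", 100),
    ("E3", "Mac", "P6", 15), ("E4", "Susan", "P2", 250),
    ("E4", "Susan", "P5", 75), ("E1", "Veronica", "P5", 40),
]
rows = [dict(zip(COLS, d)) for d in DATA]


def determines(rows, lhs, rhs):
    """True if each lhs value is paired with exactly one rhs value in these rows."""
    seen = {}
    for r in rows:
        if seen.setdefault(r[lhs], r[rhs]) != r[rhs]:
            return False
    return True


def unique(rows, attr):
    """True if no value of attr repeats, i.e. attr alone identifies a row here."""
    values = [r[attr] for r in rows]
    return len(values) == len(set(values))


def project(rows, attrs):
    """Projection onto attrs with duplicate rows removed (first-seen order)."""
    out = []
    for r in rows:
        t = {a: r[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out


# ECODE determines NAME but does not identify a row: a determinant that is
# not a candidate key, so the table is not in BCNF.
print(determines(rows, "ECODE", "NAME"), unique(rows, "ECODE"))  # True False

employee = project(rows, ("ECODE", "NAME"))
work = project(rows, ("ECODE", "PROJCODE", "HOURS"))

# In the new employee table the determinant ECODE is unique.
print(unique(employee, "ECODE"), len(work))  # True 6
```

The second projection reproduces the ECODE, PROJCODE, HOURS table above; the first holds each employee's name exactly once.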
