As one of the oldest components associated with computers, the database management system, or DBMS, is a computer software program designed to manage all of the databases currently installed on a system hard drive or network. Different types of database management systems exist, with some of them designed for the oversight and proper control of databases that are configured for specific purposes. Here are some examples of the various incarnations of DBMS technology currently in use, and some of the basic elements that are part of DBMS software applications.

As the tool that is employed in the broad practice of managing databases, the DBMS is marketed in many forms. Some of the more popular examples of DBMS solutions include Microsoft Access, FileMaker, DB2, and Oracle. All these products provide for the creation of a series of rights or privileges that can be associated with a specific user. This means that it is possible to designate one or more database administrators who may control each function, as well as provide other users with various levels of administration rights. This flexibility makes the task of using DBMS methods to oversee a system something that can be centrally controlled, or allocated to several different people.

There are four essential elements that are found in just about every example of DBMS currently on the market. The first is the implementation of a modeling language that serves to define the schema of each database that is hosted via the DBMS. There are several approaches currently in use, including hierarchical, network, relational, and object models. Essentially, the modeling language ensures the ability of the databases to communicate with the DBMS and thus operate on the system.

Second, data structures also are administered by the DBMS. Examples of data that are organized by this function are individual profiles or records, files, fields and their definitions, and objects such as visual media. Data structures are what allow the DBMS to interact with the data without causing any damage to the integrity of the data itself.

A third component of DBMS software is the data query language. This element is involved in maintaining the security of the database, by monitoring the use of login data, the assignment of access rights and privileges, and the definition of the criteria that must be employed to add data to the system. The data query language works with the data structures to make it harder to input irrelevant data into any of the databases in use on the system.
Last, a mechanism that allows for transactions is an essential element of any DBMS. This mechanism allows multiple and concurrent access to the database, prevents the manipulation of one record by two users at the same time, and prevents the creation of duplicate records.

Database management system

A database management system (DBMS) is computer software that manages databases. DBMSes may use any of a variety of database models, such as the network model or relational model. In large systems, a DBMS allows users and other software to store and retrieve data in a structured way.

Overview
A DBMS is a set of software programs that controls the organization, storage, management, and retrieval of data in a database. DBMSs are categorized according to their data structures or types. A DBMS is a set of prewritten programs that are used to store, update, and retrieve a database. The DBMS accepts requests for data from the application program and instructs the operating system to transfer the appropriate data. When a DBMS is used, information systems can be changed much more easily as the organization's information requirements change. New categories of data can be added to the database without disruption to the existing system.

Organizations may use one kind of DBMS for daily transaction processing and then move the detail onto another computer that uses another DBMS better suited for random inquiries and analysis. Overall systems design decisions are performed by data administrators and systems analysts. Detailed database design is performed by database administrators.

Database servers are computers that hold the actual databases and
run only the DBMS and related software. Database servers are usually
multiprocessor computers, with generous memory and RAID disk
arrays used for stable storage. Connected to one or more servers via a
high-speed channel, hardware database accelerators are also used in
large volume transaction processing environments. DBMSs are found
at the heart of most database applications. Sometimes DBMSs are built around a private multitasking kernel with built-in networking support, although nowadays these functions are left to the operating system.

History
Databases have been in use since the earliest days of electronic computing. Unlike modern systems, which can be applied to widely different databases and needs, the vast majority of older systems were tightly linked to custom databases in order to gain speed at the expense of flexibility. Originally, DBMSs were found only in large organizations with the computer hardware needed to support large data sets.

1960s Navigational DBMS

As computers grew in capability, this trade-off became increasingly unnecessary and a number of general-purpose database systems emerged; by the mid-1960s there were a number of such systems in commercial use. Interest in a standard began to grow, and Charles Bachman, author of one such product, Integrated Data Store (IDS), founded the "Database Task Group" within CODASYL, the group responsible for the creation and standardization of COBOL. In 1971 they delivered their standard, which generally became known as the "Codasyl approach", and soon there were a number of commercial products based on it available.

The Codasyl approach was based on the "manual" navigation of a linked data set which was formed into a large network. When the database was first opened, the program was handed back a link to the first record in the database, which also contained pointers to other pieces of data. To find any particular record the programmer had to step through these pointers one at a time until the required record was returned. Simple queries like "find all the people in India" required the program to walk the entire data set and collect the matching results. There was, essentially, no concept of "find" or "search". This might sound like a serious limitation today, but in an era when the data was most often stored on magnetic tape such operations were too expensive to contemplate anyway.

IBM also had its own DBMS in 1968, known as IMS. IMS was a development of software written for the Apollo program on the System/360. IMS was generally similar in concept to Codasyl, but used a strict hierarchy for its model of data navigation instead of Codasyl's network model. Both concepts later became known as navigational databases due to the way data was accessed, and Bachman's 1973 Turing Award presentation was The Programmer as Navigator. IMS is classified as a hierarchical database. IDS and IDMS, both CODASYL databases, as well as Cincom's TOTAL database, are classified as network databases.

1970s Relational DBMS

Edgar Codd worked at IBM in San Jose, California, in one of their offshoot offices that was primarily involved in the development of hard disk systems. He was unhappy with the navigational model of the Codasyl approach, notably the lack of a "search" facility which was becoming increasingly useful. In 1970, he wrote a number of papers that outlined a new approach to database construction that eventually culminated in the groundbreaking A Relational Model of Data for Large Shared Data Banks.[1]

In this paper, he described a new system for storing and working with
large databases. Instead of records being stored in some sort of linked
list of free-form records as in Codasyl, Codd's idea was to use a "table"
of fixed-length records. A linked-list system would be very inefficient
when storing "sparse" databases where some of the data for any one
record could be left empty. The relational model solved this by splitting
the data into a series of normalized tables, with optional elements
being moved out of the main table to where they would take up room
only if needed.
In the relational model, related records are linked together with a
"key".

For instance, a common use of a database system is to track information about users, their name, login information, various addresses and phone numbers. In the navigational approach all of these data would be placed in a single record, and unused items would simply not be placed in the database. In the relational approach, the data would be normalized into a user table, an address table and a phone number table (for instance). Records would be created in these optional tables only if the address or phone numbers were actually provided.

Linking the information back together is the key to this system. In the relational model, some bit of information was used as a "key", uniquely defining a particular record. When information was being collected about a user, information stored in the optional (or related) tables would be found by searching for this key. For instance, if the login name of a user is unique, addresses and phone numbers for that user would be recorded with the login name as its key. This "re-linking" of related data back into a single collection is something that traditional computer languages are not designed for.

Just as the navigational approach would require programs to loop in order to collect records, the relational approach would require loops to collect information about any one record. Codd's solution to the necessary looping was a set-oriented language, a suggestion that would later spawn the ubiquitous SQL. Using a branch of mathematics known as tuple calculus, he demonstrated that such a system could support all the operations of normal databases (inserting, updating etc.) as well as providing a simple system for finding and returning sets of data in a single operation.
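
To make this concrete, here is a minimal sketch (hypothetical table and column names, standard SQL) of the user/address example above: the login name acts as the key linking the normalized tables, and a single set-oriented statement re-links the data with no record-by-record navigation loop.

    CREATE TABLE users (
        login     VARCHAR(32) PRIMARY KEY,   -- the unique key
        full_name VARCHAR(100)
    );

    CREATE TABLE addresses (
        login  VARCHAR(32) REFERENCES users(login),  -- links back to the user
        street VARCHAR(100),
        city   VARCHAR(50)
    );

    -- One declarative statement returns the whole matching set at once:
    SELECT u.full_name, a.street, a.city
    FROM users u
    JOIN addresses a ON a.login = u.login
    WHERE a.city = 'San Jose';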

Codd's paper was picked up by two people at Berkeley, Eugene Wong and Michael Stonebraker. They started a project known as INGRES using funding that had already been allocated for a geographical database project, using student programmers to produce code. Beginning in 1973, INGRES delivered its first test products, which were generally ready for widespread use in 1979. During this time, a number of people had moved "through" the group; perhaps as many as 30 people worked on the project, about five at a time. INGRES was similar to System R in a number of ways, including the use of a "language" for data access, known as QUEL. QUEL was in fact relational, having been based on Codd's own Alpha language, but has since been corrupted to follow SQL, thus violating much the same concepts of the relational model as SQL itself.

IBM itself did only one test implementation of the relational model,
PRTV, and a production one, Business System 12, both now
discontinued. Honeywell did MRDS for Multics, and now there are two
new implementations: Alphora Dataphor and Rel. All other DBMS
implementations usually called relational are actually SQL DBMSs. In
1968, the University of Michigan began development of the Micro
DBMS relational database management system. It was used to
manage very large data sets by the US Department of Labor, the Environmental Protection Agency, and researchers from the University of Alberta, the University of Michigan, and Wayne State University. It ran on mainframe computers using the Michigan Terminal System. The system remained in production until 1996.

Late 1970s SQL DBMS

IBM started working on a prototype system loosely based on Codd's concepts as System R in the early 1970s. The first "quickie" version was ready in 1974/5, and work then started on multi-table systems in which the data could be broken down so that all of the data for a record (much of which is often optional) did not have to be stored in a single large "chunk". Subsequent multi-user versions were tested by customers in 1978 and 1979, by which time a standardized query language, SQL, had been added. Codd's ideas were establishing themselves as both workable and superior to Codasyl, pushing IBM to develop a true production version of System R, known as SQL/DS, and, later, Database 2 (DB2).
Many of the people involved with INGRES became convinced of the
future commercial success of such systems, and formed their own
companies to commercialize the work but with an SQL interface.
Sybase, Informix, NonStop SQL and eventually Ingres itself were all
being sold as offshoots to the original INGRES product in the 1980s.
Even Microsoft SQL Server is actually a re-built version of Sybase, and
thus, INGRES. Only Larry Ellison's Oracle started from a different
chain, based on IBM's papers on System R, and beat IBM to market
when the first version was released in 1978.

Stonebraker went on to apply the lessons from INGRES to develop a new database, Postgres, which is now known as PostgreSQL. PostgreSQL is primarily used for global mission critical applications (the .org and .info domain name registries use it as their primary data store, as do many large companies and financial institutions).

In Sweden, Codd's paper was also read and Mimer SQL was developed
from the mid-70s at Uppsala University. In 1984, this project was
consolidated into an independent enterprise. In the early 1980s, Mimer introduced transaction handling for high robustness in applications, an idea that was subsequently implemented in most other DBMSs.

DBMS building blocks


A DBMS includes four main parts: a modeling language, data structures, a database query language, and transaction mechanisms.

Modeling language

A data modeling language defines the schema of each database hosted in the DBMS, according to the DBMS database model. The four most common types of organization are the:

• hierarchical model,
• network model,
• relational model, and
• object model.

Inverted lists and other methods are also used. A given database
management system may provide one or more of the four models. The
optimal structure depends on the natural organization of the
application's data, and on the application's requirements (which
include transaction rate (speed), reliability, maintainability, scalability,
and cost).

The dominant model in use today is the ad hoc one embedded in SQL,
despite the objections of purists who believe this model is a corruption
of the relational model, since it violates several of its fundamental
principles for the sake of practicality and performance. Many DBMSs also support the Open Database Connectivity (ODBC) API, which provides a standard way for programmers to access the DBMS.

Data structure

Data structures (fields, records, files and objects) optimized to deal with very large amounts of data stored on a permanent data storage device (which implies relatively slow access compared to volatile main memory).

Database query language

A database query language and report writer allows users to interactively interrogate the database, analyze its data and update it according to the user's privileges on data. It also controls the security of the database. Data security prevents unauthorized users from viewing or updating the database. Using passwords, users are allowed access to the entire database or subsets of it called subschemas. For example, an employee database can contain all the data about an individual employee, but one group of users may be authorized to view only payroll data, while others are allowed access to only work history and medical data.
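
As an illustrative sketch (hypothetical table, view, and role names; PostgreSQL-style syntax), a subschema of this kind can be exposed as a view, with access granted only to the appropriate group:

    -- Full employee table, visible only to administrators
    CREATE TABLE employees (
        id      INTEGER PRIMARY KEY,
        name    VARCHAR(100),
        salary  NUMERIC(10,2),
        history TEXT,
        medical TEXT
    );

    -- A "subschema": payroll users see only payroll-related columns
    CREATE VIEW payroll_view AS
        SELECT id, name, salary FROM employees;

    GRANT SELECT ON payroll_view TO payroll_group;  -- payroll staff only
    REVOKE ALL ON employees FROM payroll_group;     -- no access to the full table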

If the DBMS provides a way to interactively enter and update the database, as well as interrogate it, this capability allows for managing personal databases. However, it may not leave an audit trail of actions or provide the kinds of controls necessary in a multi-user organization. These controls are only available when a set of application programs is customized for each data entry and updating function.

Transaction mechanism

A database transaction mechanism ideally guarantees ACID properties in order to ensure data integrity despite concurrent user accesses (concurrency control) and faults (fault tolerance). It also maintains the integrity of the data in the database. The DBMS can maintain the integrity of the database by not allowing more than one user to update the same record at the same time. The DBMS can also help prevent duplicate records via unique index constraints; for example, no two customers with the same customer number (key field) can be entered into the database. See ACID properties for more information.
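
A minimal sketch of both mechanisms (hypothetical table and column names, standard SQL): a unique key that makes the DBMS reject duplicate customer numbers, and a transaction that makes a pair of updates atomic.

    CREATE TABLE customers (
        customer_no INTEGER PRIMARY KEY,  -- duplicates are rejected by the DBMS
        name        VARCHAR(100)
    );

    CREATE TABLE accounts (
        customer_no INTEGER REFERENCES customers(customer_no),
        balance     NUMERIC(12,2)
    );

    -- Atomic transfer: either both updates apply, or neither does
    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE customer_no = 1;
    UPDATE accounts SET balance = balance + 100 WHERE customer_no = 2;
    COMMIT;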

DBMS Topics
Logical and physical view

[Figure: Traditional View of Data.[2]]

A database management system provides the ability for many different users to share data and process resources. But as there can be many different users, there are many different database needs. The question now is: how can a single, unified database meet the differing requirements of so many users?

A DBMS minimizes these problems by providing two views of the database data: a logical (external) view and a physical (internal) view. The logical view, or user's view, of a database program represents data in a format that is meaningful to a user and to the software programs that process those data. That is, the logical view tells the user, in user terms, what is in the database. The physical view deals with the actual, physical arrangement and location of data in the direct access storage devices (DASDs). Database specialists use the physical view to make efficient use of storage and processing resources. With the logical view, users can see data differently from how they are stored, without needing to know all the technical details of physical storage. After all, a business user is primarily interested in using the information, not in how it is stored.

One strength of a DBMS is that while there is only one physical view of the data, there can be an endless number of different logical views. This feature allows users to see database information in a more business-related way rather than from a technical, processing viewpoint. Thus the logical view refers to the way the user views the data, and the physical view to the way the data are physically stored and processed.
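
A small sketch of the idea (hypothetical names, standard SQL): the same physically stored table can be presented through any number of logical views, each phrased in terms meaningful to a different audience.

    -- One physical table, with storage-oriented column names...
    CREATE TABLE sales (
        sale_id INTEGER PRIMARY KEY,
        cust_nm VARCHAR(100),
        amt_usd NUMERIC(10,2),
        sale_dt DATE
    );

    -- ...and two different logical views of the same data
    CREATE VIEW customer_purchases AS
        SELECT cust_nm AS customer, amt_usd AS amount FROM sales;

    CREATE VIEW daily_totals AS
        SELECT sale_dt AS day, SUM(amt_usd) AS total
        FROM sales GROUP BY sale_dt;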

DBMS Features and capabilities

Alternatively, and especially in connection with the relational model of database management, the relation between attributes drawn from a specified set of domains can be seen as being primary. For instance, the database might indicate that a car that was originally "red" might fade to "pink" in time, provided it was of some particular "make" with an inferior paint job. Such higher arity relationships provide information on all of the underlying domains at the same time, with none of them being privileged above the others.

Throughout recent history specialized databases have existed for scientific, geospatial, imaging, document storage and like uses. Functionality drawn from such applications has lately begun appearing in mainstream DBMSs as well. However, the main focus there, at least when aimed at the commercial data processing market, is still on descriptive attributes on repetitive record structures.

Thus, the DBMSs of today roll together frequently-needed services or features of attribute management. By externalizing such functionality to the DBMS, applications effectively share code with each other and are relieved of much internal complexity. Features commonly offered by database management systems include:

Query ability
Querying is the process of requesting attribute information from various perspectives and combinations of factors. Example: "How many 2-door cars in Texas are green?" A database query language and report writer allow users to interactively interrogate the database, analyze its data and update it according to the user's privileges on data. (A small worked example appears after this list.)
Backup and replication
Copies of attributes need to be made regularly in case primary disks or other equipment fails. A periodic copy of attributes may also be created for a distant organization that cannot readily access the original. DBMSs usually provide utilities to facilitate the process of extracting and disseminating attribute sets. When data is replicated between database servers, so that the information remains consistent throughout the database system and users cannot tell or even know which server in the DBMS they are using, the system is said to exhibit replication transparency.
Rule enforcement
Often one wants to apply rules to attributes so that the attributes are clean and reliable. For example, we may have a rule that says each car can have only one engine associated with it (identified by Engine Number). If somebody tries to associate a second engine with a given car, we want the DBMS to deny such a request and display an error message (see the sketch after this list). However, with changes in the model specification such as, in this example, hybrid gas-electric cars, rules may need to change. Ideally such rules should be able to be added and removed as needed without significant data layout redesign.
Security
Often it is desirable to limit who can see or change which
attributes or groups of attributes. This may be managed directly
by individual, or by the assignment of individuals and privileges
to groups, or (in the most elaborate models) through the
assignment of individuals and groups to roles which are then
granted entitlements.
Computation
There are common computations requested on attributes such as
counting, summing, averaging, sorting, grouping, cross-
referencing, etc. Rather than have each computer application
implement these from scratch, they can rely on the DBMS to
supply such calculations.
Change and access logging
Often one wants to know who accessed what attributes, what
was changed, and when it was changed. Logging services allow
this by keeping a record of access occurrences and changes.
Automated optimization
If there are frequently occurring usage patterns or requests,
some DBMS can adjust themselves to improve the speed of
those interactions. In some cases the DBMS will merely provide
tools to monitor performance, allowing a human expert to make
the necessary adjustments after reviewing the statistics
collected.
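
The sketch below (hypothetical schema, standard SQL) illustrates two of these features: the query-ability example quoted above, and a uniqueness rule that makes the DBMS reject a second engine for the same car.

    CREATE TABLE cars (
        car_id INTEGER PRIMARY KEY,
        state  CHAR(2),
        color  VARCHAR(20),
        doors  INTEGER
    );

    CREATE TABLE engines (
        engine_no INTEGER PRIMARY KEY,
        car_id    INTEGER UNIQUE REFERENCES cars(car_id)  -- one engine per car
    );

    -- Query ability: "How many 2-door cars in Texas are green?"
    SELECT COUNT(*) FROM cars
    WHERE doors = 2 AND state = 'TX' AND color = 'green';

    -- Rule enforcement: inserting a second engine for the same car_id
    -- fails with a unique-constraint violation and an error message.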

Meta-data repository
Metadata is data describing data. For example, a listing that describes which attributes are allowed in a given data set is a form of metadata, also known as "data about data".

Examples of Database Management Systems
• Adabas
• Adaptive Server Enterprise
• Alpha Five
• Computhink's ViewWise
• CSQL
• Daffodil DB
• DataEase
• FileMaker
• Firebird
• IBM DB2
• Informix
• Ingres
• InterSystems Caché
• Linter SQL RDBMS
• Mark Logic
• Microsoft Access
• Microsoft SQL Server
• Microsoft Visual FoxPro
• MonetDB
• MySQL
• OpenLink Virtuoso
• OpenOffice.org Base
• Oracle database
• PostgreSQL
• Progress
• SQL Anywhere
• SQLite
• Teradata

See also

• Associative model of data
• Column-oriented DBMS
• Comparison of relational database management systems
• Data warehouse
• Database-centric architecture
• Directory service
• Distributed database management system
• Document management system
• Hierarchical model
• Navigational database
• Network model
• Object database
• Object model
• Object-relational database
• Real time database
• Relational database management system
• Relational model
• Run Book Automation
• SQL

Computer software
Computer software, or just software, is a general term used to describe a collection of computer programs, procedures and documentation that perform some tasks on a computer system.[1]

The term includes:

• Application software, such as word processors, which performs productive tasks for users.
• Firmware, which is software programmed resident to electrically programmable memory devices on board mainboards or other types of integrated hardware carriers.
• Middleware, which controls and co-ordinates distributed systems.
• System software, such as operating systems, which interface with hardware to provide the necessary services for application software.
• Software testing, a domain independent of development and programming, consisting of various methods to test and declare a software product fit before it can be launched for use by either an individual or a group. Many tests on functionality, performance and appearance are conducted by modern testers with tools such as QTP and LoadRunner, and techniques such as black-box testing, to check the developed code against a checklist of requirements. ISTQB is a certification that is in demand for engineers who want to pursue a career in testing.[2]
• Testware, an umbrella term for all utilities and application software that serve in combination for testing a software package but do not necessarily contribute to operational purposes. As such, testware is not a standing configuration but merely a working environment for application software or subsets thereof.

Software includes websites, programs, video games, etc. that are written in programming languages like C, C++, etc. "Software" is sometimes used in a broader context to mean anything which is not hardware but which is used with hardware, such as film, tapes and records.[3]

Overview
Computer software is often regarded as anything but hardware, meaning that the "hard" parts are those that are tangible while the "soft" part comprises the intangible objects inside the computer. Software encompasses an extremely wide array of products and technologies developed using different techniques such as programming languages, scripting languages, or even microcode or an FPGA state. The types of software include web pages developed with technologies like HTML, PHP, Perl, JSP, ASP.NET, and XML, and desktop applications like Microsoft Word and OpenOffice developed with technologies like C, C++, Java, and C#. Software usually runs on an underlying operating system such as Microsoft Windows or Linux. Software also includes video games and the logic systems of modern consumer devices such as automobiles, televisions, toasters, etc.

Relationship to computer hardware

Computer software is so called to distinguish it from computer hardware, which encompasses the physical interconnections and devices required to store and execute (or run) the software. At the lowest level, software consists of a machine language specific to an individual processor. A machine language consists of groups of binary values signifying processor instructions which change the state of the computer from its preceding state. Software is an ordered sequence of instructions for changing the state of the computer hardware in a particular sequence. It is usually written in high-level programming languages that are easier and more efficient for humans to use (closer to natural language) than machine language. High-level languages are compiled or interpreted into machine language object code. Software may also be written in an assembly language, essentially, a mnemonic representation of a machine language using a natural language alphabet. Assembly language must be assembled into object code via an assembler.

The term "software" was first used in this sense by John W. Tukey in
1958.[4] In computer science and software engineering, computer
software is all computer programs. The theory that is the basis for
most modern software was first proposed by Alan Turing in his 1935
essay Computable numbers with an application to the
Entscheidungsproblem.[5]

Types of software

[Figure: A layer structure showing where the operating system is located in generally used software systems on desktops.]

Practical computer systems divide software systems into three major classes: system software, programming software and application software, although the distinction is arbitrary and often blurred.

System software

System software helps run the computer hardware and computer system. It includes:

• device drivers,
• operating systems,
• servers,
• utilities,
• windowing systems,

(these things need not be distinct)

The purpose of systems software is to unburden the applications programmer from the details of the particular computer complex being used, including such accessory devices as communications, printers, readers, displays and keyboards, and also to partition the computer's resources, such as memory and processor time, in a safe and stable manner.

Programming software

Programming software usually provides tools to assist a programmer in writing computer programs and software using different programming languages in a more convenient way. The tools include:

• compilers,
• debuggers,
• interpreters,
• linkers,
• text editors,

An integrated development environment (IDE) is a single application that attempts to manage all these functions.

Application software

Application software allows end users to accomplish one or more specific (not directly computer development related) tasks. Typical applications include:

• industrial automation,
• business software,
• computer games,
• telecommunications (i.e., the internet and everything that flows on it),
• databases,
• educational software,
• medical software,

Application software exists for and has impacted a wide variety of topics.

Software topics
Architecture

See also: Software architecture


Users often see things differently than programmers. People who use
modern general purpose computers (as opposed to embedded
systems, analog computers, supercomputers, etc.) usually see three
layers of software performing a variety of tasks: platform, application,
and user software.

• Platform software: Platform includes the firmware, device drivers, an operating system, and typically a graphical user interface which, in total, allow a user to interact with the computer and its peripherals (associated equipment). Platform software often comes bundled with the computer. On a PC you will usually have the ability to change the platform software.
• Application software: Application software or Applications are
what most people think of when they think of software. Typical
examples include office suites and video games. Application
software is often purchased separately from computer hardware.
Sometimes applications are bundled with the computer, but that
does not change the fact that they run as independent
applications. Applications are almost always independent
programs from the operating system, though they are often
tailored for specific platforms. Most users think of compilers,
databases, and other "system software" as applications.
• User-written software: End-user development tailors systems to
meet users' specific needs. User software include spreadsheet
templates, word processor macros, scientific simulations, and
scripts for graphics and animations. Even email filters are a kind
of user software. Users create this software themselves and
often overlook how important it is. Depending on how
competently the user-written software has been integrated into
default application packages, many users may not be aware of
the distinction between the original packages, and what has
been added by co-workers.

Documentation

Main article: Software documentation

Most software has software documentation so that the end user can understand the program, what it does, and how to use it. Without clear documentation, software can be hard to use, especially if it is very specialized and relatively complex, like Photoshop or AutoCAD.
Developer documentation may also exist, either with the code as comments and/or as separate files, detailing how the program works and how it can be modified.

Library

Main article: Software library

An executable is almost always not sufficiently complete for direct execution. Software libraries include collections of functions and functionality that may be embedded in other applications. Operating systems include many standard software libraries, and applications are often distributed with their own libraries.

Standard

Main article: Software standard

Since software can be designed using many different programming languages and in many different operating systems and operating environments, software standards are needed so that different software can understand and exchange information with each other. For instance, an email sent from Microsoft Outlook should be readable from Yahoo! Mail and vice versa.

Execution

Main article: Execution (computing)

Computer software has to be "loaded" into the computer's storage (such as a hard drive or memory (RAM)). Once the software has loaded, the computer is able to execute it. This involves passing instructions from the application software, through the system software, to the hardware, which ultimately receives the instruction as machine code. Each instruction causes the computer to carry out an operation: moving data, carrying out a computation, or altering the control flow of instructions.

Data movement is typically from one place in memory to another. Sometimes it involves moving data between memory and registers which enable high-speed data access in the CPU. Moving data, especially large amounts of it, can be costly. So, this is sometimes avoided by using "pointers" to data instead. Computations include simple operations such as incrementing the value of a variable data element. More complex computations may involve many operations and data elements together.

Quality and reliability

Main articles: Software quality, Software testing, and Software reliability

Software quality is very important, especially for commercial and system software like Microsoft Office, Microsoft Windows, and Linux. If software is faulty (buggy), it can delete a person's work, crash the computer and do other unexpected things. Faults and errors are called "bugs". Many bugs are discovered and eliminated (debugged) through software testing. However, software testing rarely, if ever, eliminates every bug; some programmers say that "every program has at least one more bug" (Lubarsky's Law). All major software companies, such as Microsoft, Novell and Sun Microsystems, have their own software testing departments with the specific goal of just testing. Software can be tested through unit testing, regression testing and other methods, which are done manually, or most commonly, automatically, since the amount of code to be tested can be quite large. For instance, NASA has extremely rigorous software testing procedures for its Space Shuttle and other programs because faulty software can crash the whole program and make the vehicle not functional, at great expense.

License

Main article: Software license

The software's license gives the user the right to use the software in
the licensed environment. Some software comes with the license when
purchased off the shelf, or an OEM license when bundled with
hardware. Other software comes with a free software license, granting
the recipient the rights to modify and redistribute the software.
Software can also be in the form of freeware or shareware. See also
License Management.

Patents

Main articles: Software patent and Software patent debate

Software can be patented; however, software patents are controversial in the software industry, with many people holding different views about them. The controversy over software patents is that a specific algorithm or technique embodied in the software cannot be duplicated by others, as it is treated as intellectual property, and unlicensed duplication may be deemed infringement depending on the severity. Some people believe that software patents hinder software development, while others argue that software patents provide an important incentive to spur software innovation.

Ethics and rights

Main article: Computer ethics



There is more than one approach to creating, licensing, and distributing software. For instance, the free software or the open source community produces software under licensing that makes it free for inspection of its code, modification of its code, and distribution. While the software released under an open source license (such as General Public License, or GPL for short) can be sold for money,[6] the distribution cannot be restricted in the same way as software with copyright and patent restrictions (used by corporations to require licensing fees).

While some advocates of free software use slogans such as "information wants to be free," hinting that it is easy to copy digital data and that the licenses (enforced through laws) are unnatural restrictions, other creators and users of open source software recognize it to be one model among many for software creation, licensing, and distribution. And the laws themselves are put into place for the ostensible purpose of increasing creative output, by allowing the creators to control and profit most effectively from their intellectual property.

Design and implementation


Main articles: Software development, Computer programming, and
Software engineering

Design and implementation of software varies depending on the complexity of the software. For instance, the design and creation of Microsoft Word takes much longer than designing and developing Microsoft Notepad, because of the difference in functionality between the two.

Software is usually designed and created (coded/written/programmed) in integrated development environments (IDEs) like Emacs, XEmacs, Microsoft Visual Studio and Eclipse, which can simplify the process and compile the program. As noted in a different section, software is usually created on top of existing software and the application programming interface (API) that the underlying software provides, like GTK+, JavaBeans or Swing. Libraries (APIs) are categorized for different purposes. For instance, the JavaBeans library is used for designing enterprise applications, the Windows Forms library is used for designing graphical user interface (GUI) applications like Microsoft Word, and Windows Communication Foundation is used for designing web services. There are also underlying concepts in computer programming, like quicksort, hashtable, array, and binary tree, that can be useful in creating software. When a program is designed, it relies on the API. For instance, if a user is designing a Microsoft Windows desktop application, he/she might use the .NET Windows Forms library to design the desktop application and call its APIs like Form1.Close() and Form1.Show() to close or open the application, and write the additional operations him/herself that it needs to have. Without these APIs, the programmer would need to write this functionality him/herself. Companies like Sun Microsystems, Novell and Microsoft provide their own APIs, so that many applications are written using their software libraries that usually have numerous APIs in them.

Software has special economic characteristics that make its design, creation, and distribution different from most other economic goods.[7][8]

A person who creates software is called a programmer, software engineer, software developer, or, informally, code monkey, terms that all have essentially the same meaning.

Industry and organizations


Main article: Software industry

Software has its own niche industry, called the software industry, made up of different entities and people that produce software. As a result there are many software companies and programmers in the world. Because software is increasingly used in many different areas, such as finance, searching, mathematics, space exploration, gaming and mining, software companies and people usually specialize in certain areas. For instance, Electronic Arts primarily creates video games.

Selling software can also be quite profitable. For instance, Bill Gates, the founder of Microsoft, was the second richest man in the world in 2008, largely through selling the Microsoft Windows and Microsoft Office software programs; the same goes for Larry Ellison, largely through his Oracle database software.

There are also many non-profit software organizations, like the Free Software Foundation, GNU Project and Mozilla Foundation, and many software standards organizations, like the W3C and IETF, that try to come up with software standards so that different software can work and interoperate with each other, as with standards such as XML, HTML, HTTP and FTP.

Some of the well known software companies include Microsoft, Apple, IBM, Oracle, Novell, SAP, and HP.[9]

See also

Lists:
• List of basic computer programming topics
• List of computer programming topics
• Origins of computer terms

Types of software:
• Custom software
• Free software
• Freeware
• Open source software
• Proprietary software
• Scientific software
• Shareware

Related subjects:
• Software as a Service
• Software development process
• Software ecosystem
• Software industry
• Software license

Database
This article is principally about managing and structuring the collections of data held on computers. For a fuller discussion of DBMS software, see Database management system. For database content libraries, see Online database.

A database is a structured collection of records or data that is stored in a computer system. The structure is achieved by organizing the data according to a database model. The model in most common use today is the relational model. Other models such as the hierarchical model and the network model use a more explicit representation of relationships.

Database topics
Architecture

Depending on the intended use, there are a number of database architectures in use. Many databases use a combination of strategies. On-line Transaction Processing systems (OLTP) often use a row-oriented datastore architecture, while data-warehouse and other retrieval-focused applications like Google's BigTable, or bibliographic database (library catalogue) systems may use a Column-oriented DBMS architecture.

Document-oriented, XML, and knowledge bases, as well as frame databases and RDF stores (a.k.a. triple stores), may also use a combination of these architectures in their implementation.

Finally, it should be noted that not all databases have or need a database 'schema' (so-called schema-less databases).

Over many years the database industry has been dominated by General Purpose database systems, which offer a wide range of functions that are applicable to many, if not most circumstances in modern data processing. These have been enhanced with extensible datatypes, pioneered in the PostgreSQL project, to allow a very wide range of applications to be developed.
There are also other types of database which cannot be classified as
relational databases.

Database management systems

A computer database relies on software to organize the storage of data. This software is known as a database management system (DBMS). Database management systems are categorized according to the database model that they support. The model tends to determine the query languages that are available to access the database. A great deal of the internal engineering of a DBMS, however, is independent of the data model, and is concerned with managing factors such as performance, concurrency, integrity, and recovery from hardware failures. In these areas there are large differences between products.

A Relational Database Management System (RDBMS) implements the features of the relational model outlined above. In this context, Date's "Information Principle" states: "the entire information content of the database is represented in one and only one way. Namely as explicit values in column positions (attributes) and rows in relations (tuples). Therefore, there are no explicit pointers between related tables."

Database models

Main article: Database model

Post-relational database models

Products offering a more general data model than the relational model
are sometimes classified as post-relational. The data model in such
products incorporates relations but is not constrained by the
Information Principle, which requires that all information is
represented by data values in relations.

Some of these extensions to the relational model actually integrate concepts from technologies that pre-date the relational model. For example, they allow representation of a directed graph with trees on the nodes.

Some products implementing such models have been built by extending relational database systems with non-relational features. Others, however, have arrived in much the same place by adding relational features to pre-relational systems. Paradoxically, this allows products that are historically pre-relational, such as PICK and MUMPS, to make a plausible claim to be post-relational in their current architecture.

Object database models

In recent years, the object-oriented paradigm has been applied to database technology, creating a new programming model known as object databases. These databases attempt to bring the database world and the application programming world closer together, in particular by ensuring that the database uses the same type system as the application program. This aims to avoid the overhead (sometimes referred to as the impedance mismatch) of converting information between its representation in the database (for example as rows in tables) and its representation in the application program (typically as objects). At the same time, object databases attempt to introduce the key ideas of object programming, such as encapsulation and polymorphism, into the world of databases.

A variety of ways have been tried for storing objects in a database. Some products have approached the problem from the application programming end, by making the objects manipulated by the program persistent. This also typically requires the addition of some kind of query language, since conventional programming languages do not have the ability to find objects based on their information content. Others have attacked the problem from the database end, by defining an object-oriented data model for the database, and defining a database programming language that allows full programming capabilities as well as traditional query facilities.

Database storage structures

Main article: Database storage structures



Relational database tables/indexes are typically stored in memory or on hard disk in one of many forms: ordered/unordered flat files, ISAM, heaps, hash buckets or B+ trees. These have various advantages and disadvantages discussed further in the main article on this topic. The most commonly used are B+ trees and ISAM.

Object databases use a range of storage mechanisms. Some use virtual memory mapped files to make the native language (C++, Java etc.) objects persistent. This can be highly efficient but it can make multi-language access more difficult. Others break the objects down into fixed and varying length components that are then clustered tightly together in fixed sized blocks on disk and reassembled into the appropriate format either for the client or in the client address space. Another popular technique is to store the objects in tuples, much like a relational database, which the database server then reassembles for the client.

Other important design choices relate to the clustering of data by category (such as grouping data by month, or location), creating pre-computed views known as materialized views, and partitioning data by range or hash. Memory management and storage topology can be important design choices for database designers as well. Just as normalization is used to reduce storage requirements and improve the extensibility of the database, conversely denormalization is often used to reduce join complexity and reduce execution time for queries.[1]

Indexing

All of these databases can take advantage of indexing to increase their speed. This technology has advanced tremendously since its early uses in the 1960s and 1970s. The most common kind of index is a sorted list of the contents of some particular table column, with pointers to the row associated with the value. An index allows a set of table rows matching some criterion to be located quickly. Typically, indexes are also stored in the various forms of data-structure mentioned above (such as B-trees, hashes, and linked lists). Usually, a specific technique is chosen by the database designer to increase efficiency in the particular case of the type of index required.

Most relational DBMSs and some object DBMSs have the advantage that indexes can be created or dropped without changing existing applications that make use of them. The database chooses between many different strategies based on which one it estimates will run the fastest. In other words, indexes are transparent to the application or end-user querying the database; while they affect performance, any SQL command will run with or without an index to compute the result of an SQL statement. The RDBMS will produce a plan of how to execute the query, which is generated by analyzing the run times of the different algorithms and selecting the quickest. Some of the key algorithms that deal with joins are the nested loop join, sort-merge join and hash join. Which of these is chosen depends on whether an index exists, what type it is, and its cardinality.
An index speeds up access to data, but it has disadvantages as well.
First, every index increases the amount of storage on the hard drive
necessary for the database file, and second, the index must be
updated each time the data are altered, and this costs time. (Thus an
index saves time in the reading of data, but it costs time in entering
and altering data. It thus depends on the use to which the data are to
be put whether an index is on the whole a net plus or minus in the
quest for efficiency.)
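
A short sketch (hypothetical table; EXPLAIN shown in PostgreSQL-style syntax): adding an index changes only how a query is executed, never its result.

    -- Without an index, the DBMS must scan every row of the table
    SELECT * FROM persons WHERE surname = 'Smith';

    -- Create a sorted index on the column used in the search criterion
    CREATE INDEX idx_persons_surname ON persons (surname);

    -- The same query now runs unchanged, but the planner may use the
    -- index; EXPLAIN displays the plan it chose:
    EXPLAIN SELECT * FROM persons WHERE surname = 'Smith';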

A special case of an index is a primary index, or primary key, which is distinguished in that the primary index must ensure a unique reference to a record. Often, for this purpose one simply uses a running index number (ID number). Primary indexes play a significant role in relational databases, and they can speed up access to data considerably.

Transactions and concurrency

In addition to their data model, most practical databases ("transactional databases") attempt to enforce a database transaction. Ideally, the database software should enforce the ACID rules, summarized here:

• Atomicity: Either all the tasks in a transaction must be done, or none of them. The transaction must be completed, or else it must be undone (rolled back).
• Consistency: Every transaction must preserve the integrity constraints (the declared consistency rules) of the database. It cannot place the data in a contradictory state.
• Isolation: Two simultaneous transactions cannot interfere with one another. Intermediate results within a transaction are not visible to other transactions.
• Durability: Completed transactions cannot be aborted later or their results discarded. They must persist through (for instance) restarts of the DBMS after crashes.

In practice, many DBMSs allow most of these rules to be selectively relaxed for better performance.
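
A brief sketch of atomicity in practice (hypothetical tables, standard SQL): if anything goes wrong before COMMIT, ROLLBACK undoes every statement in the transaction.

    BEGIN;
    UPDATE inventory SET quantity = quantity - 1 WHERE item_id = 42;
    INSERT INTO orders (item_id, customer_id) VALUES (42, 7);
    -- If either statement failed, undo both:
    ROLLBACK;
    -- Otherwise, make both changes permanent instead:
    -- COMMIT;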

Concurrency control is a method used to ensure that transactions are executed in a safe manner and follow the ACID rules. The DBMS must be able to ensure that only serializable, recoverable schedules are allowed, and that no actions of committed transactions are lost while undoing aborted transactions.
Replication

Replication of databases is closely related to transactions. If a database can log its individual actions, it is possible to create a duplicate of the data in real time. The duplicate can be used to improve performance or availability of the whole database system. Common replication concepts include:

• Master/Slave Replication: All write requests are performed on the master and then replicated to the slaves.
• Quorum: The result of Read and Write requests are calculated by querying a "majority" of replicas.
• Multimaster: Two or more replicas sync each other via a transaction identifier.

Parallel synchronous replication of databases enables transactions to be replicated on multiple servers simultaneously, which provides a method for backup and security as well as data availability.

Security

Database security denotes the system, processes, and procedures that protect a database from unintended activity.

Security is usually enforced through access control, auditing, and encryption.

• Access control ensures and restricts who can connect and what
can be done to the database.
• Auditing logs what action or change has been performed, when
and by whom.
• Encryption: Since security has become a major issue in recent years, many commercial database vendors provide built-in encryption mechanisms. Data is encoded natively into the tables and deciphered "on the fly" when a query comes in. Connections can also be secured and encrypted if required using DSA, MD5, SSL, or legacy encryption standards.

Enforcing security is one of the major tasks of the DBA.

In the United Kingdom, legislation protecting the public from unauthorized disclosure of personal information held on databases falls under the Office of the Information Commissioner. United Kingdom based organizations holding personal data in electronic format (databases, for example) are required to register with the Information Commissioner.[2]

Locking


Locking is how the database handles multiple concurrent operations. This is how concurrency and some form of basic integrity is managed within the database system. Such locks can be applied on a row level, or on other levels like page (a basic data block), extent (multiple array of pages) or even an entire table. This helps maintain the integrity of the data by ensuring that only one process at a time can modify the same data.

In basic filesystem files or folders, only one lock at a time can be set, restricting usage to one process only. Databases, on the other hand, can set and hold multiple locks at the same time on different levels of the physical data structure. How locks are set and how long they last is determined by the database engine's locking scheme, based on the SQL statements or transactions submitted by the users. Generally speaking, little activity on the database should translate into no, or very light, locking.

For most DBMSs on the market, locks are generally shared or exclusive. An exclusive lock means that no other lock can be acquired on the current data object as long as the exclusive lock lasts. Exclusive locks are usually set while the database needs to change data, such as during an UPDATE or DELETE operation.

Shared locks, by contrast, can be held concurrently on the same data structure. Shared locks are usually used while the database is reading data, during a SELECT operation. The number and nature of locks, and the time for which a lock holds a data block, can have a huge impact on database performance. Bad locking can lead to disastrous performance (usually the result of poor SQL requests, or an inadequate physical database structure).
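
A small sketch of explicit exclusive locking (hypothetical table; FOR UPDATE is standard SQL and widely supported): the selected rows stay locked against concurrent writers until the transaction ends.

    BEGIN;
    -- Take exclusive row locks on the matching rows
    SELECT * FROM seats
    WHERE flight_no = 'BA117' AND booked = FALSE
    FOR UPDATE;
    -- Other transactions now block if they try to modify these rows
    UPDATE seats SET booked = TRUE
    WHERE flight_no = 'BA117' AND seat_no = '12A';
    COMMIT;  -- releases the locks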

Default locking behavior is enforced by the isolation level of the data server. Changing the isolation level affects how shared or exclusive locks are set on the data for the entire database system. Default isolation is generally level 1 (read committed), in which data cannot be read while it is being modified, preventing "ghost data" from being returned to the end user.
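
A short sketch of raising the isolation level for a single transaction (PostgreSQL-style placement of SET TRANSACTION; some products instead set the level before the transaction begins; the accounts table is hypothetical):

    BEGIN;
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    SELECT SUM(balance) FROM accounts;  -- sees a stable snapshot of the data
    COMMIT;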

At some point, intensive or inappropriate exclusive locking can lead to a "deadlock" between two locks, in which neither lock can be released because each is waiting to acquire a resource held by the other. The database has a fail-safe mechanism and will automatically "sacrifice" one of the locks, releasing the resource; the processes or transactions involved in the deadlock are then rolled back.

Databases can also be locked for other reasons, like access restrictions for given levels of user. Some databases are also locked for routine database maintenance, which prevents changes being made during the maintenance (see, for example, IBM's documentation on locking tables and databases for more detail). However, many modern databases don't lock the database during routine maintenance; see, for example, "Routine Database Maintenance" in the PostgreSQL documentation.

Applications of databases
Databases are used in many applications, spanning virtually the entire
range of computer software. Databases are the preferred method of
storage for large multiuser applications, where coordination between
many users is needed. Even individual users find them convenient, and
many electronic mail programs and personal organizers are based on
standard database technology. Software database drivers are available
for most database platforms so that application software can use a
common Application Programming Interface to retrieve the information
stored in a database. Two commonly used database APIs are JDBC and
ODBC.

See also
• Comparison of relational database management systems
• Comparison of database tools
• Database-centric architecture
• Database theory
• Government database
• Object database
• Online database
• Real time database
• Relational database
Database model

A database model or database schema is the structure or format of
a database, described in a formal language supported by the database
management system. Schemas are generally stored in a data
dictionary.

Collage of five types of database models.

Although a schema is defined in a text-based database language, the
term is often used to refer to a graphical depiction of the database
structure.[1]

Overview
A database model is a theory or specification describing how a
database is structured and used. Several such models have been
suggested.
Common models include:

• Hierarchical model
• Network model
• Relational model
• Entity-relationship
• Object-relational model
• Object model

A data model is not just a way of structuring data: it also defines a set
of operations that can be performed on the data. The relational model,
for example, defines operations such as select, project, and join.
Although these operations may not be explicit in a particular query
language, they provide the foundation on which a query language is
built.

Models
Various techniques are used to model data structure. Most database
systems are built around one particular data model, although it is
increasingly common for products to offer support for more than one
model. For any one logical model various physical implementations
may be possible, and most products will offer the user some level of
control in tuning the physical implementation, since the choices that
are made have a significant effect on performance. An example of this
is the relational model: all serious implementations of the relational
model allow the creation of indexes which provide fast access to rows
in a table if the values of certain columns are known.

Flat model

Flat File Model.[1]


This may not strictly qualify as a data model, as defined above.

The flat (or table) model consists of a single, two-dimensional
array of data elements, where all members of a given column are
assumed to be similar values, and all members of a row are assumed
to be related to one another. For instance, such a model might have
columns for name and password used as part of a system security
database, where each row holds the password associated with an
individual user. Columns of the table often have a type associated
with them, defining them as character data, date or time
information, integers, or floating point numbers.
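
As a sketch, such a single-table model could be declared in SQL
(the table and column names are hypothetical):

CREATE TABLE users (
    name     varchar(32),   -- character data
    password varchar(32)    -- one password per row, i.e. per user
);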

Hierarchical model

Hierarchical Model.[1]
Main article: Hierarchical model

In a hierarchical model, data is organized into a tree-like
structure, implying a single upward link in each record to describe
the nesting, and a sort field to keep the records in a particular
order in each same-level list. Hierarchical structures were widely
used in the early mainframe database management systems, such as the
Information Management System (IMS) by IBM, and now describe the
structure of XML documents. This structure allows one 1:N
relationship between two types of data. It is very efficient for
describing many relationships in the real world: recipes, tables of
contents, the ordering of paragraphs or verses, and any other nested
and sorted information. However, the hierarchical structure is
inefficient for certain database operations when a full path (as
opposed to upward link and sort field) is not also included for each
record.

One limitation of the hierarchical model is its inability to
efficiently represent redundancy in data. Entity-Attribute-Value
database models like Caboodle by Swink are based on this structure.

Parent–child relationship: a child may only have one parent, but a
parent can have multiple children. Parents and children are tied
together by links called "pointers". A parent has a list of pointers
to each of its children.

Network model
Network Model.[1]
Main article: Network model

The network model (defined by the CODASYL specification) organizes
data using two fundamental constructs, called records and sets.
Records contain fields (which may be organized hierarchically, as in
the programming language COBOL). Sets (not to be confused with
mathematical sets) define one-to-many relationships between records:
one owner, many members. A record may be an owner in any number of
sets, and a member in any number of sets.

The network model is a variation on the hierarchical model, to the
extent that it is built on the concept of multiple branches
(lower-level structures) emanating from one or more nodes
(higher-level structures), while it differs from the hierarchical
model in that branches can be connected to multiple nodes. The
network model is able to represent redundancy in data more
efficiently than the hierarchical model.

The operations of the network model are navigational in style: a
program maintains a current position, and navigates from one record
to another by following the relationships in which the record
participates. Records can also be located by supplying key values.

Although it is not an essential feature of the model, network
databases generally implement the set relationships by means of
pointers that directly address the location of a record on disk.
This gives excellent retrieval performance, at the expense of
operations such as database loading and reorganization.

Most object databases use the navigational concept to provide fast
navigation across networks of objects, generally using object
identifiers as "smart" pointers to related objects. Objectivity/DB,
for instance, implements named 1:1, 1:many, many:1, and many:many
relationships that can cross databases. Many object databases also
support SQL, combining the strengths of both models.

Relational model

Example of a Relational Model.[1]

The relational model was introduced by E. F. Codd in 1970[2] as a
way to make database management systems more independent of any
particular application. It is a mathematical model defined in terms
of predicate logic and set theory.

The products that are generally referred to as relational databases
in fact implement a model that is only an approximation to the
mathematical model defined by Codd. Three key terms are used
extensively in relational database models: relations, attributes,
and domains. A relation is a table with columns and rows. The named
columns of the relation are called attributes, and the domain is the
set of values the attributes are allowed to take.

The basic data structure of the relational model is the table, where
information about a particular entity (say, an employee) is represented
in columns and rows (also called tuples). Thus, the "relation" in
"relational database" refers to the various tables in the database; a
relation is a set of tuples. The columns enumerate the various
attributes of the entity (the employee's name, address or phone
number, for example), and a row is an actual instance of the entity (a
specific employee) that is represented by the relation. As a result,
each tuple of the employee table represents various attributes of a
single employee.

All relations (and, thus, tables) in a relational database have to adhere
to some basic rules to qualify as relations. First, the ordering of
columns is immaterial in a table. Second, there can't be identical
tuples or rows in a table. And third, each tuple will contain a single
value for each of its attributes.

A relational database contains multiple tables, each similar to the
one in the "flat" database model. One of the strengths of the
relational model is that, in principle, any value occurring in two
different records (belonging to the same table or to different
tables) implies a relationship among those two records. Yet, in
order to enforce explicit integrity constraints, relationships
between records in tables can also be defined explicitly, by
identifying or non-identifying parent-child relationships
characterized by assigning cardinality (1:1, (0)1:M, M:M). Tables
can also have a designated single attribute or a set of attributes
that can act as a "key", which can be used to uniquely identify each
tuple in the table.

A key that can be used to uniquely identify a row in a table is
called a primary key. Keys are commonly used to join or combine data
from two or more tables. For example, an Employee table may contain
a column named Location which contains a value that matches the key
of a Location table. Keys are also critical in the creation of
indexes, which facilitate fast retrieval of data from large tables.
Any column can be a key, or multiple columns can be grouped together
into a compound key. It is not necessary to define all the keys in
advance; a column can be used as a key even if it was not originally
intended to be one.
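
A minimal sketch of such a join in SQL, assuming hypothetical Name,
City, and LocationID columns:

SELECT e.Name, l.City
  FROM Employee e
  JOIN Location l ON e.Location = l.LocationID;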

A key that has an external, real-world meaning (such as a person's
name, a book's ISBN, or a car's serial number) is sometimes called a
"natural" key. If no natural key is suitable (think of the many
people named Brown), an arbitrary or surrogate key can be assigned
(such as by giving employees ID numbers). In practice, most
databases have both generated and natural keys, because generated
keys can be used internally to create links between rows that cannot
break, while natural keys can be used, less reliably, for searches
and for integration with other databases. (For example, records in
two independently developed databases could be matched up by social
security number, except when the social security numbers are
incorrect, missing, or have changed.)

Dimensional model
The dimensional model is a specialized adaptation of the relational
model used to represent data in data warehouses in a way that data
can be easily summarized using OLAP queries. In the dimensional
model, a database consists of a single large table of facts that are
described using dimensions and measures. A dimension provides the
context of a fact (such as who participated, when and where it
happened, and its type) and is used in queries to group related facts
together. Dimensions tend to be discrete and are often hierarchical; for
example, the location might include the building, state, and country. A
measure is a quantity describing the fact, such as revenue. It's
important that measures can be meaningfully aggregated - for
example, the revenue from different locations can be added together.

In an OLAP query, dimensions are chosen and the facts are grouped
and added together to create a summary.

The dimensional model is often implemented on top of the relational
model using a star schema, consisting of one table containing the
facts and surrounding tables containing the dimensions. Particularly
complicated dimensions might be represented using multiple tables,
resulting in a snowflake schema.
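
A typical OLAP-style query over a star schema might look like this
sketch (the fact and dimension table and column names are
hypothetical):

SELECT d.country, SUM(f.revenue) AS total_revenue
  FROM sales_fact f
  JOIN location_dim d ON f.location_id = d.location_id
 GROUP BY d.country;   -- group facts by a dimension, sum a measure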

A data warehouse can contain multiple star schemas that share
dimension tables, allowing them to be used together. Coming up with
a standard set of dimensions is an important part of dimensional
modeling.

Object database models

Example of an Object-Oriented Model.[1]


Main article: Object-relational model
Main article: Object model
In recent years, the object-oriented paradigm has been applied to
database technology, creating a new programming model known as
object databases. These databases attempt to bring the database
world and the application programming world closer together, in
particular by ensuring that the database uses the same type system as
the application program. This aims to avoid the overhead (sometimes
referred to as the impedance mismatch) of converting information
between its representation in the database (for example as rows in
tables) and its representation in the application program (typically as
objects). At the same time, object databases attempt to introduce the
key ideas of object programming, such as encapsulation and
polymorphism, into the world of databases.

A variety of ways have been tried for storing objects in a database.
Some products have approached the problem from the application
programming end, by making the objects manipulated by the program
persistent. This also typically requires the addition of some kind
of query language, since conventional programming languages do not
have the ability to find objects based on their information content.
Others have attacked the problem from the database end, by defining
an object-oriented data model for the database, and defining a
database programming language that allows full programming
capabilities as well as traditional query facilities.

Object databases suffered because of a lack of standardization:
although standards were defined by ODMG, they were never implemented
well enough to ensure interoperability between products.
Nevertheless, object databases have been used successfully in many
applications: usually specialized applications such as engineering
databases or molecular biology databases rather than mainstream
commercial data processing. However, object database ideas were
picked up by the relational vendors and influenced extensions made
to these products and indeed to the SQL language.

See also
• Associative
• Concept-oriented
• Entity-attribute-value
• Information model
• Multi-dimensional model
• Semantic data model
• Semi-structured
• Star schema
• XML database

Network model
The network model is a database model conceived as a flexible way
of representing objects and their relationships.

Example of a Network Model.

The network model's original inventor was Charles Bachman, and it
was developed into a standard specification published in 1969 by the
CODASYL Consortium.

Overview
Where the hierarchical model structures data as a tree of records, with
each record having one parent record and many children, the network
model allows each record to have multiple parent and child records,
forming a lattice structure.

The chief argument in favour of the network model, in comparison to
the hierarchic model, was that it allowed a more natural modeling of
relationships between entities. Although the model was widely
implemented and used, it failed to become dominant for two main
reasons. Firstly, IBM chose to stick to the hierarchical model with
semi-network extensions in their established products such as IMS
and DL/I. Secondly, it was eventually displaced by the relational
model, which offered a higher-level, more declarative interface.
Until the early 1980s the performance benefits of the low-level
navigational interfaces offered by hierarchical and network
databases were persuasive for many large-scale applications, but as
hardware became faster, the extra productivity and flexibility of
the relational model led to the gradual obsolescence of the network
model in corporate enterprise usage.

Some Well-known Network Databases

• TurboIMAGE
• IDMS (Integrated Database Management System)
• RDM Embedded
• RDM Server

History
In 1969, the Conference on Data Systems Languages (CODASYL)
established the first specification of the network database model. This
was followed by a second publication in 1971, which became the basis
for most implementations. Subsequent work continued into the early
1980s, culminating in an ISO specification, but this had little influence
on products.

See also
• CODASYL
• Navigational database
• Semantic Web

Relational model
The relational model for database management is a database model
based on first-order predicate logic, first formulated and proposed
in 1969 by E. F. Codd.[1][2][3]
Example of a Relational model.[4]

Overview
Its core idea is to describe a database as a collection of predicates
over a finite set of predicate variables, describing constraints on the
possible values and combinations of values. The content of the
database at any given time is a finite (logical) model of the database,
i.e. a set of relations, one per predicate variable, such that all
predicates are satisfied. A request for information from the database
(a database query) is also a predicate.

Relational model concepts.


In the relational model, related records are linked together with a
"key".

The purpose of the relational model is to provide a declarative
method for specifying data and queries: we directly state what
information the database contains and what information we want from
it, and let the database management system software take care of
describing data structures for storing the data and retrieval
procedures for getting queries answered.

IBM's original implementation of Codd's ideas was System R. There
have been several commercial and open source products based on
Codd's ideas, including IBM's DB2, Oracle Database, Microsoft SQL
Server, PostgreSQL, MySQL, and many others. Most of these use the
SQL data definition and query language. A table in an SQL database
schema corresponds to a predicate variable; the contents of a table
to a relation; key constraints, other constraints, and SQL queries
correspond to predicates. However, SQL databases, including DB2,
deviate from the relational model in many details; Codd fiercely
argued against deviations that compromise the original
principles.[5]

Alternatives to the relational model

Other models are the hierarchical model and network model. Some
systems using these older architectures are still in use today in data
centers with high data volume needs or where existing systems are so
complex and abstract it would be cost prohibitive to migrate to
systems employing the relational model; also of note are newer
object-oriented databases.
A recent development is the Object-Relation type-Object model, which
is based on the assumption that any fact can be expressed in the form
of one or more binary relationships. The model is used in Object Role
Modeling (ORM), RDF/Notation 3 (N3) and in Gellish English.

The relational model was the first formal database model. After it was
defined, informal models were made to describe hierarchical databases
(the hierarchical model) and network databases (the network model).
Hierarchical and network databases existed before relational
databases, but were only described as models after the relational
model was defined, in order to establish a basis for comparison.

Implementation

There have been several attempts to produce a true implementation of


the relational database model as originally defined by Codd and
explained by Date, Darwen and others, but none have been popular
successes so far. Rel is one of the more recent attempts to do this.

History
The relational model was invented by E.F. (Ted) Codd as a general
model of data, and subsequently maintained and developed by Chris
Date and Hugh Darwen among others. In The Third Manifesto (first
published in 1995) Date and Darwen show how the relational model
can accommodate certain desired object-oriented features.

Controversies

Codd himself, some years after publication of his 1970 model,


proposed a three-valued logic (True, False, Missing or NULL) version of
it in order to deal with missing information, and in his The Relational
Model for Database Management Version 2 (1990) he went a step
further with a four-valued logic (True, False, Missing but Applicable,
Missing but Inapplicable) version. But these have never been
implemented, presumably because of attending complexity. SQL's
NULL construct was intended to be part of a three-valued logic system,
but fell short of that due to logical errors in the standard and in its
implementations.

Relational model topics


The model
The fundamental assumption of the relational model is that all data is
represented as mathematical n-ary relations, an n-ary relation being
a subset of the Cartesian product of n domains. In the mathematical
model, reasoning about such data is done in two-valued predicate
logic, meaning there are two possible evaluations for each proposition:
either true or false (and in particular no third value such as unknown,
or not applicable, either of which are often associated with the concept
of NULL). Some think two-valued logic is an important part of the
relational model, while others think a system that uses a form of
three-valued logic can still be considered relational.

Data are operated upon by means of a relational calculus or
relational algebra, these being equivalent in expressive power.

The relational model of data permits the database designer to create
a consistent, logical representation of information. Consistency is
achieved by including declared constraints in the database design,
which is usually referred to as the logical schema. The theory
includes a process of database normalization whereby a design with
certain desirable properties can be selected from a set of logically
equivalent alternatives. The access plans and other implementation
and operation details are handled by the DBMS engine, and are not
reflected in the logical model. This contrasts with common practice
for SQL DBMSs in which performance tuning often requires changes to
the logical model.

The basic relational building block is the domain or data type, usually
abbreviated nowadays to type. A tuple is an unordered set of
attribute values. An attribute is an ordered pair of attribute name
and type name. An attribute value is a specific valid value for the type
of the attribute. This can be either a scalar value or a more complex
type.

A relation consists of a heading and a body. A heading is a set of
attributes. A body (of an n-ary relation) is a set of n-tuples. The
heading of the relation is also the heading of each of its tuples.

A relation is defined as a set of n-tuples. In both mathematics and
the relational database model, a set is an unordered collection of
items, although some DBMSs impose an order on their data. In
mathematics, a tuple has an order, and allows for duplication. E. F.
Codd originally defined tuples using this mathematical
definition.[6] Later, it was one of E. F. Codd's great insights that
using attribute names instead of an ordering would be much more
convenient (in general) in a computer language based on relations.
This insight is still used today. Though the concept has changed,
the name "tuple" has not. An immediate and important consequence of
this distinguishing feature is that in the relational model the
Cartesian product becomes commutative.

A table is an accepted visual representation of a relation; a tuple
is similar to the concept of a row, but note that in the database
language SQL the columns and the rows of a table are ordered.

A relvar is a named variable of some specific relation type, to
which at all times some relation of that type is assigned, though
the relation may contain zero tuples.

The basic principle of the relational model is the Information
Principle: all information is represented by data values in
relations. In accordance with this Principle, a relational database
is a set of relvars and the result of every query is presented as a
relation.

The consistency of a relational database is enforced, not by rules
built into the applications that use it, but rather by constraints,
declared as part of the logical schema and enforced by the DBMS for
all applications. In general, constraints are expressed using
relational comparison operators, of which just one, "is subset of"
(⊆), is theoretically sufficient. In practice, several useful
shorthands are expected to be available, of which the most important
are candidate key (really, superkey) and foreign key constraints.

Interpretation

To fully appreciate the relational model of data it is essential to
understand the intended interpretation of a relation.

The body of a relation is sometimes called its extension. This is
because it is to be interpreted as a representation of the extension
of some predicate, this being the set of true propositions that can
be formed by replacing each free variable in that predicate by a
name (a term that designates something).

There is a one-to-one correspondence between the free variables of
the predicate and the attribute names of the relation heading. Each
tuple of the relation body provides attribute values to instantiate
the predicate by substituting each of its free variables. The result
is a proposition that is deemed, on account of the appearance of the
tuple in the relation body, to be true. Contrariwise, every tuple
whose heading conforms to that of the relation but which does not
appear in the body is deemed to be false. This assumption is known
as the closed world assumption.

For a formal exposition of these ideas, see the section Set Theory
Formulation, below.

Application to databases

A type as used in a typical relational database might be the set of
integers, the set of character strings, the set of dates, or the two
boolean values true and false, and so on. The corresponding type
names for these types might be the strings "int", "char", "date",
"boolean", etc. It is important to understand, though, that
relational theory does not dictate what types are to be supported;
indeed, nowadays provisions are expected to be available for
user-defined types in addition to the built-in ones provided by the
system.

Attribute is the term used in the theory for what is commonly
referred to as a column. Similarly, table is commonly used in place
of the theoretical term relation (though in SQL the term is by no
means synonymous with relation). A table data structure is specified
as a list of column definitions, each of which specifies a unique
column name and the type of the values that are permitted for that
column. An attribute value is the entry in a specific column and
row, such as "John Doe" or "35".

A tuple is basically the same thing as a row, except in an SQL DBMS,
where the column values in a row are ordered. (Tuples are not
ordered; instead, each attribute value is identified solely by the
attribute name and never by its ordinal position within the tuple.)
An attribute name might be "name" or "age".

A relation is a table structure definition (a set of column
definitions) along with the data appearing in that structure. The
structure definition is the heading and the data appearing in it is
the body, a set of rows. A database relvar (relation variable) is
commonly known as a base table. The heading of its assigned value at
any time is as specified in the table declaration, and its body is
that most recently assigned to it by invoking some update operator
(typically, INSERT, UPDATE, or DELETE). The heading and body of the
table resulting from evaluation of some query are determined by the
definitions of the operators used in the expression of that query.
(Note that in SQL the heading is not always a set of column
definitions as described above, because it is possible for a column
to have no name and also for two or more columns to have the same
name. Also, the body is not always a set of rows because in SQL it
is possible for the same row to appear more than once in the same
body.)

SQL and the relational model

SQL, initially pushed as the standard language for relational
databases, deviates from the relational model in several places. The
current ISO SQL standard doesn't mention the relational model or use
relational terms or concepts. However, it is possible to create a
database conforming to the relational model using SQL if one does
not use certain SQL features.

The following deviations from the relational model have been noted
in SQL. Note that few database servers implement the entire SQL
standard, and in particular some of these deviations are not allowed
in all of them. Whereas NULL is ubiquitous, for example, allowing
duplicate column names within a table or anonymous columns is
uncommon.

Duplicate rows
The same row can appear more than once in an SQL table. The
same tuple cannot appear more than once in a relation.
Anonymous columns
A column in an SQL table can be unnamed and thus unable to be
referenced in expressions. The relational model requires every
attribute to be named and referenceable.
Duplicate column names
Two or more columns of the same SQL table can have the same
name and therefore cannot be referenced, on account of the
obvious ambiguity. The relational model requires every attribute
to be referenceable.
Column order significance
The order of columns in an SQL table is defined and significant,
one consequence being that SQL's implementations of Cartesian
product and union are both noncommutative. The relational
model requires there to be no significance to any ordering of the
attributes of a relation.
Views without CHECK OPTION
Updates to a view defined without CHECK OPTION can be
accepted but the resulting update to the database does not
necessarily have the expressed effect on its target. For example,
an invocation of INSERT can be accepted but the inserted rows
might not all appear in the view, or an invocation of UPDATE can
result in rows disappearing from the view. The relational model
requires updates to a view to have the same effect as if the view
were a base relvar.
Columnless tables unrecognized
SQL requires every table to have at least one column, but there
are two relations of degree zero (of cardinality one and zero) and
they are needed to represent extensions of predicates that
contain no free variables.
NULL
This special mark can appear instead of a value wherever a value
can appear in SQL, in particular in place of a column value in
some row. The deviation from the relational model arises from
the fact that the implementation of this ad hoc concept in SQL
involves the use of three-valued logic, under which the
comparison of NULL with itself does not yield true but instead
yields the third truth value, unknown; similarly the comparison
NULL with something other than itself does not yield false but
instead yields unknown. It is because of this behaviour in
comparisons that NULL is described as a mark rather than a
value. The relational model depends on the law of excluded
middle under which anything that is not true is false and
anything that is not false is true; it also requires every tuple in a
relation body to have a value for every attribute of that relation.
This particular deviation is disputed by some, if only because
E. F. Codd himself eventually advocated the use of special marks and
a 4-valued logic. His advocacy was based on his observation that
there are two distinct reasons why one might want to use a special
mark in place of a value, which led opponents of the use of such
logics to discover more distinct reasons; at least 19 have been
noted, which would require a 21-valued logic. SQL itself uses NULL
for several purposes other than to represent "value unknown". For
example, the sum of the empty set is NULL, meaning zero, the average
of the empty set is NULL, meaning undefined, and NULL appearing in
the result of a LEFT JOIN can mean "no value because there is no
matching row in the right-hand operand"; NULL's behaviour in
comparisons is sketched just after this list.
Concepts
SQL uses concepts "table", "column", "row" instead of "relvar",
"attribute", "tuple". These are not merely differences in
terminology. For example, a "table" may contain duplicate rows,
whereas the same tuple cannot appear more than once in a
relation.
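
The behaviour of NULL in comparisons and aggregates can be sketched
with a few queries (PostgreSQL-style syntax; the table t and column
x are hypothetical):

SELECT NULL = NULL;               -- unknown (shown as NULL), not true
SELECT 1 = NULL;                  -- unknown, not false
SELECT SUM(x) FROM t WHERE false; -- sum of the empty set: NULL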

Relational operations
Users (or programs) request data from a relational database by
sending it a query that is written in a special language, usually a
dialect of SQL. Although SQL was originally intended for end-users, it
is much more common for SQL queries to be embedded into software
that provides an easier user interface. Many web sites, such as
Wikipedia, perform SQL queries when generating pages.

In response to a query, the database returns a result set, which is
just a list of rows containing the answers. The simplest query is
just to return all the rows from a table, but more often, the rows
are filtered in some way to return just the answer wanted.

Often, data from multiple tables are combined into one, by doing a
join. Conceptually, this is done by taking all possible combinations of
rows (the Cartesian product), and then filtering out everything except
the answer. In practice, relational database management systems
rewrite ("optimize") queries to perform faster, using a variety of
techniques.

There are a number of relational operations in addition to join.
These include project (the process of eliminating some of the
columns), restrict (the process of eliminating some of the rows),
union (a way of combining two tables with similar structures),
difference (which lists the rows in one table that are not found in
the other), intersect (which lists the rows found in both tables),
and product (mentioned above, which combines each row of one table
with each row of the other). Depending on which other sources you
consult, there are a number of other operators, many of which can be
defined in terms of those listed above. These include semi-join,
outer operators such as outer join and outer union, and various
forms of division. Then there are operators to rename columns, and
summarizing or aggregating operators, and if you permit relation
values as attributes (RVA, relation-valued attribute), then
operators such as group and ungroup. The SELECT statement in SQL
serves to handle all of these except for the group and ungroup
operators.
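
As a sketch, the SQL counterparts of these operators (all table and
column names are hypothetical; a and b are tables with similar
structures):

SELECT name, address FROM Employee;          -- project
SELECT * FROM Employee WHERE city = 'Oslo';  -- restrict
SELECT * FROM a UNION SELECT * FROM b;       -- union
SELECT * FROM a EXCEPT SELECT * FROM b;      -- difference
SELECT * FROM a INTERSECT SELECT * FROM b;   -- intersect
SELECT * FROM a CROSS JOIN b;                -- product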

The flexibility of relational databases allows programmers to write
queries that were not anticipated by the database designers. As a
result, relational databases can be used by multiple applications in
ways the original designers did not foresee, which is especially
important for databases that might be used for a long time (perhaps
several decades). This has made the idea and implementation of
relational databases very popular with businesses.

Database normalization

Main article: Database normalization

Relations are classified based upon the types of anomalies to which
they're vulnerable. A database that's in the first normal form is
vulnerable to all types of anomalies, while a database that's in the
domain/key normal form has no modification anomalies. Normal forms
are hierarchical in nature. That is, the lowest level is the first
normal form, and the database cannot meet the requirements for
higher level normal forms without first having met all the
requirements of the lesser normal forms.[7]

Examples
Database

An idealized, very simple example of a description of some relvars
and their attributes:

• Customer(Customer ID, Tax ID, Name, Address, City, State, Zip,
Phone)
• Order(Order No, Customer ID, Invoice No, Date Placed, Date
Promised, Terms, Status)
• Order Line(Order No, Order Line No, Product Code, Qty)
• Invoice(Invoice No, Customer ID, Order No, Date, Status)
• Invoice Line(Invoice No, Invoice Line No, Product Code, Qty
Shipped)
• Product(Product Code, Product Description)

In this design we have six relvars: Customer, Order, Order Line,
Invoice, Invoice Line and Product. The bold, underlined attributes
are candidate keys. The non-bold, underlined attributes are foreign
keys.

Usually one candidate key is arbitrarily chosen to be called the
primary key and used in preference over the other candidate keys,
which are then called alternate keys.

A candidate key is a unique identifier enforcing that no tuple will
be duplicated; duplication would make the relation into something
else, namely a bag, by violating the basic definition of a set. Both
foreign keys and superkeys (which include candidate keys) can be
composite, that is, can be composed of several attributes. Below is
a tabular depiction of a relation of our example Customer relvar; a
relation can be thought of as a value that can be attributed to a
relvar.

Customer relation

Customer ID   Tax ID        Name         Address           [More fields...]
1234567890    555-5512222   Munmun       323 Broadway      ...
2223344556    555-5523232   SS4 Vegeta   1200 Main Street  ...
3334445563    555-5533323   Ekta         871 1st Street    ...
4232342432    555-5325523   E. F. Codd   123 It Way        ...

If we attempted to insert a new customer with the ID 1234567890,
this would violate the design of the relvar since Customer ID is a
primary key and we already have a customer 1234567890. The DBMS must
reject a transaction such as this that would render the database
inconsistent by a violation of an integrity constraint.
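
Sketched in SQL, assuming a PRIMARY KEY constraint on a CustomerID
column (identifiers flattened, remaining columns omitted):

INSERT INTO Customer (CustomerID, TaxID, Name, Address)
VALUES (1234567890, '555-5512222', 'Munmun', '323 Broadway');
-- rejected: a row with primary key 1234567890 already exists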

Foreign keys are integrity constraints enforcing that the value of the
attribute set is drawn from a candidate key in another relation. For
example in the Order relation the attribute Customer ID is a foreign
key. A join is the operation that draws on information from several
relations at once. By joining relvars from the example above we could
query the database for all of the Customers, Orders, and Invoices. If
we only wanted the tuples for a specific customer, we would specify
this using a restriction condition.

If we wanted to retrieve all of the Orders for Customer 1234567890,
we could query the database to return every row in the Order table
with Customer ID 1234567890 and join the Order table to the Order
Line table based on Order No.
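
A sketch of that query in SQL (identifiers containing spaces are
quoted; "Order" must be quoted in any case, as ORDER is a reserved
word):

SELECT o.*, ol.*
  FROM "Order" o
  JOIN "Order Line" ol ON ol."Order No" = o."Order No"
 WHERE o."Customer ID" = 1234567890;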

There is a flaw in our database design above. The Invoice relvar
contains an Order No attribute. So, each tuple in the Invoice relvar
will have one Order No, which implies that there is precisely one
Order for each Invoice. But in reality an invoice can be created
against many orders, or indeed for no particular order. Additionally,
the Order relvar contains an Invoice No attribute, implying that
each Order has a corresponding Invoice. But again this is not always
true in the real world. An order is sometimes paid through several
invoices, and sometimes paid without an invoice. In other words,
there can be many Invoices per Order and many Orders per Invoice.
This is a many-to-many relationship between Order and Invoice (also
called a non-specific relationship). To represent this relationship
in the database a new relvar should be introduced whose role is to
specify the correspondence between Orders and Invoices:

OrderInvoice(Order No,Invoice No)

Now, the Order relvar has a one-to-many relationship to the
OrderInvoice table, as does the Invoice relvar. If we want to
retrieve every Invoice for a particular Order, we can query for all
orders where Order No in the Order relation equals the Order No in
OrderInvoice, and where Invoice No in OrderInvoice equals the
Invoice No in Invoice.
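
In SQL this traversal might be sketched as follows (the order number
1001 is hypothetical):

SELECT i.*
  FROM "Order" o
  JOIN OrderInvoice oi ON oi."Order No" = o."Order No"
  JOIN Invoice i ON i."Invoice No" = oi."Invoice No"
 WHERE o."Order No" = 1001;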

Set-theoretic formulation
Basic notions in the relational model are relation names and attribute
names. We will represent these as strings such as "Person" and
"name", and we will usually use the variables r, s, t and a, b, c to
range over them. Another basic notion is the set of atomic values that
contains values such as numbers and strings.

Our first definition concerns the notion of tuple, which formalizes the
notion of row or record in a table:

Tuple
A tuple is a partial function from attribute names to atomic
values.
Header
A header is a finite set of attribute names.
Projection
The projection of a tuple t on a finite set of attributes A is
t[A] = { (a, v) : (a, v) ∈ t, a ∈ A }.

The next definition defines relation, which formalizes the contents
of a table as it is defined in the relational model.

Relation
A relation is a tuple (H,B) with H, the header, and B, the body, a
set of tuples that all have the domain H.

Such a relation closely corresponds to what is usually called the
extension of a predicate in first-order logic, except that here we
identify the places in the predicate with attribute names. Usually
in the relational model a database schema is said to consist of a
set of relation names, the headers that are associated with these
names and the constraints that should hold for every instance of the
database schema.

Relation universe
A relation universe U over a header H is a non-empty set of
relations with header H.
Relation schema
A relation schema (H,C) consists of a header H and a predicate
C(R) that is defined for all relations R with header H. A relation
satisfies a relation schema (H,C) if it has header H and satisfies
C.

Key constraints and functional dependencies

One of the simplest and most important types of relation constraints
is the key constraint. It tells us that in every instance of a
certain relational schema the tuples can be identified by their
values for certain attributes.

Superkey
A superkey is written as a finite set of attribute names.
A superkey K holds in a relation (H,B) if:

• K ⊆ H, and
• there exist no two distinct tuples t1, t2 ∈ B such that
t1[K] = t2[K].

A superkey holds in a relation universe U if it holds in all
relations in U.
Theorem: A superkey K holds in a relation universe U over H if
and only if K ⊆ H and K → H holds in U.
Candidate key
A superkey K holds as a candidate key for a relation universe U if
it holds as a superkey for U and there is no proper subset of K
that also holds as a superkey for U.
Functional dependency
A functional dependency (FD for short) is written as X → Y for
X, Y finite sets of attribute names.
A functional dependency X → Y holds in a relation (H,B) if:

• X ⊆ H and Y ⊆ H, and
• for all tuples t1, t2 ∈ B, t1[X] = t2[X] implies t1[Y] = t2[Y].

A functional dependency X → Y holds in a relation universe U if
it holds in all relations in U.
Trivial functional dependency
A functional dependency is trivial under a header H if it holds in
all relation universes over H.
Theorem: An FD X → Y is trivial under a header H if and only if
Y ⊆ X ⊆ H.
Closure
Armstrong's axioms: The closure of a set of FDs S under a
header H, written as S+, is the smallest superset of S such that:

• if Y ⊆ X ⊆ H then X → Y ∈ S+ (reflexivity),
• if X → Y ∈ S+ and Y → Z ∈ S+ then X → Z ∈ S+ (transitivity), and
• if X → Y ∈ S+ and Z ⊆ H then X ∪ Z → Y ∪ Z ∈ S+ (augmentation).

Theorem: Armstrong's axioms are sound and complete; given a
header H and a set S of FDs that only contain subsets of H,
X → Y ∈ S+ if and only if X → Y holds in all relation universes
over H in which all FDs in S hold.
Completion
The completion of a finite set of attributes X under a finite set of
FDs S, written as X+, is the smallest superset of X such that:

• if Y → Z ∈ S and Y ⊆ X+ then Z ⊆ X+.

The completion of an attribute set can be used to compute whether a
certain dependency is in the closure of a set of FDs.
Theorem: Given a set S of FDs, X → Y ∈ S+ if and only if Y ⊆ X+.
Irreducible cover
An irreducible cover of a set S of FDs is a set T of FDs such that:

• S+ = T+,
• there exists no U ⊂ T such that S+ = U+,
• if X → Y ∈ T then Y is a singleton set, and
• if X → Y ∈ T and Z ⊂ X then Z → Y ∉ S+.

Algorithm to derive candidate keys from functional dependencies
INPUT: a set S of FDs that contain only subsets of a header H
OUTPUT: the set C of superkeys that hold as candidate keys in
all relation universes over H in which all FDs in S hold
In Python, representing attribute sets as frozensets and each FD as
a pair (X, Y) meaning X → Y:

def candidate_keys(H, S):
    C = set()            # found candidate keys
    Q = {frozenset(H)}   # superkeys that contain candidate keys
    while Q:
        K = Q.pop()                 # take some element K from Q
        minimal = True
        for X, Y in S:
            K2 = (K - Y) | X        # derive new superkey
            if K2 < K:              # K2 is a proper subset of K
                minimal = False
                Q.add(K2)
        if minimal and not any(c <= K for c in C):
            C = {c for c in C if not c > K}  # drop supersets of K
            C.add(K)
    return C

See also
• Domain relational calculus
• Life cycle of a relational database
• List of relational database management systems
• Query language
o Database query language
o Information retrieval query language
• Relation
• Relational database
• Relational database management system
• The Third Manifesto (TTM)
• TransRelational model
• Tuple-versioning

Query language
Query languages are computer languages used to make queries into
databases and information systems.

Broadly, query languages can be classified according to whether they
are database query languages or information retrieval query
languages. Examples include:
• .QL is a proprietary object-oriented query language for querying
relational databases;
• Common Query Language (CQL), a formal language for
representing queries to information retrieval systems such as
web indexes or bibliographic catalogues.
• CODASYL;
• CxQL is the Query Language used for writing and customizing
queries on CxAudit by Checkmarx.
• D is a query language for truly relational database management
systems (TRDBMS);
• DMX is a query language for Data Mining models;
• Datalog is a query language for deductive databases;
• ERROL is a query language over the Entity-relationship model
(ERM) which mimics major Natural language constructs (of the
English language and possibly other languages). It is especially
tailored for relational databases;
• Gellish English is a language that can be used for queries in
Gellish English Databases [1], for dialogues (requests and
responses) as well as for information modeling and knowledge
modeling;
• ISBL is a query language for PRTV, one of the earliest relational
database management systems;
• LDAP is an application protocol for querying and modifying
directory services running over TCP/IP.
• MQL is a cheminformatics query language for substructure
search, supporting numerical properties in addition to nominal
ones;
• MDX is a query language for OLAP databases;
• OQL is Object Query Language;
• OCL (Object Constraint Language). Despite its name, OCL is also
an object query language and an OMG standard.
• OPath, intended for use in querying WinFS Stores;
• Poliqarp Query Language is a special query language designed to
analyze annotated text. Used in the Poliqarp search engine;
• QUEL is a relational database access language, similar in most
ways to SQL;
• SMARTS is the cheminformatics standard for a substructure
search;
• SPARQL is a query language for RDF graphs;
• SQL is a well known query language for relational databases;
• SuprTool is a proprietary query language for SuprTool, a
database access program used for accessing data in Image/SQL
(TurboIMAGE) and Oracle databases;
• TMQL Topic Map Query Language is a query language for Topic
Maps;
• XQuery is a query language for XML data sources;
• XPath is a language for navigating XML documents;
• XSQL

Hierarchical model
Hierarchical model redirects here. For the statistics usage, see
hierarchical linear modeling.

A hierarchical data model is a data model in which the data is
organized into a tree-like structure. The structure allows repeating
information using parent/child relationships: each parent can have
many children but each child only has one parent. All attributes of
a specific record are listed under an entity type.

Example of a Hierarchical Model.

In a database, an entity type is the equivalent of a table; each
individual record is represented as a row and an attribute as a
column. Entity types are related to each other using 1:N mappings,
also known as one-to-many relationships. The most recognized example
of a hierarchical model database is IMS, designed by IBM.

History
Prior to the development of the first database management system
(DBMS), access to data was provided by application programs that
accessed flat files. Data integrity problems and the inability of
such file processing systems to represent logical data relationships
led to the first data model: the hierarchical data model. This
model, which was implemented primarily by IBM's Information
Management System (IMS), only allows one-to-one or one-to-many
relationships between entities. Any entity at the many end of the
relationship can be related only to one entity at the one end.[1]

A relational database implementation of this type of data model was
first discussed in publication form in 1992[2] (see also nested set
model).

Example
An example of a hierarchical data model would be if an organization
had records of employees in a table (entity type) called "Employees".
In the table there would be attributes/columns such as First Name,
Last Name, Job Name and Wage. The company also has data about the
employee’s children in a separate table called "Children" with
attributes such as First Name, Last Name, and date of birth. The
Employee table represents a parent segment and the Children table
represents a Child segment. These two segments form a hierarchy
where an employee may have many children, but each child may only
have one parent.

Consider the following structure:

EmpNo  Designation     ReportsTo
10     Director
20     Senior Manager  10
30     Typist          20
40     Programmer      20

In this, the "child" is the same type as the "parent". The hierarchy
stating EmpNo 10 is boss of 20, and 30 and 40 each report to 20 is
represented by the "ReportsTo" column. In Relational database terms,
the ReportsTo column is a foreign key referencing the EmpNo column.
If the "child" data type were different, it would be in a different table,
but there would still be a foreign key referencing the EmpNo column of
the employees table.
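
A sketch of this structure in SQL, with a self-join pairing each
employee with their boss (LEFT JOIN keeps the Director, who reports
to nobody):

CREATE TABLE employees (
    EmpNo       integer PRIMARY KEY,
    Designation varchar(40),
    ReportsTo   integer REFERENCES employees(EmpNo) -- self-reference
);

SELECT e.EmpNo, e.Designation, b.Designation AS boss
  FROM employees e
  LEFT JOIN employees b ON e.ReportsTo = b.EmpNo;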

This simple model is commonly known as the adjacency list model, and
was introduced by Dr. Edgar F. Codd after initial criticisms
surfaced that the relational model could not model hierarchical
data.

Entity-relationship model
In software engineering, an Entity-Relationship Model (ERM) is an
abstract and conceptual representation of data. Entity-relationship
modeling is a database modeling method, used to produce a type of
conceptual schema or semantic data model of a system, often a
relational database, and its requirements in a top-down fashion.

Diagrams created using this process are called entity-relationship
diagrams, or ER diagrams or ERDs for short.

The definitive reference for entity-relationship modelling is
generally given as Peter Chen's 1976 paper.[1] However, variants of
the idea existed previously (see for example A.P.G. Brown[2]) and
have been devised subsequently.

Overview
The first stage of information system design uses these models during
the requirements analysis to describe information needs or the type of
information that is to be stored in a database. The data modeling
technique can be used to describe any ontology (i.e. an overview and
classifications of used terms and their relationships) for a certain
universe of discourse (i.e. area of interest). In the case of the design
of an information system that is based on a database, the conceptual
data model is, at a later stage (usually called logical design), mapped
to a logical data model, such as the relational model; this in turn is
mapped to a physical model during physical design. Note that
sometimes, both of these phases are referred to as "physical design".

There are a number of conventions for entity-relationship diagrams
(ERDs). The classical notation is described in the remainder of this
article, and mainly relates to conceptual modeling. There is a range
of notations more typically employed in logical and physical
database design, such as IDEF1X.

Connection

Two related entities

An entity with an attribute

A relationship with an attribute

Primary key

An entity may be defined as a thing which is recognized as being
capable of an independent existence and which can be uniquely
identified. An entity is an abstraction from the complexities of
some domain. When we speak of an entity, we normally speak of some
aspect of the real world which can be distinguished from other
aspects of the real world.[3]

An entity may be a physical object such as a house or a car, an
event such as a house sale or a car service, or a concept such as a
customer transaction or order. Although the term entity is the one
most commonly used, following Chen we should really distinguish
between an entity and an entity-type. An entity-type is a category.
An entity, strictly speaking, is an instance of a given entity-type.
There are usually many instances of an entity-type. Because the term
entity-type is somewhat cumbersome, most people tend to use the term
entity as a synonym for this term.

Entities can be thought of as nouns. Examples: a computer, an
employee, a song, a mathematical theorem. Entities are represented
as rectangles.

A relationship captures how two or more entities are related to one
another. Relationships can be thought of as verbs, linking two or
more nouns. Examples: an owns relationship between a company and a
computer, a supervises relationship between an employee and a
department, a performs relationship between an artist and a song, a
proved relationship between a mathematician and a theorem.
Relationships are represented as diamonds, connected by lines to
each of the entities in the relationship.

Entities and relationships can both have attributes. Examples: an
employee entity might have a Social Security Number (SSN) attribute;
the proved relationship may have a date attribute. Attributes are
represented as ellipses connected to their owning entity sets by a
line.

Every entity (unless it is a weak entity) must have a minimal set of
uniquely identifying attributes, which is called the entity's
primary key.

Entity-relationship diagrams don't show single entities or single
instances of relations. Rather, they show entity sets and
relationship sets. Example: a particular song is an entity; the
collection of all songs in a database is an entity set. The eaten
relationship between a child and her lunch is a single relationship;
the set of all such child-lunch relationships in a database is a
relationship set.

Lines are drawn between entity sets and the relationship sets they are
involved in. If all entities in an entity set must participate in the
relationship set, a thick or double line is drawn. This is called a
participation constraint. If each entity of the entity set can participate
in at most one relationship in the relationship set, an arrow is drawn
from the entity set to the relationship set. This is called a key
constraint. To indicate that each entity in the entity set is involved in
exactly one relationship, a thick arrow is drawn.

Alternative diagramming conventions

Two related entities shown using Crow's Foot notation

Chen's notation for entity-relationship modelling uses rectangles to
represent entities, and diamonds to represent relationships. This
notation is appropriate because Chen's relationships are first-class
objects: they can have attributes and relationships of their own.

Alternative conventions, some with partly historical significance, are:

• IDEF1X[4]
• The Bachman notation of Charles Bachman
• The Martin notation of James Martin
• The (min, max)-notation of Jean-Raymond Abrial in 1974, and
• The UML standard
• The EXPRESS

Crow's Foot

One alternative notation, known as "crow's foot" notation, was
developed independently: in these diagrams, entities are represented
by boxes, and relationships by labelled arcs.

The "Crow's Foot" notation represents relationships with connecting


lines between entities, and pairs of symbols at the ends of those lines
to represent the cardinality of the relationship. Crow's Foot notation is
used in Barker's Notation and in methodologies such as SSADM and
Information Engineering.

For a while Chen's notation was more popular in the United States,
while Crow's Foot notation was used primarily in the UK, being used in
the 1980s by the then-influential consultancy practice CACI. Many of
the consultants at CACI (including Barker) subsequently moved to
Oracle UK, where they developed the early versions of Oracle's CASE
tools; this had the effect of introducing the notation to a wider
audience, and it is now used in many tools including System Architect,
Visio, PowerDesigner, Toad Data Modeler, DeZign for Databases,
OmniGraffle, MySQL Workbench and Dia. Crow's foot notation has the
following benefits:

• Clarity in identifying the many, or child, side of the relationship,
using the crow's foot.
• Concise notation for identifying a mandatory relationship, using a
perpendicular bar, or an optional relationship, using an open circle
(restated as DDL in the sketch below).
• A clear and concise notation that identifies all classes.
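
As a rough translation into schema terms (one possible reading, with
illustrative names): the mandatory end of a relationship corresponds
to a NOT NULL foreign key, the optional end to a nullable one.

CREATE TABLE Customer (
    CustomerId INTEGER PRIMARY KEY
);

CREATE TABLE Coupon (
    CouponId INTEGER PRIMARY KEY
);

CREATE TABLE CustomerOrder (
    OrderId    INTEGER PRIMARY KEY,
    CustomerId INTEGER NOT NULL REFERENCES Customer, -- mandatory: perpendicular bar
    CouponId   INTEGER REFERENCES Coupon             -- optional: open circle
);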

ER diagramming tools
There are many ER diagramming tools. Some of the proprietary ER
diagramming tools are Avolution, dbForge Studio for MySQL, DeZign
for Databases, ConceptDraw, ER/Studio, ERwin, MEGA International,
OmniGraffle, Oracle Designer, PowerDesigner, Rational Rose,
SmartDraw, Sparx Enterprise Architect, SQLyog, Toad Data Modeler,
Microsoft Visio, and Visual Paradigm. A freeware ER tool that can
generate database and application layer code (webservices) is the
RISE Editor.

Some free software ER diagramming tools that can interpret and
generate ER models, SQL and do database analysis are StarUML,
MySQL Workbench, and SchemaSpy.[5] Some free software diagram
tools, such as Kivio and Dia, can only draw the shapes, without any
knowledge of what they mean, and cannot generate SQL; Dia diagrams
can, however, be translated with tedia2sql.

See also
• Associative entity
• Data model
• Data structure diagram
• Enhanced Entity-Relationship Model
• Object Role Modeling
• Three schema approach
• Unified Modeling Language
• Value range structure diagrams

Object-relational database
An object-relational database (ORD) or object-relational database
management system (ORDBMS) is a database management system
(DBMS) similar to a relational database, but with an object-oriented
database model: objects, classes and inheritance are directly
supported in database schemas and in the query language. In
addition, it supports extension of the data model with custom data-
types and methods.
Example of an Object-Oriented Database Model.[1]

An object-relational database can be said to provide a middle ground
between relational databases and object-oriented databases
(OODBMS). In object-relational databases, the approach is essentially
that of relational databases: the data resides in the database and is
manipulated collectively with queries in a query language; at the other
extreme are OODBMSes in which the database is essentially a
persistent object store for software written in an object-oriented
programming language, with a programming API for storing and
retrieving objects, and little or no specific support for querying.

Overview
One aim of the object-relational database is to bridge the gap
between conceptual data modeling techniques such as the entity-
relationship diagram (ERD) and object-relational mapping (ORM),
which often use classes and inheritance, and relational databases,
which do not directly support them.

Another, related, aim is to bridge the gap between relational databases
and the object-oriented modeling techniques used in programming
languages such as Java, C++ or C#. However, a more popular
alternative for achieving such a bridge is to use standard relational
database systems with some form of ORM software.
Whereas traditional RDBMS or SQL-DBMS products focused on the
efficient management of data drawn from a limited set of data-types
(defined by the relevant language standards), an object-relational
DBMS allows software developers to integrate their own types and the
methods that apply to them into the DBMS. ORDBMS technology aims
to allow developers to raise the level of abstraction at which they view
the problem domain. This goal is not universally shared;
proponents of relational databases often argue that object-oriented
specification lowers the abstraction level.

Many SQL ORDBMSs on the market today are extensible with user-
defined types (UDT) and custom-written functions (e.g. stored
procedures). Some (e.g. Microsoft SQL Server) allow such functions to
be written in object-oriented programming languages, but this by itself
doesn't make them object-oriented databases; in an object-oriented
database, object orientation is a feature of the data model.

History
Object-relational database management systems grew out of research
that occurred in the early 1990s. That research extended existing
relational database concepts by adding object concepts. The
researchers aimed to retain a declarative query-language based on
predicate calculus as a central component of the architecture. Probably
the most notable research project, Postgres (UC Berkeley), spawned
two products tracing their lineage to that research: Illustra and
PostgreSQL.

In the mid-1990s, early commercial products appeared. These
included Illustra[2] (Illustra Information Systems, acquired by Informix
Software, which was in turn acquired by IBM), Omniscience
(Omniscience Corporation, acquired by Oracle Corporation and became
the original Oracle Lite), and UniSQL (UniSQL, Inc., acquired by
KCOMS). Ukrainian developer Ruslan Zasukhin, founder of Paradigma
Software, Inc., developed and shipped the first version of the Valentina
database in the mid-1990s as a C++ SDK. By the next decade,
PostgreSQL had become a commercially viable database and is the
basis for several products today which maintain its ORDBMS features.

Computer scientists came to refer to these products as "object-
relational database management systems" or ORDBMSs.[3]

Many of the ideas of early object-relational database efforts have
largely been incorporated into SQL:1999. In fact, any product that
adheres to the object-oriented aspects of SQL:1999 could be described
as an object-relational database management product. For example,
IBM's DB2, Oracle Database, and Microsoft SQL Server make claims to
support this technology and do so with varying degrees of success.

Comparison to RDBMS
An RDBMS might commonly involve SQL statements such as these:

CREATE TABLE Customers (
    Id        CHAR(12)    NOT NULL PRIMARY KEY,
    Surname   VARCHAR(32) NOT NULL,
    FirstName VARCHAR(32) NOT NULL,
    DOB       DATE        NOT NULL
);

SELECT InitCap(Surname) || ', ' || InitCap(FirstName)
FROM Customers
WHERE Month(DOB) = Month(getdate())
  AND Day(DOB) = Day(getdate());

Most current SQL databases allow the crafting of custom functions,
which would allow the query to appear as:

SELECT Formal(Id)
FROM Customers
WHERE Birthday(Id) = Today()

In an object-relational database, one might see something like this,
with user-defined data-types and expressions such as BirthDay():

CREATE TABLE Customers (
    Id   Cust_Id    NOT NULL PRIMARY KEY,
    Name PersonName NOT NULL,
    DOB  DATE       NOT NULL
);

SELECT Formal( C.Id )
FROM Customers C
WHERE BirthDay ( C.DOB ) = TODAY;
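
The example assumes that Cust_Id, PersonName and Formal have
already been registered with the DBMS; they are the example's
hypothetical names, not built-ins, and how such types are declared
varies by product. A sketch in PostgreSQL-style syntax, defining
Formal over the PersonName type (the example queries are loose
about whether Formal takes an Id or a Name; this sketch follows the
Formal(C.Name) form used below):

CREATE DOMAIN Cust_Id AS CHAR(12);

CREATE TYPE PersonName AS (
    Surname   VARCHAR(32),
    FirstName VARCHAR(32)
);

-- A user-defined function over the user-defined type.
CREATE FUNCTION Formal(n PersonName) RETURNS TEXT AS $$
    SELECT initcap((n).Surname) || ', ' || initcap((n).FirstName);
$$ LANGUAGE SQL;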

The object-relational model can offer another advantage in that the
database can make use of the relationships between data to easily
collect related records. In an address book application, an additional
table would be added to the ones above to hold zero or more
addresses for each user. Using a traditional RDBMS, collecting
information for both the user and their address requires a "join":

SELECT InitCap(C.Surname) || ', ' || InitCap(C.FirstName), A.city
FROM Customers C JOIN Addresses A ON A.Cust_Id = C.Id -- the join
WHERE A.city = 'New York';

The same query in an object-relational database appears more simply:

SELECT Formal( C.Name )
FROM Customers C
WHERE C.address.city = 'New York' -- the linkage is 'understood' by the ORDB
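
One way a DBMS could "understand" that path, sketched in
PostgreSQL-flavoured syntax; the AddressType type and Address
column are assumptions for illustration, and real products vary
(reference types, nested tables, and so on):

CREATE TYPE AddressType AS (
    Street TEXT,
    City   TEXT
);

ALTER TABLE Customers ADD COLUMN Address AddressType;

-- PostgreSQL writes the path with parentheses around the row value.
SELECT Formal(C.Name)
FROM Customers C
WHERE (C.Address).City = 'New York';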

Object model

In computing, object model has two related but distinct meanings:

1. The properties of objects in general, in a specific computer
programming language, technology, notation or methodology
that uses them. For example, the Java object model, the COM
object model, or the object model of OMT. Such object models
are usually defined using concepts such as class, message,
inheritance, polymorphism, and encapsulation. There is an
extensive literature on formalized object models as a subset of
the formal semantics of programming languages.
2. A collection of objects or classes through which a program can
examine and manipulate some specific parts of its world. In
other words, the object-oriented interface to some service or
system. Such an interface is said to be the object model of the
represented service or system. For example, the Document
Object Model (DOM) [1] is a collection of objects that represent
a page in a web browser, used by script programs to examine
and dynamically change the page. There is a Microsoft Excel
object model [2] for controlling Microsoft Excel from another
program, and the ASCOM Telescope Driver [3] is an object model
for controlling an astronomical telescope.

See also
• Object-Oriented Programming
• Object-oriented analysis and design
• Object Management Group
• Domain-driven design
Conceptual schema

A conceptual schema or conceptual data model is a map of concepts
and their relationships. This describes the semantics of an organization
and represents a series of assertions about its nature. Specifically, it
describes the things of significance to an organization (entity classes),
about which it is inclined to collect information, and characteristics of
(attributes) and associations between pairs of those things of
significance (relationships).

Overview
Because a conceptual schema represents the semantics of an
organization, and not a database design, it may exist on various levels
of abstraction. The original ANSI four-schema architecture began with
the set of external schemas that each represent one person's view of
the world around him or her. These are consolidated into a single
conceptual schema that is the superset of all of those external views. A
data model can be as concrete as each person's perspective, but this
tends to make it inflexible. If that person's world changes, the model
must change. Conceptual data models take a more abstract
perspective, identifying the fundamental things, of which the things an
individual deals with are just examples.

The model does allow for what is called inheritance in object-oriented
terms. The set of instances of an entity class may be subdivided into
entity classes in their own right. Thus, each instance of a sub-type
entity class is also an instance of the entity class's super-type. Each
instance of the super-type entity class, then, is also an instance of one
of the sub-type entity classes.
Super-type/sub-type relationships may be exclusive or not. A
methodology may require that each instance of a super-type may only
be an instance of one sub-type. Similarly, a super-type/sub-type
relationship may be exhaustive or not. It is exhaustive if the
methodology requires that each instance of a super-type must be an
instance of a sub-type.

Example relationships
• Each PERSON may be the vendor in one or more ORDERS.
• Each ORDER must be from one and only one PERSON.
• PERSON is a sub-type of PARTY. (Meaning that every instance of
PERSON is also an instance of PARTY.)
• Each EMPLOYEE may have a supervisor who is also an EMPLOYEE
(see the SQL sketch below).
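
A minimal relational sketch of these assertions (the table and column
names are illustrative): the sub-type shares the primary key of its
super-type, the mandatory relationship becomes a NOT NULL foreign
key, and the optional supervisor becomes a nullable self-reference.

CREATE TABLE Party (
    PartyId INTEGER PRIMARY KEY
);

CREATE TABLE Person (
    PartyId INTEGER PRIMARY KEY REFERENCES Party -- PERSON is a sub-type of PARTY
);

CREATE TABLE Orders (
    OrderId  INTEGER PRIMARY KEY,
    VendorId INTEGER NOT NULL REFERENCES Person  -- each ORDER is from exactly one PERSON
);

CREATE TABLE Employee (
    EmpId        INTEGER PRIMARY KEY,
    SupervisorId INTEGER REFERENCES Employee     -- each EMPLOYEE may have a supervisor
);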

Data structure diagram

Data Structure Diagram.

A data structure diagram (DSD) is a data model or diagram used to
describe conceptual data models by providing graphical notations
which document entities and their relationships, and the constraints
that bind them.

See also
• Concept mapping
• Data modeling
• Entity-relationship model
• Object-relationship modelling
• Object role modeling
• Knowledge representation
• Logical data model
• Mindmap
• Ontology
• Physical data model
• Semantic Web
• Three schema approach

Semantic data model

Semantic data models.[1]

A semantic data model in software engineering is a data modeling
technique to define the meaning of data within the context of its
interrelationships with other data. A semantic data model is an
abstraction which defines how the stored symbols relate to the real
world.[1] A semantic data model is sometimes called a conceptual data
model.

Overview
The logical data structure of a database management system (DBMS),
whether hierarchical, network, or relational, cannot totally satisfy the
requirements for a conceptual definition of data, because it is limited in
scope and biased toward the implementation strategy employed by the
DBMS. The need to define data from a conceptual view has therefore
led to the development of semantic data modeling techniques, that is,
techniques to define the meaning of data within the context of its
interrelationships with other data. As illustrated in the figure, the real
world, in terms of resources, ideas, events, and so on, is symbolically
defined within physical data stores. A semantic data model is an
abstraction which defines how the stored symbols relate to the real
world. Thus, the model must be a true representation of the real
world.[1]

The overall goal of semantic data models is to capture more of the
meaning of data by integrating relational concepts with more powerful
abstraction concepts known from the artificial intelligence field. The
idea is to provide high-level modeling primitives as an integral part of a
data model in order to facilitate the representation of real-world
situations.[2]

History
The need for semantic data models was first recognized by the U.S. Air
Force in the mid-1970s as a result of the Integrated Computer-Aided
Manufacturing (ICAM) Program. The objective of this program was to
increase manufacturing productivity through the systematic application
of computer technology. The ICAM Program identified a need for better
analysis and communication techniques for people involved in
improving manufacturing productivity. As a result, the ICAM Program
developed a series of techniques known as the IDEF (ICAM Definition)
Methods which included the following:[1]

• IDEF0, used to produce a “function model”, which is a structured
representation of the activities or processes within the
environment or system.
• IDEF1, used to produce an “information model”, which represents
the structure and semantics of information within the
environment or system.
• IDEF2, used to produce a “dynamics model”, which represents the
time-varying behavioral characteristics of the environment or
system.

Applications
A semantic data model can be used to serve many purposes. Some
key objectives include:[1]

• Planning of Data Resources: A preliminary data model can be
used to provide an overall view of the data required to run an
enterprise. The model can then be analyzed to identify and
scope projects to build shared data resources.
• Building of Shareable Databases: A fully developed model can be
used to define an application independent view of data which can
be validated by users and then transformed into a physical
database design for any of the various DBMS technologies. In
addition to generating databases which are consistent and
shareable, development costs can be drastically reduced through
data modeling.
• Evaluation of Vendor Software: Since a data model actually
represents the infrastructure of an organization, vendor software
can be evaluated against a company’s data model in order to
identify possible inconsistencies between the infrastructure
implied by the software and the way the company actually does
business.
• Integration of Existing Databases: By defining the contents of
existing databases with semantic data models, an integrated
data definition can be derived. With the proper technology, the
resulting conceptual schema can be used to control transaction
processing in a distributed database environment. The U.S. Air
Force Integrated Information Support System (I2S2) is an
experimental development and demonstration of this type of
technology applied to a heterogeneous DBMS environment.

IDEF1X is a semantic data modeling technique. It is used to produce
a graphical information model which represents the structure and
semantics of information within an environment or system. Use of this
standard permits the construction of semantic data models which may
serve to support the management of data as a resource, the
integration of information systems, and the building of computer
databases.

See also
• Conceptual schema
• Entity-relationship model
• Information model
• Relational Model/Tasmania
• Three schema approach
• QuakeSim

Top-down and bottom-up design

Top-down and bottom-up are strategies of information processing
and knowledge ordering, mostly involving software, but also other
humanistic and scientific theories (see systemics). In practice, they
can be seen as a style of thinking and teaching. In many cases top-
down is used as a synonym of analysis or decomposition, and bottom-
up of synthesis.

A top-down approach is essentially breaking down a system to gain
insight into its compositional sub-systems. In a top-down approach an
overview of the system is first formulated, specifying but not detailing
any first-level subsystems. Each subsystem is then refined in yet
greater detail, sometimes in many additional subsystem levels, until
the entire specification is reduced to base elements. A top-down model
is often specified with the assistance of "black boxes" that make it
easier to manipulate. However, black boxes may fail to elucidate
elementary mechanisms, and may not be detailed enough to
realistically validate the model.

A bottom-up approach is piecing together systems to give rise to
grander systems, thus making the original systems sub-systems of the
emergent system. In a bottom-up approach the individual base
elements of the system are first specified in great detail. These
elements are then linked together to form larger subsystems, which
then in turn are linked, sometimes in many levels, until a complete
top-level system is formed. This strategy often resembles a "seed"
model, whereby the beginnings are small but eventually grow in
complexity and completeness. However, "organic strategies" may
result in a tangle of elements and subsystems, developed in isolation
and subject to local optimization as opposed to meeting a global
purpose.

Computer science
Software development
Part of this section is from the Perl Design Patterns Book.

In the software development process, the top-down and bottom-up
approaches play a key role.

Top-down approaches emphasize planning and a complete
understanding of the system. It is inherent that no coding can begin
until a sufficient level of detail has been reached in the design of at
least some part of the system. A top-down implementation typically
proceeds by attaching stubs in place of modules that have not yet been
written. This, however, delays testing of the ultimate functional units
of a system until significant design is complete. Bottom-up emphasizes
coding and early testing, which can begin as soon as the first module
has been specified. This approach, however, runs the risk that modules
may be coded without having a clear idea of how they link to other
parts of the system, and that such linking may not be as easy as first
thought. Re-usability of code is one of the main benefits of the
bottom-up approach.

Top-down design was promoted in the 1970s by IBM researcher Harlan
Mills and by Niklaus Wirth. Mills developed structured programming
concepts for practical use and tested them in a 1969 project to
automate the New York Times morgue index. The engineering and
management success of this project led to the spread of the top-down
approach through IBM and the rest of the computer industry. Among
other achievements, Niklaus Wirth, the developer of the Pascal
programming language, wrote the influential paper Program
Development by Stepwise Refinement. Since Niklaus Wirth went on to
develop languages such as Modula and Oberon (where one could
define a module before knowing about the entire program
specification), one can infer that top-down programming was not
strictly what he promoted. Top-down methods were favored in
software engineering until the late 1980s, and object-oriented
programming helped demonstrate that both top-down and bottom-up
approaches could be utilized.

Modern software design approaches usually combine both top-down
and bottom-up approaches. Although an understanding of the
complete system is usually considered necessary for good design,
leading theoretically to a top-down approach, most software projects
attempt to make use of existing code to some degree. Pre-existing
modules give designs a bottom-up flavour. Some design approaches
also use an approach where a partially-functional system is designed
and coded to completion, and this system is then expanded to fulfill all
the requirements for the project.
Programming

Top-down programming is a programming style, the mainstay of
traditional procedural languages, in which design begins by specifying
complex pieces and then dividing them into successively smaller
pieces. Eventually, the components are specific enough to be coded
and the program is written. This is the exact opposite of the bottom-up
programming approach which is common in object-oriented languages
such as C++ or Java.

The technique for writing a program using top-down methods is to
write a main procedure that names all the major functions it will need.
Later, the programming team looks at the requirements of each of
those functions and the process is repeated. These compartmentalized
sub-routines eventually will perform actions so simple they can be
easily and concisely coded. When all the various sub-routines have
been coded the program is done.

By defining how the application comes together at a high level, lower
level work can be self-contained. By defining how the lower level
objects are expected to integrate into a higher level object, interfaces
become clearly defined.

Advantages of top-down programming

• Separating the low level work from the higher level objects leads
to a modular design.
• Modular design means development can be self contained.
• Having "skeleton" code illustrates clearly how low level modules
integrate.
• Fewer operational errors, because each module is developed and
checked separately.
• Less time consuming, since each programmer is involved in only
a part of the project.
• An optimized way of working: each programmer applies their own
knowledge and experience to their own modules, so the project
as a whole tends to be optimized.
• Easy to maintain: if an error occurs in the output, it is easy to
identify which module of the program generated it.
Disadvantages of top-down programming

• Functionality either needs to be inserted into low level objects by
making them return "canned answers" (manually constructed
objects, similar to what you would specify if you were mocking
them in a test), or functionality will be lacking until development
of the low level objects is complete.

Bottom-up approach

Pro/ENGINEER WF4.0 (proetools.com), Lego Racer Pro/ENGINEER
parts: a good example of bottom-up design, because the parts are first
created and then assembled without regard to how the parts will work
in the assembly.

Object-oriented programming (OOP) is a programming paradigm that
uses "objects" to design applications and computer programs.

In mechanical engineering, with software programs such as
Pro/ENGINEER and SolidWorks, users can design products as pieces,
not part of the whole, and later add those pieces together to form
assemblies, like building with LEGO bricks. Engineers call this piece
part design.

This bottom-up approach has one weakness. A good deal of intuition
is needed to decide the functionality that is to be provided by each
module. If a system is to be built from an existing system, this
approach is more suitable, as it starts from some existing modules.

Parsing

Parsing is the process of analyzing an input sequence (such as that
read from a file or a keyboard) in order to determine its grammatical
structure. This method is used in the analysis of both natural
languages and computer languages, as in a compiler.

Bottom-up parsing is a strategy for analyzing unknown data
relationships that attempts to identify the most fundamental units
first, and then to infer higher-order structures from them. Top-down
parsers, on the other hand, hypothesize general parse tree structures
and then consider whether the known fundamental structures are
compatible with the hypothesis. See Top-down parsing and Bottom-up
parsing.

Nanotechnology
Main article: Nanotechnology

Top-down and bottom-up are two approaches for the manufacture
of products. These terms were first applied to the field of
nanotechnology by the Foresight Institute in 1989 in order to
distinguish between molecular manufacturing (to mass-produce large
atomically precise objects) and conventional manufacturing (which can
mass-produce large objects that are not atomically precise). Bottom-
up approaches seek to have smaller (usually molecular) components
built up into more complex assemblies, while top-down approaches
seek to create nanoscale devices by using larger, externally-controlled
ones to direct their assembly.

The top-down approach often uses the traditional workshop or
microfabrication methods, where externally-controlled tools are used
to cut, mill, and shape materials into the desired shape and order.
Micropatterning techniques, such as photolithography and inkjet
printing, belong to this category. Bottom-up approaches, in contrast,
use the chemical properties of single molecules to cause single-
molecule components to (a) self-organize or self-assemble into some
useful conformation, or (b) rely on positional assembly. These
approaches utilize the concepts of molecular self-assembly and/or
molecular recognition. See also Supramolecular chemistry.
Such bottom-up approaches should, broadly speaking, be able to
produce devices in parallel and much more cheaply than top-down
methods, but could potentially be overwhelmed as the size and
complexity of the desired assembly increases.

Neuroscience and psychology

An example of top-down processing: even though the second letter in
each word is ambiguous, top-down processing allows for easy
disambiguation based on the context.

These terms are also employed in neuroscience and psychology. The
study of visual attention provides an example. If your attention is
drawn to a flower in a field, it may be simply that the flower is more
visually salient than the surrounding field. The information that caused
you to attend to the flower came to you in a bottom-up fashion; your
attention was not contingent upon knowledge of the flower, as the
outside stimulus was sufficient on its own.

Contrast this situation with one in which you are looking for a flower.
You have a representation of what you are looking for. When you see
the object you are looking for, it is salient. This is an example of the
use of top-down information.

In cognitive terms, two thinking approaches are distinguished. "Top
down" (or "big chunk") is stereotypically the visionary, or the person
who sees the larger picture and overview. Such people focus on the
big picture and from that derive the details to support it. "Bottom up"
(or "small chunk") cognition is akin to focusing on the detail primarily,
rather than the landscape. The expression "seeing the wood for the
trees" references the two styles of cognition.

Management and organization

In management and organizational arenas, the terms "top down" and
"bottom up" are used to indicate how decisions are made.

A "top down" approach is one where an executive, decision maker, or


other person or body makes a decision. This approach is disseminated
under their authority to lower levels in the hierarchy, who are, to a
greater or lesser extent, bound by them. For example, a structure in
which decisions either are approved by a manager, or approved by his
authorised representatives based on the manager's prior guidelines, is
top-down management.

A "bottom up" approach is one that works from the grassroots — from
a large number of people working together, causing a decision to arise
from their joint involvement. A decision by a number of activists,
students, or victims of some incident to take action is a "bottom-up"
decision.

Positive aspects of top-down approaches include their efficiency and
their superb overview of higher levels. Also, external effects can be
internalized. On the negative side, if reforms are perceived to be
imposed ‘from above’, it can be difficult for lower levels to accept them
(e.g. Bresser Pereira, Maravall, and Przeworski 1993). Evidence
suggests this to be true regardless of the content of reforms (e.g.
Dubois 2002). A bottom-up approach allows for more experimentation
and a better feeling for what is needed at the bottom.

State organization

Both approaches can be found in the organization of states, involving
political decisions.

In bottom-up organized organizations, e.g. ministries and their
subordinate entities, decisions are prepared by experts in their fields,
who define, out of their expertise, the policy they deem necessary. If
they cannot agree, even on a compromise, they escalate the problem
to the next higher hierarchy level, where a decision would be sought.
Finally, the highest common principal might have to take the decision.
Information flows upward: the inferior owes information to the
superior. In effect, as soon as the inferiors agree, the head of the
organization merely provides his “face” for the decision which his
inferiors have agreed upon.

Among several countries, the German political system provides one of
the purest forms of a bottom-up approach. The German Federal Act on
the Public Service provides that any inferior has to consult and support
any superiors, that he or she has only to follow the “general
guidelines” of the superiors, that he or she is fully responsible for his
or her own acts in office, and that he or she has to follow a specific,
formal complaint procedure if in doubt of the legality of an order [1].
German politicians have frequently had to leave office on the
allegation that they took wrong decisions because of their resistance
to the opinions of their expert inferiors (commonly called being
“beratungsresistent”, or resistant to consultation, in German). The
historical foundation of this approach lies in the fact that, in the 19th
century, many politicians were noblemen without an appropriate
education, who increasingly became forced to rely on the consultation
of educated experts, who (in particular after the Prussian reforms of
Stein and Hardenberg) enjoyed the status of financially and personally
independent, irremovable, and neutral experts as Beamte (public
servants under public law).

A similar approach can be found in British police laws, where the
entitlements of police constables are vested in the constable in person
and not in the police as an administrative agency, with the result that
the individual constable is fully responsible for his or her own acts in
office, in particular their legality. The experience of two dictatorships
in the country and, after the end of such regimes, emerging calls for
the legal responsibility of the “aidees of the aidees” (Helfershelfer) of
such regimes also furnished calls for the principle of personal
responsibility of any expert for any decision made, leading to a
strengthening of the bottom-up approach, which requires maximum
responsibility of the superiors.

By contrast, the French administration is based on a top-down
approach, in which regular public servants have no task other than to
execute decisions made by their superiors. As those superiors also
require consultation, this consultation is provided by members of a
cabinet, which is distinct from the regular ministry staff in terms of
staffing and organization. Those who are not members of the cabinet
are not entitled to make any suggestions or to take any decisions of
political dimension.

The advantage of the bottom-up approach is the great level of
expertise provided, combined with the motivating experience for
every member of the administration of being responsible, and finally
the independent “engine” of progress that comes with that field of
personal responsibility. A disadvantage is the lack of democratic
control and transparency, leading, from a democratic viewpoint, to the
deferment of the actual power of policy-making to faceless, or even
unknown, public servants. Even the fact that certain politicians might
“provide their face” to the actual decisions of their inferiors might not
mitigate this effect; it is rather mitigated by strong parliamentary
rights of control and influence in legislative procedures (as they exist
in the example of Germany).
The advantage of the top-down principle is that political and
administrative responsibilities are clearly distinguished from each
other, and that responsibility for political failures can be clearly
identified with the relevant office holder. Disadvantages are that the
system demotivates inferiors, who know that their ideas for innovative
approaches might not be welcome simply because of their position,
and that the decision-makers cannot make use of the full range of
expertise which their inferiors will have collected.

Administrations in dictatorships traditionally work according to a strict
top-down approach. As civil servants below the level of the political
leadership are discouraged from making suggestions, such systems
tend to suffer from the lack of expertise that could be provided by
those below, which regularly leads to a breakdown of the system after
a few decades. Modern communist states, of which the People's
Republic of China is an example, therefore prefer to define a
framework of permissible, or even encouraged, criticism and self-
determination by inferiors, one that does not affect the major state
doctrine but allows professional, expertise-driven knowledge to be
used by the decision-making persons in office.

Architectural
Often, the École des Beaux-Arts school of design is said to have
primarily promoted top-down design because it taught that an
architectural design should begin with a parti, a basic plan drawing of
the overall project.

By contrast, the Bauhaus focused on bottom-up design. This method
manifested itself in the study of translating small-scale organizational
systems to a larger, more architectural scale (as with wood-panel
carving and furniture design).

Ecological
In ecology, top-down control refers to when a top predator controls
the structure or population dynamics of the ecosystem. The classic
example is of kelp forest ecosystems. In such ecosystems, sea otters
are a keystone predator. They prey on urchins, which in turn eat kelp.
When otters are removed, urchin populations grow and reduce the
kelp forest, creating urchin barrens. In other words, such ecosystems
are not controlled by the productivity of the kelp but rather by a top
predator. Bottom-up control in ecosystems refers to ecosystems in
which the nutrient supply, productivity, and type of primary producers
(plants and phytoplankton) control the ecosystem structure. An
example would be how plankton populations are controlled by the
availability of nutrients. Plankton populations tend to be higher and
more complex in areas where upwelling brings nutrients to the
surface.

There are many different examples of these concepts. It is not
uncommon for populations to be influenced by both types of control.

Entity

An entity is something that has a distinct, separate existence, though
it need not be a material existence. In particular, abstractions and
legal fictions are usually regarded as entities. In general, there is also
no presumption that an entity is animate. Entities are used in system
development models to display, for example, the communications and
internal processing involved in handling documents as compared to
processing orders.

An entity could be viewed as a set containing subsets. In philosophy,
such sets are said to be abstract objects.

Sometimes, the word entity is used in the general sense of a being,
whether or not the referent has material existence; something with no
corporeal form, such as a language, is often referred to as an entity.
The word is also often used to refer to ghosts and other spirits. Taken
further, entity sometimes refers to existence or being itself. For
example, the former U.S. diplomat George F. Kennan once said that
"the policy of the government of the United States is to seek . . . to
preserve Chinese territorial and administrative entity."

The word entitative is the adjective form of the noun entity. Something
that is entitative is "considered as pure entity; abstracted from all
circumstances", that is, regarded as entity alone, apart from attendant
circumstances.
Specialized uses
• A DBMS entity is either a thing in the modeled world or a
drawing element in an ERD.
• In SUMO, Entity is the root node and stands for the universal
class of individuals.
• In VHDL, entity is the keyword for defining a new object.
• An SGML entity is an abbreviation for some expanded piece of
SGML text.
• An open systems architecture entity is an active routine within a
layer.
• In computer games and game engines, an entity is a dynamic
object such as a non-player character or an item.
• In HTML, an entity is a code snippet (e.g. "&reg;" for the
registered trademark sign) which is interpreted by web browsers
to display special characters. See List of XML and HTML character
entity references.
• In law, a legal entity is an entity that is capable of bearing legal
rights and obligations, such as a natural person or an artificial
person (e.g. a business entity or a corporate entity).
