RDBMS Concepts | Data Model | Conceptual Model

Abstraction (computer science)

In computer science, abstraction is a mechanism and practice to reduce and factor out details so that one can focus on a few concepts at a time. The concept is by analogy with abstraction in mathematics. The mathematical technique of abstraction begins with mathematical definitions; this has the fortunate effect of finessing some of the vexing philosophical issues of abstraction. For example, in both computing and in mathematics, numbers are concepts in programming languages as founded in mathematics. Implementation details depend on the hardware and software, but this is not a restriction because the computing concept of number is still based on the mathematical concept.

Roughly speaking, abstraction can be either that of control or of data. Control abstraction is the abstraction of actions, while data abstraction is that of data structures. For example, control abstraction in structured programming is the use of subprograms and formatted control flows, while data abstraction allows handling pieces of data in meaningful ways; it is the basic motivation behind the datatype. Object-oriented programming can be seen as an attempt to abstract both data and code.

Contents

1 Rationale
2 Language features
   2.1 Programming languages
   2.2 Specification languages
3 Control abstraction
   3.1 Structured programming
4 Data abstraction
5 Abstraction in object oriented programming
   5.1 Object-oriented design
6 Considerations
7 Levels of abstraction
   7.1 Database systems
   7.2 Layered architecture
8 See also
9 Further reading
Rationale
Computing is mostly independent of the concrete world: the hardware implements a model of computation that is interchangeable with others. The software is structured in architectures to enable humans to create enormous systems by concentrating on a few issues at a time. These architectures are made of specific choices of abstractions. Greenspun's Tenth Rule is an aphorism on how such an architecture is both inevitable and complex.

A central form of abstraction in computing is the language abstraction: new artificial languages are developed to express specific aspects of a system. Modelling languages help in planning. Computer languages can be processed with a computer. An example of this abstraction process is the generational development of programming languages from the machine language to the assembly language and the high-level language. Each stage can be used as a stepping stone for the next stage. The language abstraction continues for example in scripting languages and domain-specific programming languages. Within a programming language, some features let the programmer create new abstractions. These include the subroutine, the module, and the software component. Some other abstractions such as software design patterns and architectural styles are not visible to a programming language but only in the design of a system. Some abstractions try to limit the breadth of concepts a programmer needs by completely hiding the abstractions they in turn are built on. Joel Spolsky has criticised these efforts by claiming that all abstractions are leaky — that they are never able to completely hide the details below. Some abstractions are designed to interoperate with others, for example a programming language may contain a foreign function interface for making calls to the lower-level language.

Language features
Programming languages
Different programming languages provide different types of abstraction, depending on the applications for which the language is intended. For example:

- In object-oriented programming languages such as C++ or Java, the concept of abstraction is expressed with a declarative statement, using the keyword virtual or abstract, respectively. After such a declaration, it is the responsibility of the programmer to implement a class to instantiate the object of the declaration.
- In functional programming languages, it is common to find abstractions related to functions, such as lambda abstractions (making a term into a function of some variable), higher-order functions (whose parameters are functions), and bracket abstraction (making a term into a function of a variable).
- The Linda language abstracts the concepts of server and shared data-space to facilitate distributed programming.
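To make the object-oriented case concrete, here is a minimal Java sketch. The names Shape, Circle and Square are invented for this illustration; the point is only that abstract declares an operation whose implementation each concrete subclass must supply:

```java
// Hypothetical example: Shape, Circle, Square are invented names.
abstract class Shape {
    // Declared but not implemented: each subclass supplies its own definition.
    abstract double area();
}

class Circle extends Shape {
    double radius;
    Circle(double radius) { this.radius = radius; }
    double area() { return Math.PI * radius * radius; }
}

class Square extends Shape {
    double side;
    Square(double side) { this.side = side; }
    double area() { return side * side; }
}

public class AbstractDemo {
    public static void main(String[] args) {
        // Client code works with the abstraction, not the concrete types.
        Shape[] shapes = { new Circle(1.0), new Square(2.0) };
        double total = 0.0;
        for (Shape s : shapes) {
            total += s.area();
        }
        System.out.println(total); // pi + 4.0
    }
}
```

Client code written against Shape remains unchanged when new concrete shapes are added.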

Specification languages
Specification languages generally rely on abstractions of one kind or another, since specifications are typically defined earlier in a project, and at a more abstract level, than an eventual implementation. The UML specification language, for example, allows the definition of abstract classes, which are simply left abstract during the architecture and specification phase of the project.

Control abstraction

Control abstraction is one of the main purposes of using programming languages. Computers understand operations at a very low level, such as moving some bits from one location of memory to another and producing the sum of two sequences of bits. Programming languages allow this to be done at a higher level. For example, consider the high-level expression/program statement:

a := (1 + 2) * 5

To a human, this is a fairly simple and obvious calculation ("one plus two is three, times five is fifteen"). However, the low-level steps necessary to carry out this evaluation, return the value "15", and then assign that value to the variable "a" are actually quite subtle and complex. The values need to be converted to binary representation (often a much more complicated task than one would think), and the calculations decomposed (by the compiler or interpreter) into assembly instructions (again, much less intuitive to the programmer: operations such as shifting a binary register left, or adding the binary complement of the contents of one register to another, are simply not how humans think about the abstract arithmetical operations of addition or multiplication). Finally, assigning the resulting value of "15" to the variable labeled "a", so that "a" can be used later, involves additional 'behind-the-scenes' steps of looking up the variable's label and the resultant location in physical or virtual memory, storing the binary representation of "15" to that memory location, and so on.

Without control abstraction, a programmer would need to specify all the register/binary-level steps each time she simply wanted to add or multiply a couple of numbers and assign the result to a variable. This duplication of effort has two serious negative consequences: (a) it forces the programmer to constantly repeat fairly common tasks every time a similar operation is needed; and (b) it forces the programmer to program for the particular hardware and instruction set.
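The contrast can be sketched in Java. This is a hypothetical illustration only: a real compiler emits machine-specific instructions, not Java locals, but the decomposition into single-step operations has the same flavour:

```java
public class ControlAbstractionDemo {
    // High-level: one statement expresses the whole computation.
    static int highLevel() {
        int a = (1 + 2) * 5;
        return a;
    }

    // Low-level flavour: the same computation decomposed into single
    // operations on "registers" (here simulated by plain local variables).
    static int lowLevel() {
        int r1 = 1;   // load constant 1
        int r2 = 2;   // load constant 2
        r1 = r1 + r2; // add: r1 <- r1 + r2
        int r3 = 5;   // load constant 5
        r1 = r1 * r3; // multiply: r1 <- r1 * r3
        return r1;    // store the result
    }

    public static void main(String[] args) {
        System.out.println(highLevel()); // 15
        System.out.println(lowLevel());  // 15
    }
}
```

Both methods compute the same value; the abstraction lies in being able to write the first form and let the language toolchain produce the second.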

Structured programming
Structured programming involves the splitting of complex program tasks into smaller pieces with clear flow control and interfaces between components, reducing complexity and the potential for side-effects. In a simple program, this may mean trying to ensure that loops have single or obvious exit points and, where it is clearest to do so, to have single exit points from functions and procedures. In a larger system, it may involve breaking down complex tasks into many different modules. Consider a system handling payroll on ships and at shore offices:

- The uppermost level may be a menu of typical end-user operations.
- Within that could be standalone executables or libraries for tasks such as signing on and off employees or printing checks.
- Within each of those standalone components there could be many different source files, each containing the program code to handle a part of the problem, with only selected interfaces available to other parts of the program. A sign-on program could have source files for each data entry screen and the database interface (which may itself be a standalone third-party library or a statically linked set of library routines).

Either the database or the payroll application also has to initiate the process of exchanging data between ship and shore, and that data transfer task will often contain many other components. These layers produce the effect of isolating the implementation details of one component and its assorted internal methods from the others. This concept was embraced and extended in object-oriented programming.

Data abstraction
Data abstraction is the enforcement of a clear separation between the abstract properties of a data type and the concrete details of its implementation. The abstract properties are those that are visible to client code that makes use of the data type (the interface to the data type), while the concrete implementation is kept entirely private, and indeed can change, for example to incorporate efficiency improvements over time. The idea is that such changes are not supposed to have any impact on client code, since they involve no difference in the abstract behaviour. For example, one could define an abstract data type called lookup table, where keys are uniquely associated with values, and values may be retrieved by specifying their corresponding keys. Such a lookup table may be implemented in various ways: as a hash table, a binary search tree, or even a simple linear list. As far as client code is concerned, the abstract properties of the type are the same in each case.

Of course, this all relies on getting the details of the interface right in the first place, since any changes there can have major impacts on client code. Another way to look at this is that the interface forms a contract on agreed behaviour between the data type and client code; anything not spelled out in the contract is subject to change without notice. Languages that implement data abstraction include Ada and Modula-2. Object-oriented languages are commonly claimed to offer data abstraction; however, their inheritance concept tends to put information in the interface that more properly belongs in the implementation, so changes to such information end up impacting client code, leading directly to the fragile base class problem.

Abstraction in object oriented programming
In object-oriented programming theory, abstraction is the facility to define objects that represent abstract "actors" that can perform work, report on and change their state, and "communicate" with other objects in the system. The term encapsulation refers to the hiding of state details, but extending the concept of data type from earlier programming languages to associate behavior most strongly with the data, and standardizing the way that different data types interact, is the beginning of abstraction. When abstraction proceeds into the operations defined, enabling objects of different types to be substituted, it is called polymorphism. When it proceeds in the opposite direction, inside the types or classes, structuring them to simplify a complex set of relationships, it is called delegation or inheritance.
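The lookup table discussed above can be sketched in Java. The interface and class names below are invented for this illustration; the point is that client code written against the interface cannot tell which implementation it is using:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Abstract properties: keys are associated with values,
// and values may be retrieved by specifying their keys.
interface LookupTable<K, V> {
    void put(K key, V value);
    V get(K key);
}

// One concrete implementation: backed by a hash table.
class HashLookupTable<K, V> implements LookupTable<K, V> {
    private final Map<K, V> map = new HashMap<>();
    public void put(K key, V value) { map.put(key, value); }
    public V get(K key) { return map.get(key); }
}

// Another: a simple linear list. Slower, but abstractly identical.
class ListLookupTable<K, V> implements LookupTable<K, V> {
    private final List<K> keys = new ArrayList<>();
    private final List<V> values = new ArrayList<>();
    public void put(K key, V value) {
        int i = keys.indexOf(key);
        if (i >= 0) { values.set(i, value); }
        else { keys.add(key); values.add(value); }
    }
    public V get(K key) {
        int i = keys.indexOf(key);
        return i >= 0 ? values.get(i) : null;
    }
}

public class LookupDemo {
    // Client code depends only on the interface.
    static String describe(LookupTable<String, String> t) {
        t.put("pig", "omnivore");
        t.put("cow", "herbivore");
        return t.get("cow");
    }

    public static void main(String[] args) {
        System.out.println(describe(new HashLookupTable<>())); // herbivore
        System.out.println(describe(new ListLookupTable<>())); // herbivore
    }
}
```

Swapping one implementation for the other requires no change to the client method, which is exactly the separation the section describes.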

Various object-oriented programming languages offer similar facilities for abstraction, all to support a general strategy of polymorphism in object-oriented programming, which includes the substitution of one type for another in the same or similar role. Although it is not as generally supported, a configuration or image or package may predetermine a great many of these bindings at compile-time, link-time, or load-time. This would leave only a minimum of such bindings to change at run-time.

In CLOS or Self, for example, there is less of a class-instance distinction, more use of delegation for polymorphism, and individual objects and functions are abstracted more flexibly to better fit with a shared functional heritage from Lisp. Another extreme is C++, which relies heavily on templates and overloading and other static bindings at compile-time, which in turn has certain flexibility problems. Although these are alternate strategies for achieving the same abstraction, they do not fundamentally alter the need to support abstract nouns in code: all programming relies on an ability to abstract verbs as functions, nouns as data structures, and either as processes.

For example, here is a sample Java fragment to represent some common farm "animals" to a level of abstraction suitable to model simple aspects of their hunger and feeding. It defines an Animal class to represent both the state of the animal and its functions:

class Animal extends LivingThing {
    Location loc;
    double energyReserves;

    boolean isHungry() {
        if (energyReserves < 2.5) { return true; }
        else { return false; }
    }
    void eat(Food f) {
        // Consume food
        energyReserves += f.getCalories();
    }
    void moveTo(Location l) {
        // Move to new location
        loc = l;
    }
}

With the above definition, one could create objects of type Animal and call their methods like this:

thePig = new Animal();
theCow = new Animal();
if (thePig.isHungry()) { thePig.eat(tableScraps); }
if (theCow.isHungry()) { theCow.eat(grass); }
theCow.moveTo(theBarn);
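The fragment above deliberately leaves LivingThing, Location and Food undefined. For readers who want to compile and run it, here is one self-contained completion; the stub classes and calorie values are invented for illustration:

```java
class LivingThing { }   // stub superclass, present only for the hierarchy
class Location { }      // stub: a place an animal can move to

class Food {
    private final double calories;
    Food(double calories) { this.calories = calories; }
    double getCalories() { return calories; }
}

class Animal extends LivingThing {
    Location loc;
    double energyReserves;

    boolean isHungry() {
        return energyReserves < 2.5;
    }
    void eat(Food f) {
        // Consume food
        energyReserves += f.getCalories();
    }
    void moveTo(Location l) {
        // Move to new location
        loc = l;
    }
}

public class FarmDemo {
    public static void main(String[] args) {
        Food tableScraps = new Food(3.0); // invented calorie values
        Food grass = new Food(1.0);
        Location theBarn = new Location();

        Animal thePig = new Animal();
        Animal theCow = new Animal();
        if (thePig.isHungry()) { thePig.eat(tableScraps); }
        if (theCow.isHungry()) { theCow.eat(grass); }
        theCow.moveTo(theBarn);

        System.out.println(thePig.energyReserves); // 3.0
        System.out.println(theCow.energyReserves); // 1.0
    }
}
```

Both animals start with zero energy reserves, so both are hungry and eat once.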

In the above example, the class Animal is an abstraction used in place of an actual animal, and LivingThing is a further abstraction (in this case a generalisation) of Animal. These facilities tend to vary drastically between languages, but in general each can achieve anything that is possible with any of the others. A great many operation overloads, data type by data type, can have the same effect at compile-time as any degree of inheritance or other means to achieve polymorphism. The class notation is simply a coder's convenience.

Object-oriented design
Decisions regarding what to abstract and what to keep under the control of the coder are the major concern of object-oriented design and domain analysis; actually determining the relevant relationships in the real world is the concern of object-oriented analysis or legacy analysis. In general, to determine appropriate abstraction, one must make many small decisions about scope (domain analysis), determine what other systems one must cooperate with (legacy analysis), then perform a detailed object-oriented analysis which is expressed within project time and budget constraints as an object-oriented design. In our simple example, the domain is the barnyard; the live pigs and cows and their eating habits are the legacy constraints; the detailed analysis is that coders must have the flexibility to feed the animals what is available, and thus there is no reason to code the type of food into the class itself; and the design is a single simple Animal class of which pigs and cows are instances with the same functions. A decision to differentiate DairyAnimal would change the detailed analysis, but the domain and legacy analysis would be unchanged; thus it is entirely under the control of the programmer, and we refer to abstraction in object-oriented programming as distinct from abstraction in domain or legacy analysis.

If a more differentiated hierarchy of animals is required to differentiate, say, those who provide milk from those who provide nothing except meat at the end of their lives, an intermediary level of abstraction is needed: probably DairyAnimal (cows, goats) who would eat foods suitable to giving good milk, and Animal (pigs, steers) who would eat foods to give the best meat quality. Such an abstraction could remove the need for the application coder to specify the type of food, so s/he could concentrate instead on the feeding schedule. The two classes could be related using inheritance or stand alone, and varying degrees of polymorphism between the two types could be defined.

Considerations
When discussing formal semantics of programming languages, formal methods or abstract interpretation, abstraction refers to the act of considering a less accurate, but safe, definition of the observed program behaviors. For instance, one may observe only the final result of program executions instead of considering all the intermediate steps of executions. Abstraction is defined with respect to a concrete (more precise) model of execution. Abstraction may be exact or faithful with respect to a property if it is possible to answer a question about the property equally well on the concrete or abstract model. For instance, if we wish to know what the result of the evaluation of a mathematical expression involving only integers +, -, × is worth modulo n, it is sufficient to perform all operations modulo n (a familiar form of this abstraction is casting out nines).
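As a concrete instance of such an exact abstraction, evaluating an integer expression modulo n (casting out nines when n = 9) agrees with evaluating it exactly and then reducing modulo n. A small Java sketch, with invented method names:

```java
public class ModuloAbstractionDemo {
    // Concrete evaluation: exact integer arithmetic on a sample expression.
    static int concrete(int x, int y, int z) {
        return (x + y) * z - y;
    }

    // Abstract evaluation: the same expression with every operation
    // performed modulo n. Math.floorMod keeps results in [0, n).
    static int abstractModN(int x, int y, int z, int n) {
        int sum = Math.floorMod(x + y, n);
        int prod = Math.floorMod(sum * z, n);
        return Math.floorMod(prod - y, n);
    }

    public static void main(String[] args) {
        int n = 9; // casting out nines
        int exact = concrete(123, 456, 7);
        // Exactness of the abstraction: the abstract answer agrees
        // with the exact answer reduced modulo n.
        System.out.println(Math.floorMod(exact, n) == abstractModN(123, 456, 7, n)); // true
    }
}
```

The abstract computation never needs the (possibly huge) exact value, yet answers the modulo-n question correctly.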

Abstractions, however, are not necessarily exact, but one requires that they should be sound. That is, it should be possible to get sound answers from them, even though the abstraction may simply yield a result of undecidability. For instance, we may abstract the students in a class by their minimal and maximal ages; if one asks whether a certain person belongs to that class, one may simply compare that person's age with the minimal and maximal ages; if his age lies outside the range, one may safely answer that the person does not belong to the class; if it does not, one may only answer "I don't know".

Abstractions are useful when dealing with computer programs, because non-trivial properties of computer programs are essentially undecidable (see Rice's theorem). As a consequence, automatic methods for deriving information on the behavior of computer programs either have to drop termination (on some occasions, they may fail, crash or never yield a result), soundness (they may provide false information), or precision (they may answer "I don't know" to some questions). Abstraction is the core concept of abstract interpretation. Model checking is generally performed on abstract versions of the studied systems.

Levels of abstraction
A common concept in computer science is levels (or, less commonly, layers) of abstraction, wherein each level represents a different model of the same information and processes, but uses a system of expression involving a unique set of objects and compositions that are applicable only to a particular domain. Each relatively abstract, "higher" level builds on a relatively concrete, "lower" level, which tends to provide an increasingly "granular" representation. For example, gates build on electronic circuits, binary on gates, machine language on binary, programming language on machine language, and applications and operating systems on programming languages. Each level is embodied, but not determined, by the level beneath it, making it a language of description that is somewhat self-contained.

Database systems
Since many users of database systems are not deeply familiar with computer data structures, database developers often hide complexity through the following levels:

[Figure: data abstraction levels of a database system]

Physical level: The lowest level of abstraction describes how the data is actually stored. The physical level describes complex low-level data structures in detail.

Logical level: The next higher level of abstraction describes what data are stored in the database, and what relationships exist among those data. The logical level thus describes an entire database in terms of a small number of relatively simple structures. Although implementation of the simple structures at the logical level may involve complex physical level structures, the user of the logical level does not need to be aware of this complexity. Database administrators, who must decide what information to keep in a database, use the logical level of abstraction.

View level: The highest level of abstraction describes only part of the entire database. Even though the logical level uses simpler structures, complexity remains because of the variety of information stored in a large database. Many users of a database system do not need all this information; instead, they need to access only a part of the database. The view level of abstraction exists to simplify their interaction with the system. The system may provide many views for the same database.

Layered architecture
The ability to provide a design of different levels of abstraction can

- simplify the design considerably, and
- enable different role players to effectively work at various levels of abstraction.

Some design processes specifically generate designs that contain various levels of abstraction. This can be used in both system and business process design.

See also

- Inheritance semantics
- Algorithm, for an abstract description of a computational procedure
- Abstract data type, for an abstract description of a set of data
- Lambda abstraction, for making a term into a function of some variable
- Higher-order function, for abstraction of functions as parameters
- Bracket abstraction, for making a term into a function of a variable
- Data modeling, for structuring data independent of the processes that use it
- Refinement, for the opposite of abstraction in computing
- Encapsulation, for the categorical dual (other side) of abstraction
- Substitution, for the categorical left adjoint (inverse) of abstraction
- Abstraction inversion, for an anti-pattern of one danger in abstraction
- Greenspun's Tenth Rule, for an aphorism about abstracting too much yourself

Data modeling

In computer science, data modeling is the process of creating a data model by applying a data model theory to create a data model instance. A data model theory is a formal data model description. See database model for a list of current data model theories. When data modelling, we are structuring and organizing data. These data structures are then typically implemented in a database management system. In addition to defining and organizing the data, data modeling will impose (implicitly or explicitly) constraints or limitations on the data placed within the structure.

Managing large quantities of structured and unstructured data is a primary function of information systems. Data models describe structured data for storage in data management systems such as relational databases. They typically do not describe unstructured data, such as word processing documents, email messages, pictures, digital audio, and video.

Contents

1 Data model
   1.1 Data structure
2 Generic Data Modeling
   2.1 Data organization
3 Techniques
4 See also
5 External links

Data model
A data model instance may be described in two ways:

- a logical description of the data model instance, concentrating on the generic features of the model, independent of any particular implementation;
- a physical description of the data model instance, concentrating on the implementation features of the particular database hosting the model.

Early phases of many software development projects emphasize the design of a conceptual data model. Such a design can be detailed into a logical data model. In later stages, this model may be translated into a physical data model.

Data structure
A data model describes the structure of the data within a given domain and, by implication, the underlying structure of that domain itself. This means that a data model in fact specifies a dedicated 'grammar' for a dedicated artificial language for that domain. A data model represents classes of entities (kinds of things) about which a company wishes to hold information, the attributes of that information, and relationships among those entities and (often implicit) relationships among those attributes. The model describes the organization of the data to some extent irrespective of how data might be represented in a computer system.

A proper conceptual data model describes the semantics of a subject area. It is a collection of assertions about the nature of the information that is used by one or more organizations. Proper entity classes are named with natural language words instead of technical jargon; likewise, properly named relationships form concrete assertions about the subject area. For example, a relationship called "is composed of" that is defined to operate on the entity classes ORDER and LINE ITEM forms the following concrete assertion definition: Each ORDER "is composed of" one or more LINE ITEMS. This would mean that the relationship just cited would read in one direction, "Each ORDER may be composed of one or more LINE ITEMS", and in the other, "Each LINE ITEM must be part of one and only one ORDER." There are several versions of this. A more rigorous approach is to force all relationship names to be prepositions, gerunds, or participles, with verbs being simply "must be" or "may be". This way, both cardinality and optionality can be handled semantically. Note that this illustrates that often generic terms, such as 'is composed of', are defined to be limited in their use for a relationship between specific kinds of things, such as an order and an order line. This constraint is eliminated in the generic data modeling methodologies.

Generic Data Modeling
Different modelers may well produce different models of the same domain. This can lead to difficulty in bringing the models of different people together. Invariably, however, this difference is attributable to different levels of abstraction in the models. If the modelers agree on certain elements which are to be rendered more concretely, then the differences become less significant.

The entities represented by a data model can be the tangible entities, but models that include such concrete entity classes tend to change over time. Robust data models often identify abstractions of such entities. For example, a data model might include an entity class called "Person", representing all the people who interact with an organization. Such an abstract entity class is typically more appropriate than ones called "Vendor" or "Employee", which identify specific roles played by those people. A model which explicitly includes versions of these entity classes will be both reasonably robust and reasonably easy to understand; more concrete and specific data models will risk having to change as the environment changes.

There are generic patterns that can be used to advantage for modeling business. These include the concepts PARTY (with included PERSON and ORGANIZATION), PRODUCT TYPE, PRODUCT INSTANCE, ACTIVITY TYPE, ACTIVITY INSTANCE, CONTRACT, GEOGRAPHIC AREA, and SITE. More abstract models are suitable for general purpose tools, and consist of variations on THING and THING TYPE, with all actual data being instances of these. Such abstract models are significantly more difficult to manage, however, since they are not very expressive of real world things.

One approach to generic data modeling has the following characteristics:

- A generic data model shall consist of generic entity types, such as 'individual thing', 'class', 'relationship', and possibly a number of their subtypes.
- Every individual thing is an instance of a generic entity called 'individual thing' or one of its subtypes.
- Every individual thing is explicitly classified by a kind of thing ('class') using an explicit classification relationship. The classes used for that classification are separately defined as standard instances of the entity 'class' or one of its subtypes, such as 'class of relationship'. These standard classes are usually called 'reference data'. This means that domain specific knowledge is captured in those standard instances and not as entity types. For example, concepts such as car, wheel, building, ship, and also temperature, length, etc. are standard instances. But also standard types of relationship, such as 'is composed of' and 'is involved in', can be defined as standard instances.

This way of modeling allows the addition of standard classes and standard relation types as data (instances), which makes the data model flexible and prevents data model changes when the scope of the application changes.

A generic data model obeys the following rules:

1. Candidate attributes are treated as representing relationships to other entity types.
2. Entity types are chosen, and are named after, the underlying nature of a thing, not the role it plays in a particular context.
3. Entities have a local identifier within a database or exchange file. These should be artificial and managed to be unique. Relationships are not used as part of the local identifier.
4. Activities, relationships and event-effects are represented by entity types (not attributes).
5. Entity types are part of a sub-type/super-type hierarchy of entity types, in order to define a universal context for the model. As types of relationships are also entity types, they are also arranged in a sub-type/super-type hierarchy of types of relationship.
6. Types of relationships are defined on a high (generic) level, being the highest level where the type of relationship is still valid. For example, a composition relationship (indicated by the phrase 'is composed of') is defined as a relationship between an 'individual thing' and another 'individual thing' (and not just between e.g. an order and an order line). This generic level means that the type of relation may in principle be applied between any individual thing and any other individual thing. Additional constraints are defined in the 'reference data', being standard instances of relationships between kinds of things.

Examples of generic data models are ISO 10303-221, ISO 15926 and Gellish.

Data organization
Another kind of data model describes how to organize data using a database management system or other data management technology. It describes, for example, relational tables and columns or object-oriented classes and attributes. Such a data model is sometimes referred to as the physical data model, but in the original ANSI three schema architecture, it is called "logical". In that architecture, the physical model describes the storage media (cylinders, tracks, and tablespaces). Ideally, this model is derived from the more conceptual data model described above. It may differ, however, to account for constraints like processing capacity and usage patterns.
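The classification-as-data idea described in this section (individual things, kinds of things, and explicit classification relationships, all stored as instances rather than as schema) can be sketched in Java. All names here are invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Generic entity: every individual thing is an instance of this one type.
class IndividualThing {
    final String localId;   // artificial, managed local identifier
    IndividualThing(String localId) { this.localId = localId; }
}

// Kinds of thing ("classes") are themselves data: reference-data instances.
class Kind {
    final String name;
    Kind(String name) { this.name = name; }
}

// Explicit classification relationship, stored as data, not as a Java type.
class Classification {
    final IndividualThing thing;
    final Kind kind;
    Classification(IndividualThing thing, Kind kind) {
        this.thing = thing;
        this.kind = kind;
    }
}

public class GenericModelDemo {
    public static void main(String[] args) {
        Kind car = new Kind("car");       // reference data
        Kind wheel = new Kind("wheel");   // reference data

        List<Classification> facts = new ArrayList<>();
        facts.add(new Classification(new IndividualThing("id-001"), car));
        facts.add(new Classification(new IndividualThing("id-002"), wheel));

        // Extending the model's scope is a data change, not a schema change:
        Kind ship = new Kind("ship");
        facts.add(new Classification(new IndividualThing("id-003"), ship));

        System.out.println(facts.size()); // 3
    }
}
```

Notice that adding the 'ship' concept required no new Java class, which is the flexibility the generic approach aims for.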

While data analysis is a common term for data modeling, the activity actually has more in common with the ideas and methods of synthesis (inferring general concepts from particular instances) than it does with analysis (identifying component concepts from more general ones). {Presumably we call ourselves systems analysts because no one can say systems synthesists.} Data modeling strives to bring the data structures of interest together into a cohesive, inseparable whole by eliminating unnecessary data redundancies and by relating data structures with relationships.

Techniques
Several techniques have been developed for the design of data models. While these methodologies guide data modelers in their work, two different people using the same methodology will often come up with very different results. Most notable are:

- Entity-relationship model
- IDEF
- Object Role Modeling (ORM) or Nijssen's Information Analysis Method (NIAM)
- Business rules or business rules approach
- RM/T
- Bachman diagrams
- Object-relational mapping
- Barker's Notation
- EBNF Grammars

A different approach is through the use of adaptive systems such as artificial neural networks that can autonomously create implicit models of data.

See also

- Abstraction (computer science)

External links

- Article Database Modelling in UML from Methods & Tools
- Data Modelling Dictionary
- Data modeling articles
- Notes on System Development, Methodologies and Modeling by Tony Drewry

Entity-relationship model
Databases are used to store structured data. The structure of this data, together with other constraints, can be designed using a variety of techniques, one of which is called entity-relationship modeling or ERM. The end-product of the ERM process is an entity-relationship diagram or ERD, a type of conceptual data model or semantic data model. Data modeling requires a graphical notation for representing such data models.

The first stage of information system design uses these models to describe information needs or the type of information that is to be stored in a database during the requirements analysis. The data modeling technique can be used to describe any ontology (i.e. an overview and classifications of used terms and their relationships) for a certain universe of discourse (i.e. area of interest). In the case of the design of an information system that is based on a database, the conceptual data model is, at a later stage (usually called logical design), mapped to a logical data model, such as the relational model; this in turn is mapped to a physical model during physical design. Note that sometimes, both of these phases are referred to as "physical design".

There are a number of conventions for entity-relationship diagrams (ERDs). The classical notation is described in the remainder of this article, and mainly relates to conceptual modelling. There are a range of notations more typically employed in logical and physical database design, including information engineering, IDEF1x (ICAM DEFinition Language) and dimensional modelling.

Contents

1 Common symbols
2 Less common symbols
3 Alternative diagramming conventions
   3.1 Crow's Feet
4 Classification
5 See also
6 ER diagramming tools
7 References
8 External links

Common symbols
[Diagrams: two related entities; an entity with an attribute; a relationship with an attribute; primary key]

[Diagram: A sample ER diagram]

An entity represents a discrete object. Entities can be thought of as nouns. Examples: a computer, an employee, a song, a mathematical theorem.

A relationship captures how two or more entities are related to one another. Relationships can be thought of as verbs. Examples: an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship between an artist and a song, a proved relationship between a mathematician and a theorem.

Entities and relationships can both have attributes. Examples: an employee entity might have a social security number attribute (in the US); the proved relationship may have a date attribute.

Entities are drawn as rectangles, relationships as diamonds. Attributes are drawn as ovals connected to their owning entity sets by a line.

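The mapping from these ER concepts to a relational schema can be sketched concretely. Below is a minimal, hypothetical illustration (the entity, attribute and table names are invented, not taken from the article) using Python's built-in sqlite3 module: each entity set becomes a table, and the "performs" relationship set becomes a table of key pairs.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Entity sets become tables; the relationship set becomes a table
# holding pairs of primary keys from the two entity sets.
cur.executescript("""
CREATE TABLE artist (artist_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE song   (song_id   INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE performs (                 -- relationship: Artist performs Song
    artist_id INTEGER REFERENCES artist(artist_id),
    song_id   INTEGER REFERENCES song(song_id),
    PRIMARY KEY (artist_id, song_id)
);
""")

cur.execute("INSERT INTO artist VALUES (1, 'Miles Davis')")
cur.execute("INSERT INTO song VALUES (1, 'So What')")
cur.execute("INSERT INTO performs VALUES (1, 1)")

# Query the relationship set: which artists perform which songs?
cur.execute("""SELECT a.name, s.title
               FROM performs p
               JOIN artist a ON a.artist_id = p.artist_id
               JOIN song s   ON s.song_id   = p.song_id""")
print(cur.fetchall())   # [('Miles Davis', 'So What')]
```

The relationship's own attributes (such as the date on a proved relationship) would simply become extra columns on the relationship table.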
Every entity (unless it is a weak entity) must have a minimal set of uniquely identifying attributes, which is called the entity's primary key.

Entity-relationship diagrams don't show single entities or single instances of relations. Rather, they show entity sets and relationship sets (displayed as rectangles and diamonds respectively). Example: a particular song is an entity; the collection of all songs in a database is an entity set. The proved relationship between Andrew Wiles and Fermat's last theorem is a single relationship; the set of all such mathematician-theorem relationships in a database is a relationship set.

Lines are drawn between entity sets and the relationship sets they are involved in. If all entities in an entity set must participate in the relationship set, a thick or double line is drawn. This is called a participation constraint. If each entity of the entity set can participate in at most one relationship in the relationship set, an arrow is drawn from the entity set to the relationship set. This is called a key constraint. To indicate that each entity in the entity set is involved in exactly one relationship, a thick arrow is drawn.

Less common symbols

A weak entity is an entity that can't be uniquely identified by its own attributes alone, and therefore must use as its primary key both its own attributes and the primary key of an entity it is related to. A weak entity set is indicated by a bold rectangle (the entity) connected by a bold arrow to a bold diamond (the relationship). Double lines can be used instead of bold ones.

Sometimes two entities are more specific subtypes of a more general type of entity. For example, programmers and marketers might both be types of employees at a software company. To indicate this, a triangle with "ISA" on the inside is drawn. The superclass is connected to the point on top and the two (or more) subclasses are connected to the base.

Attributes in an ER model may be further described as multi-valued, composite, or derived. A multi-valued attribute, illustrated with a double-line ellipse, may have more than one value for at least one instance of its entity. For example, a piece of software (entity=application) may have the multi-valued attribute "platform" because at least one instance of that entity runs on more than one operating system. A composite attribute may itself contain two or more attributes, and is indicated as having at least two contributing attributes of its own. For example, addresses usually are composite attributes, composed of attributes such as street address, city, and so forth. Derived attributes are attributes whose value is entirely dependent on another attribute, and are indicated by dashed ellipses. For example, if we have an employee database with an employee entity along with an age attribute, the age attribute would be derived from a birth date attribute.

An associative entity is used to solve the problem of two entities with a many-to-many relationship [1].

Unary relationships: a unary relationship is a relationship between the rows of a single table.

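The weak entity's composite primary key can be seen directly in SQL. The sketch below (the order / order-line schema is hypothetical, chosen because order lines are a classic weak entity) uses Python's sqlite3 module: an order line is identified by its owning order's key plus its own partial key.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# "orders" is a regular entity; "order_line" is a weak entity: a line
# number alone is ambiguous, so its primary key combines the owning
# order's key with its own partial key (line_no).
cur.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT);
CREATE TABLE order_line (
    order_id INTEGER REFERENCES orders(order_id),
    line_no  INTEGER,
    item     TEXT,
    PRIMARY KEY (order_id, line_no)   -- owner's key + partial key
);
""")

cur.execute("INSERT INTO orders VALUES (1, 'Acme')")
cur.execute("INSERT INTO order_line VALUES (1, 1, 'pants')")
cur.execute("INSERT INTO order_line VALUES (1, 2, 'shirt')")

# line_no 1 can repeat under a different order without conflict:
cur.execute("INSERT INTO orders VALUES (2, 'Globex')")
cur.execute("INSERT INTO order_line VALUES (2, 1, 'hat')")

cur.execute("SELECT count(*) FROM order_line")
print(cur.fetchone()[0])   # 3
```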
A relation and all its participating entity sets can be treated as a single entity set, for the purpose of taking part in another relation, through aggregation, indicated by drawing a dotted rectangle around all aggregated entities and relationships.

Alternative diagramming conventions

Crow's Feet

[Diagram: Two related entities shown using Crow's Feet notation]

The "Crow's Feet" notation is named for the symbol used to denote the many, or child, side of a relationship, which resembles the forward digits of a bird's claw. You can see this claw shape in the diagram to the right, representing the same relationship depicted in the Common symbols section above. In the diagram, the following facts are detailed:

- An Artist can perform many Songs, identified by the crow's foot.
- An Artist must perform at least one Song, shown by the perpendicular line.
- A Song may or may not be performed by any Artist, as indicated by the open circle.

This notation is gaining acceptance through common usage in Oracle texts, and in tools such as Visio and PowerDesigner, with the following benefits:

- Clarity in identifying the many, or child, side of the relationship, using the crow's foot.
- Concise notation for identifying mandatory relationships, using a perpendicular bar, or optional relationships, using an open circle.

Classification

Entity relationship models can be classified as BERMs (Binary Entity Relationship Models) and GERMs (General Entity Relationship Models) according to whether only binary relationships are allowed. A binary relationship is a relationship between two entities; in a GERM, relationships between three or more entities are also allowed.

See also

- Data model
- Data structure diagram
- Object Role Modeling (ORM)
- Unified Modeling Language (UML)

ER diagramming tools

References

- Chen, Peter P. (1976). "The Entity-Relationship Model - Toward a Unified View of Data". ACM Transactions on Database Systems 1 (1): 9-36. This paper is one of the most cited papers in the computer field, and was selected as one of the most influential papers in computer science in a recent survey of over 1,000 computer science professors. The citation is listed, for example, in DBLP: http://dblp.uni-trier.de/ [2]

External links

- Peter Chen home page at Louisiana State University: http://bit.csc.lsu.edu/~chen/chen.html
- Origins of ER model pioneering
- more deepened analysis of Chinese language
- The Entity-Relationship Model--Toward a Unified View of Data
- Case study: E-R diagram for Acme Fashion Supplies by Mark H. Ridley
- IDEF1X
- Notes: Logical Data Structures (LDSs) - Getting started by Tony Drewry
- Introduction to Data Modeling

ER diagramming tools:

- AllFusion ERwin Data Modeler - supporting conceptual, logical and physical models with many of the leading RDBMS brands
- ER/Studio - robust, easy-to-use ER modeling tool from Embarcadero
- SILVERRUN ModelSphere
- PowerDesigner - modeling suite from Sybase which includes Data Architect for constructing or reverse engineering conceptual, logical and physical data models, with interfaces for multiple target systems; some versions can auto-generate an ERD from a database
- SmartDraw - point and click drawing method combined with many templates creates professional diagrams
- ConceptDraw - cross-platform software for creating ER diagrams
- DB Visual ARCHITECT - supports UML Class Diagrams and ERDs; can generate an HTML report
- Dia - a free software program to draw ER diagrams
- Kivio - a free software flowcharting program that supports ER diagrams
- Microsoft Visio - diagramming software
- Ferret (software) - a free software ER drawing tool (http://www.gnu.org/software/ferret/project/what.html)

Data Definition Language

A Data Definition Language (DDL) is a computer language for defining data. XML Schema is an example of a pure DDL (although only relevant in the context of XML). A subset of SQL's instructions forms another DDL. These SQL statements define the structure of a database, including rows, columns, tables, indexes, and database specifics such as file locations. DDL SQL statements are more part of the DBMS, and have large differences between the SQL variations. DDL SQL commands include the following:

- Create - To make a new database, table, index, or stored query
- Alter - To modify an existing database object
- Drop - To destroy an existing database, table, index, or view
- Truncate - To irreversibly clear a table
- DBCC (Database Console Commands) - Statements that check the physical and logical consistency of a database

Data Manipulation Language

Data Manipulation Language (DML) is a family of computer languages used by computer programs or database users to retrieve, insert, delete and update data in a database. Currently, the most popular data manipulation language is that of SQL, which is used to retrieve and manipulate data in a relational database. Other forms of DML are those used by IMS/DL1, CODASYL databases (such as IDMS), and others.

Data manipulation languages were initially only used by computer programs, but (with the advent of SQL) have come to be used by people as well. Data manipulation languages have their functional capability organized by the initial word in a statement, which is almost always a verb. In the case of SQL, these verbs are "select", "insert", "update", and "delete". This makes the nature of the language a set of imperative statements (commands) to the database.

Data manipulation languages tend to have many different "flavors" and capabilities between database vendors. There has been a standard established for SQL by ANSI, but vendors still "exceed" the standard and provide their own extensions. Data manipulation language is basically of two types: 1) Procedural DMLs 2) Declarative DMLs.

Database administrator

A database administrator (DBA) is a person who is responsible for the environmental aspects of a database. In general, these include:

- Recoverability - Creating and testing backups
- Integrity - Verifying or helping to verify data integrity
- Security - Defining and/or implementing access controls to the data
- Availability - Ensuring maximum uptime
- Performance - Ensuring maximum performance given budgetary constraints
- Development and testing support - Helping programmers and engineers to efficiently utilize the database

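The DDL commands and DML verbs described above can be exercised against an embedded database. This is a minimal sketch using Python's sqlite3 module (the table and field names loosely follow the my_table example used later in this text; they are illustrative only):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# DDL: define and then modify the structure
cur.execute("CREATE TABLE my_table (field1 TEXT, field2 TEXT)")
cur.execute("ALTER TABLE my_table ADD COLUMN field3 INTEGER")

# DML: the four verbs - insert, update, select, delete
cur.execute("INSERT INTO my_table VALUES ('test', 'N', NULL)")
cur.execute("UPDATE my_table SET field1 = 'updated value' WHERE field2 = 'N'")
cur.execute("SELECT field1, field3 FROM my_table")
rows = cur.fetchall()
print(rows)                         # [('updated value', None)]
cur.execute("DELETE FROM my_table WHERE field2 = 'N'")

# DDL again: destroy the object entirely
cur.execute("DROP TABLE my_table")
```

Note that SQLite implements only a subset of standard DDL (no TRUNCATE, for instance); as the text says, DDL differs widely between SQL variations.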
The role of a database administrator has changed according to the technology of database management systems (DBMSs) as well as the needs of the owners of the databases.

Contents

1 Duties
2 Definition of Database
3 Recoverability
4 Integrity
5 Security
6 Availability
7 Performance
8 Development/Testing Support
9 See also

Duties

The duties of a database administrator vary, and depend on the job description, corporate and Information Technology (IT) policies, and the technical features and capabilities of the DBMS being administered. They nearly always include disaster recovery (backups and testing of backups), performance analysis and tuning, and some database design.

Definition of Database

A database is a collection of related information, accessed and managed by its DBMS. After experimenting with hierarchical and networked DBMSs during the 1970s, the IT industry became dominated by relational DBMSs (or, later on, Object-Relational Database Management Systems) such as Oracle, Sybase, Microsoft SQL Server and the like.

In a strictly technical sense, for any database to be defined as a "Truly Relational Model Database Management System," it should, ideally, adhere to the twelve rules defined by Edgar F. Codd, pioneer in the field of relational databases. To date, while many come close, it is admitted that nothing on the market adheres 100% to those rules, any more than they are 100% ANSI SQL compliant.

While IBM and Oracle technically were the earliest on the RDBMS scene, many others have followed. Despite various limitations, Microsoft Access - the 1995+ versions, not the prior versions - was technically the closest thing to a 'Truly Relational' DBMS for the desktop PC, with Visual FoxPro and many other desktop products marketed at that time far less compliant with Codd's Rules. Monty's MySQL is still extant and thriving, along with the Ingres-descended PostgreSQL, and while it is unlikely that miniSQL still exists in its original form, it too left its mark.

A relational DBMS manages information about types of real-world things (entities) in the form of tables that represent the entities. A table is like a spreadsheet: each row represents a particular entity (instance), and each column represents a type of information about the entity (domain).

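The row-as-instance, column-as-domain view can be made concrete with a few lines of code. A small sketch using Python's sqlite3 module (the employee table and its values are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# A table for the entity type "employee": each column is one kind of
# information (domain), each row one concrete employee (instance).
cur.execute("CREATE TABLE employee (emp_id INTEGER, name TEXT, dept TEXT)")
cur.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                [(1, 'Ada', 'Engineering'), (2, 'Grace', 'Research')])

cur.execute("SELECT * FROM employee")
columns = [d[0] for d in cur.description]   # the domains
rows = cur.fetchall()                       # the instances
print(columns)   # ['emp_id', 'name', 'dept']
print(rows)      # [(1, 'Ada', 'Engineering'), (2, 'Grace', 'Research')]
```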
Sometimes entities are made up of smaller related entities, such as orders and order lines, and so one of the challenges of a multi-user DBMS is to provide data about related entities from the standpoint of an instant of logical consistency.

Properly managed relational databases minimize the need for application programs to contain information about the physical storage of the data they access. To maximize the isolation of programs from data structures, relational DBMSs restrict data access to the messaging protocol SQL, a nonprocedural language that limits the programmer to specifying desired results. This message-based interface was a building block for the decentralization of computer hardware, because a program and data structure with such a minimal point of contact become feasible to reside on separate computers.

Recoverability

Recoverability means that, if a data entry error, program bug or hardware failure occurs, the DBA can bring the database backward in time to its state at an instant of logical consistency before the damage was done. Recoverability activities include making database backups and storing them in ways that minimize the risk that they will be damaged or lost, such as placing multiple copies on removable media and storing them outside the affected area of an anticipated disaster. Recoverability is the DBA's most important concern.

Recoverability, also sometimes called "disaster recovery," takes two primary forms: first the backup, then the recovery tests.

The backup of the database consists of data with timestamps, combined with database logs, to change the data to be consistent to a particular moment in time. It is possible to make a backup of the database containing only data without timestamps or logs, but the DBA must take the database offline to do such a backup.

The recovery tests of the database consist of restoring the data, then applying logs against that data to bring the database backup to consistency at a particular point in time, up to the last transaction in the logs. Alternatively, an offline database backup can be restored simply by placing the data in place on another copy of the database. If a DBA (or any administrator) attempts to implement a recoverability plan without the recovery tests, there is no guarantee that the backups are at all valid. In practice, in all but the most mature RDBMS packages, backups rarely are valid without extensive testing to be sure that no bugs or human error have corrupted the backups.

Integrity

Integrity means that the database, or the programs that create its content, embody means of preventing users who provide data from breaking the system's business rules. For example, a retailer may have a business rule that only individual customers can place orders, and so every order must identify one and only one customer. Oracle Server and other relational DBMSs enforce this type of business rule with constraints, which are configurable implicit queries.

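The order/customer business rule described above maps naturally onto a foreign-key constraint. A sketch using Python's sqlite3 module (the schema is invented for illustration; note that SQLite only enforces foreign keys when the pragma is enabled):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only on request
cur = con.cursor()

cur.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    -- the business rule: every order must identify one existing customer
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
""")

cur.execute("INSERT INTO customer VALUES (1, 'Alice')")
cur.execute("INSERT INTO orders VALUES (100, 1)")       # fine: customer 1 exists

try:
    cur.execute("INSERT INTO orders VALUES (101, 99)")  # no such customer
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The DBMS itself performs the implicit "does this customer exist?" query, so no application program can break the rule.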
To continue the example: in the process of inserting a new order, the database may query its customer table to make sure that the customer identified by the order exists.

Security

Security means that users' ability to access and change data conforms to the policies of the business and the delegation decisions of its managers. Like other metadata, a relational DBMS manages security information in the form of tables. These tables are the "keys to the kingdom", and so it is important to protect them from intruders.

Availability

Availability means that authorized users can access and change data as needed to support the business. Increasingly, businesses are coming to expect their data to be available at all times ("24x7", or 24 hours a day, 7 days a week). The IT industry has responded to the availability challenge with hardware and network redundancy and increasing online administrative capabilities.

Performance

Performance means that the database does not cause unreasonable online response times, and it does not cause unattended programs to run for an unworkable period of time. In complex client/server and three-tier systems, the database is just one of many elements that determine the performance that online users and unattended programs experience. Performance is a major motivation for the DBA to become a generalist and coordinate with specialists in other parts of the system, outside of traditional bureaucratic reporting lines.

Techniques for database performance tuning have changed as DBAs have become more sophisticated in their understanding of what causes performance problems, and in their ability to diagnose them.

In the 1990s, DBAs often focused on the database as a whole, and looked at database-wide statistics for clues that might help them find out why the system was slow. Also, the actions DBAs took in their attempts to solve performance problems were often at the global, database level, such as changing the amount of computer memory available to the database, or changing the amount of memory available to any database program that needed to sort data.

Around the year 2000, many of the most fundamental assumptions about database performance tuning were discovered to be myths. Most famously, the database buffer cache hit ratio, once thought to be the most reliable way to measure database performance, was found to be a completely meaningless statistic. As of 2005, the fog has lifted: DBAs understand that performance problems initially must be diagnosed, and that this is best done by examining individual SQL statements, not the database as a whole. Various tools, some included with the database and some available from third parties, provide a behind-the-scenes look at how the database is handling a SQL statement, shedding light on what's taking so long.

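Diagnosing an individual SQL statement rather than the database as a whole can be illustrated with an embedded engine. This sketch uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module (table and index names are invented; the exact plan wording varies between SQLite versions):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT)")
cur.executemany("INSERT INTO orders VALUES (?, ?)",
                [(i, f"cust{i % 100}") for i in range(1000)])

def plan(sql):
    # Ask the optimizer how it would execute the statement.
    return " ".join(row[-1] for row in cur.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM orders WHERE customer = 'cust7'"
print(plan(query))   # typically a full-table SCAN at this point

cur.execute("CREATE INDEX idx_customer ON orders (customer)")
print(plan(query))   # should now be a SEARCH using idx_customer
```

This is the per-statement style of tuning the text describes: examine one query's plan, then fix that query (here by adding an index) rather than turning database-wide knobs.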
Having identified the problem, the individual SQL statement can be tuned, and this is usually done by either rewriting it, adding or modifying indexes, using hints, or sometimes modifying the database tables themselves.

Development/Testing Support

Development and testing support is typically what the database administrator regards as his or her least important duty, while results-oriented managers consider it the DBA's most important duty. Support activities include collecting sample production data for testing new and changed programs and loading it into test databases, consulting with programmers about performance tuning, and making table design changes to provide new kinds of storage for new program functions.

See also

Here are some IT roles that are related to the role of database administrator:

- Application programmer or software engineer
- System administrator
- Data administrator
- Data architect

SQL

SQL
Paradigm: multi-paradigm: object-oriented, functional, procedural
Appeared in: 1974
Designed by: Donald D. Chamberlin and Raymond F. Boyce
Developer: IBM
Typing discipline: static, strong
Major implementations: Many

SQL (commonly expanded to Structured Query Language - see History for the term's derivation) is the most popular computer language used to create, retrieve, update and delete (see also: CRUD) data from relational database management systems. The language has

evolved beyond its original purpose, and now supports object-relational database management systems. SQL has been standardized by both ANSI and ISO.

Contents

1 Pronunciation
2 History
  2.1 Standardization
3 Scope
  3.1 Reasons for lack of portability
4 SQL keywords
  4.1 Data retrieval
  4.2 Data manipulation
  4.3 Transaction Controls
  4.4 Data definition
  4.5 Data control
  4.6 Other
5 Criticisms of SQL
6 Alternatives to SQL
7 See also
  7.1 Database systems using SQL
  7.2 SQL variants
8 References
9 External links

Pronunciation

SQL is commonly spoken either as the names of the letters ess-cue-el (IPA: [ˈɛsˈkjuˈɛl]), or like the word sequel (IPA: [ˈsiːkwəl]). The official pronunciation of SQL according to ANSI is ess-cue-el. However, each of the major database products (or projects) containing the letters SQL has its own convention: MySQL is officially and commonly pronounced "My Ess Cue El"; PostgreSQL is expediently pronounced postgres (being the name of the predecessor to PostgreSQL); and Microsoft SQL Server is commonly spoken as Microsoft-sequel-server.

History

An influential paper, "A Relational Model of Data for Large Shared Data Banks", by Dr. Edgar F. Codd, was published in June 1970 in the Association for Computing Machinery (ACM) journal, Communications of the ACM, although drafts of it were circulated internally within IBM in 1969. Codd's model became widely accepted as the definitive model for relational database management systems (RDBMS or RDMS).

During the 1970s, a group at IBM's San Jose research center developed a database system, "System R", based upon, but not strictly faithful to, Codd's model. Structured English Query Language ("SEQUEL") was designed to manipulate and retrieve data stored in System R. The acronym SEQUEL was later condensed to SQL because the word 'SEQUEL' was held as a trademark by the Hawker Siddeley aircraft company of the UK. Although SQL was influenced by

Codd's work, Donald D. Chamberlin and Raymond F. Boyce at IBM were the authors of the SEQUEL language design.[1] Their concepts were published to increase interest in SQL.

The first non-commercial, relational, non-SQL database, Ingres, was developed in 1974 at U.C. Berkeley.

At the same time, Relational Software, Inc. (now Oracle Corporation) saw the potential of the concepts described by Chamberlin and Boyce, and developed their own version of an RDBMS for the Navy, CIA and others. In the summer of 1979, Relational Software, Inc. introduced Oracle V2 (Version 2) for VAX computers as the first commercially available implementation of SQL. Oracle is often incorrectly cited as beating IBM to market by two years, when in fact they only beat IBM's release of the System/38 by a few weeks. Considerable public interest then developed; soon many other vendors developed versions, and Oracle's future was ensured.

IBM began to develop commercial products based on their System R prototype that implemented SQL, including the System/38 (announced in 1978 and commercially available in August 1979), SQL/DS (introduced in 1981), and DB2 (in 1983). In 1978, methodical testing commenced at customer test sites. Demonstrating both the usefulness and practicality of the system, this testing proved to be a success for IBM.

Standardization

SQL was adopted as a standard by ANSI (American National Standards Institute) in 1986 and by ISO (International Organization for Standardization) in 1987. The SQL standard has gone through a number of revisions:

Year  Name      Alias   Comments
1986  SQL-86    SQL-87  First published by ANSI. Ratified by ISO in 1987.
1989  SQL-89            Minor revision.
1992  SQL-92    SQL2    Major revision (ISO 9075).
1999  SQL:1999  SQL3    Added regular expression matching, recursive queries, triggers, nonscalar types and some object-oriented features. (The last two are somewhat controversial and not yet widely supported.)
2003  SQL:2003          Introduced XML-related features, window functions, standardized

sequences and columns with auto-generated values (including identity columns).
2006  SQL:2006          ISO/IEC 9075-14:2006 defines ways in which SQL can be used in conjunction with XML. It defines ways of importing and storing XML data in an SQL database, manipulating it within the database and publishing both XML and conventional SQL-data in XML form. In addition, it provides facilities that permit applications to integrate into their SQL code the use of XQuery, the XML Query Language published by the World Wide Web Consortium (W3C), to concurrently access ordinary SQL-data and XML documents.

The SQL standard is not freely available. SQL:2003 and SQL:2006 may be purchased from ISO or ANSI. A late draft of SQL:2003 is available as a zip archive from Whitemarsh Information Systems Corporation. The zip archive contains a number of PDF files that define the parts of the SQL:2003 specification.

Scope

[The neutrality or factuality of this article or section may be compromised by weasel words.]

SQL is designed for a specific purpose: to query data contained in a relational database. SQL is a set-based, declarative computer language, not an imperative language such as C or BASIC. Language extensions such as Oracle Corporation's PL/SQL bridge this gap to some extent by adding procedural elements, such as flow-of-control constructs. Another approach is to allow programming language code to be embedded in and interact with the database. For example, Oracle and others include Java in the database, while PostgreSQL allows functions to be written in a wide variety of languages, including Perl, Tcl, and C, and SQL Server 2005 allows any .NET language to be hosted within the database server process.

Extensions to and variations of the standards exist. Commercial implementations commonly omit support for basic features of the standard, such as the DATE or TIME data types, preferring variations of their own. IBM's SQL PL (SQL Procedural Language) and Sybase / Microsoft's Transact-SQL are of a proprietary nature because the procedural programming languages they present are non-standardized.

SQL code can rarely be ported between database systems without major modifications, in contrast to ANSI C or ANSI Fortran, which can usually be ported from platform to platform without major structural changes.

Reasons for lack of portability

There are several reasons for this lack of portability between database systems:

- The complexity and size of the SQL standard means that most databases do not implement the entire standard.
- The standard does not specify database behavior in several important areas (e.g. indexes), leaving it up to implementations of the database to decide how to behave.
- The SQL standard precisely specifies the syntax that a conforming database system must implement. However, the standard's specification of the semantics of language constructs is less well-defined, leading to areas of ambiguity.
- Many database vendors have large existing customer bases; where the SQL standard conflicts with the prior behavior of the vendor's database, the vendor may be unwilling to break backward compatibility.

SQL keywords

SQL keywords fall into several groups.

Data retrieval

The most frequently used operation in transactional databases is the data retrieval operation. When restricted to data retrieval commands, SQL acts as a declarative language.

SELECT is used to retrieve zero or more rows from one or more tables in a database. SELECT is the most commonly used Data Manipulation Language command. In specifying a SELECT query, the user specifies a description of the desired result set, but they do not specify what physical operations must be executed to produce that result set. Translating the query into an efficient query plan is left to the database system, more specifically to the query optimizer.

Commonly available keywords related to SELECT include:

- FROM is used to indicate from which tables the data is to be taken, as well as how the tables JOIN to each other.
- WHERE is used to identify which rows are to be retrieved, or applied to GROUP BY. WHERE is evaluated before the GROUP BY.
- GROUP BY is used to combine rows with related values into elements of a smaller set of rows.
- HAVING is used to identify which of the "combined rows" (combined rows are produced when the query has a GROUP BY keyword or when the SELECT part contains aggregates) are to be retrieved. HAVING acts much like a WHERE, but it operates on the results of the GROUP BY and hence can use aggregate functions.
- ORDER BY is used to identify which columns are used to sort the resulting data.

Data retrieval is very often combined with data projection; usually it isn't the verbatim data stored in primitive data types that a user is looking for or a query is written to serve. Often the data needs to be expressed differently from how it's stored. SQL allows a wide variety of formulas to be included in the select list to project data.

Example 1:

SELECT * FROM books
WHERE price > 100.00
ORDER BY title

This is an example that could be used to get a list of expensive books. It retrieves the records from the books table that have a price field which is greater than 100.00. The result is sorted alphabetically by book title. The asterisk (*) means to show all columns of the books table. Alternatively, specific columns could be named.

Example 2:

SELECT books.title, count(*) AS Authors
FROM books JOIN book_authors
  ON books.book_number = book_authors.book_number
GROUP BY books.title

Example 2 shows both the use of multiple tables in a join, and aggregation (grouping). This example shows how many authors there are per book. Example output may resemble:

Title                   Authors
----------------------  -------
SQL Examples and Guide  3
The Joy of SQL          1
How to use Wikipedia    2
Pitfalls of SQL         1
How SQL Saved my Dog    1

Data manipulation

First, there are the standard Data Manipulation Language (DML) elements. DML is the subset of the language used to add, update and delete data:

- INSERT is used to add zero or more rows (formally tuples) to an existing table.
  Example: INSERT INTO my_table (field1, field2, field3) VALUES ('test', 'N', NULL);
- UPDATE is used to modify the values of a set of existing table rows.
  Example: UPDATE my_table SET field1 = 'updated value' WHERE field2 = 'N';
- DELETE removes zero or more existing rows from a table.
  Example: DELETE FROM my_table WHERE field2 = 'N';
- MERGE is used to combine the data of multiple tables. It is something of a combination of the INSERT and UPDATE elements, and is sometimes called an "upsert". It is defined in the SQL:2003 standard; prior to that, some databases provided similar functionality via different syntax.

Transaction Controls

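The transaction controls can be driven from a host language. A sketch with Python's sqlite3 module, reusing this text's inventory example (the table contents and starting quantity are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE inventory (item TEXT, quantity INTEGER)")
cur.execute("INSERT INTO inventory VALUES ('pants', 10)")
con.commit()

# A transaction groups changes so they apply completely or not at all.
cur.execute("UPDATE inventory SET quantity = quantity - 3 WHERE item = 'pants'")
con.commit()        # make the change permanent

cur.execute("UPDATE inventory SET quantity = quantity - 100 WHERE item = 'pants'")
con.rollback()      # discard the second change entirely

cur.execute("SELECT quantity FROM inventory WHERE item = 'pants'")
print(cur.fetchone()[0])   # 7
```

The sqlite3 module issues the BEGIN implicitly before the first data-changing statement; commit() and rollback() then play the roles of COMMIT and ROLLBACK.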
COMMIT causes all data changes in a transaction to be made permanent. depending on SQL dialect) can be used to mark the start of a database transaction. ALTER command permits the user to modify an existing object in various ways -. DDL allows the user to define new tables and associated elements. ROLLBACK causes all data changes since the last COMMIT or ROLLBACK to be discarded. COMMIT and ROLLBACK interact with areas such as transaction control and locking. Strictly. PRIMARY KEY (my_field1. DROP causes an existing object within the database to be deleted. Example: BEGIN WORK. so that the state of the data is "rolled back" to the way it was prior to those changes being requested.for example. The most basic items of DDL are the CREATE. DCL handles the authorization aspects of data and permits the user to control who has access to see or manipulate data within the database. my_field3 DATE NOT NULL. Example: CREATE TABLE my_table ( my_field1 INT.RENAME. the semantics of SQL are implementation-dependent. which allow control over nonstandard features of the database system.TRUNCATE and DROP commands.     CREATE causes an object (a table. Data definition The second group of keywords is the Data Definition Language (DDL). In the absence of a BEGIN WORK or similar statement. TRUNCATE deletes all data from a table (non-standard. if available. can be used to wrap around the DML operations. my_field2 VARCHAR (50). Data control The third group of SQL keywords is the Data Control Language (DCL). adding a column to an existing table. both terminate any open transaction and release any locks held on data. but common SQL command). for example) to be created within the database. usually irretrievably.3 WHERE item = 'pants'. which either completes completely or not at all.    BEGIN WORK (or START TRANSACTION.Transaction. COMMIT.ALTER. UPDATE inventory SET quantity = quantity . Most commercial SQL databases have proprietary extensions in their DDL. my_field2) ). 
Its two main keywords are:

- GRANT — authorizes one or more users to perform an operation or a set of operations on an object.
- REVOKE — removes or restricts the ability of a user to perform an operation or a set of operations.

Example:

    GRANT SELECT, UPDATE ON my_table TO some_user, another_user;

Other

ANSI-standard SQL supports a double dash, --, as a single-line comment identifier (some extensions also support curly brackets or C-style /* comments */ for multi-line comments). Example:

    SELECT * FROM inventory -- Retrieve everything from inventory table

Some SQL servers allow user-defined functions.

Criticisms of SQL

Technically, SQL is a declarative computer language for use with "SQL databases". Theorists and some practitioners note that many of the original SQL features were inspired by, but in violation of, the relational model for database management and its tuple calculus realization. Recent extensions to SQL achieved relational completeness, but have worsened the violations, as documented in The Third Manifesto. In addition, there are also some criticisms about the practical use of SQL:

- Implementations are inconsistent and, usually, incompatible between vendors. In particular date and time syntax, string concatenation, nulls, and comparison case sensitivity often vary from vendor to vendor.
- The language makes it too easy to do a Cartesian join, which results in "run-away" result sets when WHERE clauses are mistyped. Cartesian joins are so rarely used in practice that requiring an explicit CARTESIAN keyword may be warranted.
- SQL, and the relational model as it is, offer no standard way of handling tree structures, i.e. rows recursively referring to other rows of the same table. Oracle offers a "CONNECT BY" clause; other solutions are database functions which use recursion and return a row set, as is possible in PostgreSQL with PL/pgSQL and other databases.
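The Cartesian-join criticism can be demonstrated directly: omitting the join condition is accepted silently and multiplies the result set. A small sketch using SQLite through Python's sqlite3 module, with hypothetical orders/customers tables:

```python
import sqlite3

# Illustrates the criticism above: omitting the join condition silently
# produces a Cartesian product instead of an error.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
con.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [(i, i % 3) for i in range(100)])
con.executemany("INSERT INTO customers VALUES (?, ?)",
                [(i, "customer %d" % i) for i in range(3)])

# Intended query: one row per order.
joined = con.execute(
    "SELECT count(*) FROM orders JOIN customers "
    "ON orders.customer_id = customers.customer_id").fetchone()[0]

# Mistyped query (no join condition): every order paired with every customer.
cartesian = con.execute(
    "SELECT count(*) FROM orders, customers").fetchone()[0]
```

With 100 orders and 3 customers the correct join returns 100 rows, while the accidental Cartesian product returns 300; with realistically sized tables the blow-up is far worse.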

Alternatives to SQL

A distinction should be made between alternatives to relational query languages and alternatives to SQL. The list below are proposed alternatives to SQL, but are still (nominally) relational. See navigational database for alternatives to relational.

- IBM Business System 12 (IBM BS12)
- Tutorial D
- TQL - Top's Query Language. A draft language influenced by IBM BS12. Tentatively renamed to SMEQL to avoid confusion with similar projects called TQL.
- Hibernate Query Language[2] (HQL) - A Java-based tool that uses modified SQL
- Quel - introduced in 1974 by the U.C. Berkeley Ingres project.
- Object Query Language - Object Data Management Group
- EJB-QL (Enterprise Java Bean Query Language / Java Persistence Query Language)[3] - An object-based query language which allows objects to be retrieved using a syntax similar to SQL. It is used within the Java Persistence framework, and formerly within the J2EE/JEE Enterprise Java Bean framework with Entity Beans.
- Datalog
- LINQ
- NoSQL

See also

Database systems using SQL
- Comparison of relational database management systems
- Comparison of truly relational database management systems
- Comparison of object-relational database management systems
- List of relational database management systems
- List of object-relational database management systems
- List of hierarchical database management systems

SQL variants
- Comparison of SQL syntax

References

1. Donald D. Chamberlin and Raymond F. Boyce, 1974. "SEQUEL: A structured English query language". Proceedings of the 1974 ACM SIGFIDET (now SIGMOD) workshop on Data description, access and control, Ann Arbor, Michigan, pp. 249-264.
2. Discussion on alleged SQL flaws (C2 wiki)
3. Galindo J., Urrutia A., Piattini M.: "Fuzzy Databases: Modeling, Design and Implementation". Idea Group Publishing, Hershey, USA, 2005. Web page about FSQL: references and links.

External links

- SQL Basics

Projects. MSSQL. and Politics (early history of SQL) SQL:2003.A free SQL cookbook full of practical examples and queries for all dialects Online Interactive SQL Tutorials How well Oracle.Foreign key . editor of the SQL standard) A Gentle Introduction to SQL at SQLzoo SQL Help and Tutorial The SQL Language (PostgreSQL specific details included) Wikibooks Programming has more about this subject: SQL         SQL Exercises. It was created as a part of the groundbreaking Ingres effort at .Candidate key Database normalization | Referential integrity | Relational DBMS | Distributed DBMS | ACID Objects Topics in SQL Trigger | View | Table | Cursor | Log | Select | Insert | Update | Merge | Delete | Join Transaction | Index | Stored procedure | | Union | Create | Drop Partition Implementations of database management systems QUEL QUEL is a relational database access language. SQL DML Help and Tutorial SQL Tutorial. similar in most ways to SQL. such as Oracle. Informix and others) Topics in database management systems (DBMS) ( view • talk • edit ) Concepts Database | Database model | Relational database | Relational model | Relational algebra | Primary key .A free course on software development using crossplatform C++ and SQL (for any Relational Database. SQL/XML and the Future of SQL (webcast and podcast with Jim Melton. DB2.     The 1995 SQL Reunion: People. SQL Recipes .Surrogate key . PostgreSQL. MSSQL support the SQL Standard SQL Tutorial SQL Tutorial with examples The sbVB DataBase course .Superkey . but somewhat better arranged and easier to use. MySQL. DB2.

QUEL is based on Codd's earlier suggested but not implemented Data Sub-Language ALPHA. QUEL was used for a short time in most products based on the freely-available Ingres source code, most notably Informix. As Oracle and DB2 started taking over the market in the early 1980s, most companies then supporting QUEL moved to SQL instead.

In many ways QUEL is similar to SQL. For instance, here is a sample of a simple session that creates a table, inserts a row into it, and then retrieves and modifies the data inside it:

    create student(name = c10, age = i4, sex = c1, state = c2)
    append to student(name = "philip", age = 17, sex = "m", state = "FL")
    range of s is student
    retrieve (s.all) where s.state = "FL"
    print s
    range of s is student
    replace s(age=s.age+1)
    print s

Here is a similar set of SQL statements:

    create table student(name char(10), age int, sex char(1), state char(2))
    insert student (name, age, sex, state) values ("philip", 17, "m", "FL")
    select * from student where state = "FL"
    update student set age=age+1

Note that every SQL command uses a unique syntax, and that even similar commands like INSERT and UPDATE use completely different styles. Whereas every major SQL command has a format that is at least somewhat different from the others, in QUEL a single syntax is used for all commands.

One difference is that QUEL statements are always defined by tuple variables, which can be used to limit queries or return result sets. Consider this example, taken from the original Ingres paper:

    range of e is employee
    retrieve (comp = e.salary / (e.age - 18)) where e.name = "Jones"

Here e is a tuple, defining a set of data, in this case all the rows in the employee table that have the first name "Jones". In SQL the statement is very similar, and arguably cleaner:

    select (e.salary / (e.age - 18)) as comp from employee as e where e.name = "Jones"

QUEL is generally more "normalized" than SQL. Another advantage of QUEL was a built-in system for moving records en masse into and out of the system. Consider this command:

    copy student(name=c0, comma=d1, age=c0, comma=d1, sex=c0, comma=d1, address=c0, nl=d1) into "/student.txt"

which creates a comma-delimited file of all the records in the student table. The d1 indicates a delimiter, as opposed to a data type. Changing the into to a from reverses the process. Similar commands are available in many SQL systems, but usually as external tools, as opposed to being internal to the SQL language. This makes them unavailable to stored procedures. With these differences, the two languages are largely the same.

Query by Example

Query by Example (QBE) is a database query language for relational databases. It was devised by Moshé M. Zloof at IBM Research during the mid 1970s, in parallel to the development of SQL. It is the first graphical query language, using visual tables where the user would enter commands, example elements and conditions. Many graphical front-ends for databases use the ideas from QBE today. QBE is based on the notion of Domain relational calculus.

Contents

1 Example
2 See also
3 References
4 QBE
5 Sources
6 External links

Example

A simple example using the Suppliers and Parts database is given here, just to give you a feel for how QBE works. This "query" selects all supplier numbers (S#) where the owner of the supplier company is "J. Doe" and the supplier is located in "Rome". The result of this query depends on what the values are for your Suppliers and Parts database. Other commands like the "P." (print) command are: "U." (update), "I." (insert) and "D." (delete).

QBE

Query by Example (QBE) is a powerful search tool that allows anyone to search a system for documents by entering an element such as a text string or document name. Searching for documents based on matching text is easy with QBE: the user simply enters (or copies and pastes) the target text into the search form field. When the user clicks search (or hits enter), the input is passed to the QBE parser for processing. The query is created, using key words from the input the user provided, and then the search begins, quickly searching through documents to match the entered criteria. The parser auto-eliminates mundane words such as "and", "is", "the", "or", and so on, to make the search more efficient and not to barrage the user with results.

The user can also search for similar documents based on the text of a full document that he or she may have. This is accomplished by the user's submission of documents (or numerous documents) to the QBE results template. The analysis of these documents via the QBE parser will generate the required query and submit it to the search engine, which will then search for relevant and similar material.

It is commonly believed that QBE is far easier to learn than other, more formal query languages (e.g. SQL), while still providing people with the opportunity to perform powerful searches. However, when compared with a formal query, the results in the QBE system will be more variable.

See also

- Microsoft Query by Example

References

- M. M. Zloof: Query by Example. AFIPS, 44, 1975.
- Raghu Ramakrishnan, Johannes Gehrke: Database Management Systems, 3rd edition, Chapter 6.
- C. J. Date: "8 Relational Calculus", An Introduction to Database Systems, in Maite Suarez-Rivas, Katherine Harutunian (eds.), Pearson Education Inc. (2004), ISBN 0-321-18956-6.

Database normalization

Database normalization is a design technique by which relational database tables are structured in such a way as to make them invulnerable to certain types of logical inconsistencies and anomalies. Tables can be normalized to varying degrees: relational database theory defines "normal forms" of successively higher degrees of stringency, so, for example, a table in third normal form is less open to logical inconsistencies and anomalies than a table that is only in second normal form. Although the normal forms are often defined (informally) in terms of the characteristics of tables, rigorous definitions of the normal forms are concerned with the characteristics of mathematical constructs known as relations.

Contents

1 Problems addressed by normalization
2 Background to normalization: definitions
3 History
4 Normal forms
  4.1 First normal form
  4.2 Second normal form
  4.3 Third normal form
  4.4 Boyce-Codd normal form
  4.5 Fourth normal form
  4.6 Fifth normal form
  4.7 Domain/key normal form
  4.8 Sixth normal form
5 Example Of The Process
  5.1 Starting Point
  5.2 1NF
  5.3 2NF
  5.4 3NF and BCNF
  5.5 4NF
  5.6 5NF
6 Denormalization
  6.1 Non-first normal form (NF²)
7 Further reading
8 References
9 See also
10 External links

Problems addressed by normalization

A table that is not sufficiently normalized can suffer from logical inconsistencies of various types, and from anomalies involving data operations. In such a table:

- The same fact can be expressed on multiple records; therefore updates to the table may result in logical inconsistencies. For example, each record in an unnormalized "DVD Rentals" table might contain a DVD ID, Member ID, and Member Address; thus a change of address for a particular member will potentially need to be applied to multiple records. If the update is not carried through successfully, that is, if the member's address is updated on some records but not others, then the table is left in an inconsistent state. Specifically, the table provides conflicting answers to the question of what this particular member's address is. This phenomenon is known as an update anomaly.
- There are circumstances in which certain facts cannot be recorded at all. In the above example, if it is the case that Member Address is held only in the "DVD Rentals" table, then we cannot record the address of a member who has not yet rented any DVDs. This phenomenon is known as an insertion anomaly.
- There are circumstances in which the deletion of data representing certain facts necessitates the deletion of data representing completely different facts. For example, suppose a table has the attributes Student ID, Course ID, and Lecturer ID (a given student is enrolled in a given course, which is taught by a given lecturer). If the number of students enrolled in the course temporarily drops to zero, the last of the records referencing that course must be deleted, meaning, as a side-effect, that the table no longer tells us which lecturer has been assigned to teach the course. This phenomenon is known as a deletion anomaly.

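The three anomalies can be made concrete with a few plain Python rows modeled on the "DVD Rentals" example above (IDs and addresses are hypothetical):

```python
# The "DVD Rentals" example: Member Address is repeated on every rental
# record, so the design is vulnerable to the anomalies described above.
rentals = [
    {"dvd_id": 1, "member_id": 7, "member_address": "12 Oak St"},
    {"dvd_id": 2, "member_id": 7, "member_address": "12 Oak St"},
    {"dvd_id": 3, "member_id": 9, "member_address": "4 Elm Ave"},
]

# Update anomaly: a change of address applied to only SOME of member 7's
# records leaves the table giving conflicting answers about the address.
rentals[0]["member_address"] = "99 Pine Rd"
addresses_for_7 = {r["member_address"] for r in rentals if r["member_id"] == 7}
inconsistent = len(addresses_for_7) > 1

# Insertion anomaly: a member with no rentals has no row to hold an address.
# Deletion anomaly: deleting member 9's only rental also deletes the only
# record of member 9's address.
```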
Ideally, a relational database should be designed in such a way as to exclude the possibility of update, insertion, and deletion anomalies. Whenever information is represented relationally, that is, as values within rows beneath fixed column headings, it makes sense to ask to what extent the representation is normalized. The normal forms of relational database theory provide guidelines for deciding whether a particular design will be vulnerable to such anomalies. It is possible to correct an unnormalized design so as to make it adhere to the demands of the normal forms: this is normalization. Normalization typically involves decomposing an unnormalized table into two or more tables which, were they to be combined (joined), would convey exactly the same information as the original table.

Background to normalization: definitions

- Functional dependency: Attribute B has a functional dependency on attribute A if, for each value of attribute A, there is exactly one value of attribute B. For example, Member Address has a functional dependency on Member ID: each Member ID value corresponds to exactly one Member Address value. An attribute may be functionally dependent either on a single attribute or on a combination of attributes.
- Trivial functional dependency: A trivial functional dependency is a functional dependency of an attribute on a superset of itself. {Member ID, Member Address} → {Member Address} is trivial, as is {Member Address} → {Member Address}.
- Full functional dependency: An attribute is fully functionally dependent on a set of attributes X if it is a) functionally dependent on X, and b) not functionally dependent on any proper subset of X. {Member Address} has a functional dependency on {DVD ID, Member ID}, but not a full functional dependency, for it is also dependent on {Member ID}.
- Multivalued dependency: A multivalued dependency is a constraint according to which the presence of certain rows in a table implies the presence of certain other rows: see the Multivalued Dependency article for a rigorous definition.
- Superkey: A superkey is an attribute or set of attributes that uniquely identifies rows within a table; in other words, two distinct rows are always guaranteed to have distinct superkeys. {DVD ID, Member ID, Member Address} would be a superkey for the "DVD Rentals" table; {DVD ID, Member ID} would also be a superkey.
- Candidate key: A candidate key is a minimal superkey, that is, a superkey for which we can say that no proper subset of it is also a superkey. {DVD ID, Member ID} would be a candidate key for the "DVD Rentals" table.
- Non-prime attribute: A non-prime attribute is an attribute that does not occur in any candidate key. Member Address would be a non-prime attribute in the "DVD Rentals" table.
- Primary key: Most DBMSs require a table to be defined as having a single unique key, rather than a number of possible unique keys. A primary key is a candidate key which the database designer has designated for this purpose.

It is not possible to determine the extent to which a design is normalized without understanding what functional dependencies apply to the attributes within its tables; understanding this, in turn, requires knowledge of the problem domain.

History

Edgar F. Codd first proposed the process of normalization and what came to be known as the 1st normal form:

"There is, in fact, a very simple elimination[1] procedure which we shall call normalization. Through decomposition non-simple domains are replaced by domains whose elements are atomic (non-decomposable) values."
-- Edgar F. Codd, A Relational Model of Data for Large Shared Data Banks[2]

In his paper, Codd used the term "non-simple" domains to describe a heterogeneous data structure, but later researchers would refer to such a structure as an abstract data type.
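The functional-dependency definition above can be checked mechanically against sample data. A sketch, using plain Python dictionaries for the "DVD Rentals" rows (the data itself is hypothetical):

```python
from collections import defaultdict

def holds_fd(rows, lhs, rhs):
    """True if the functional dependency lhs -> rhs holds in rows:
    for each combination of lhs values there is exactly one rhs value."""
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        seen[key].add(tuple(row[a] for a in rhs))
    return all(len(values) == 1 for values in seen.values())

# Rows modeled on the "DVD Rentals" table from the definitions above.
rows = [
    {"dvd_id": 1, "member_id": 7, "member_address": "12 Oak St"},
    {"dvd_id": 2, "member_id": 7, "member_address": "12 Oak St"},
    {"dvd_id": 3, "member_id": 9, "member_address": "4 Elm Ave"},
]

# {Member ID} -> {Member Address} holds.
fd1 = holds_fd(rows, ["member_id"], ["member_address"])
# {DVD ID, Member ID} -> {Member Address} also holds, but it is not a FULL
# functional dependency, since the proper subset {Member ID} already suffices.
fd2 = holds_fd(rows, ["dvd_id", "member_id"], ["member_address"])
```

Note that such a check only confirms a dependency on the sample data at hand; whether a dependency genuinely holds is a fact about the problem domain, as the text goes on to say.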
Normal forms

The normal forms (abbrev. NF) of relational database theory provide criteria for determining a table's degree of vulnerability to logical inconsistencies and anomalies. The higher the normal form applicable to a table, the less vulnerable it is to such inconsistencies and anomalies. Each table has a "highest normal form" (HNF): by definition, a table always meets the requirements of its HNF and of all normal forms lower than its HNF; also by definition, a table fails to meet the requirements of any normal form higher than its HNF. The normal forms are applicable to individual tables; to say that an entire database is in normal form n is to say that all of its tables are in normal form n.

Edgar F. Codd originally defined the first three normal forms (1NF, 2NF, and 3NF). These normal forms have been summarized as requiring that all non-key attributes be dependent on "the key, the whole key and nothing but the key". The fourth and fifth normal forms (4NF and 5NF) deal specifically with the representation of many-to-many and one-to-many relationships among attributes. Sixth normal form (6NF) incorporates considerations relevant to temporal databases.

Newcomers to database design sometimes suppose that normalization proceeds in an iterative fashion, i.e. a 1NF design is first normalized to 2NF, then to 3NF, and so on. This is not an accurate description of how normalization typically works. A sensibly designed table is likely to be in 3NF on the first attempt; furthermore, if it is 3NF, it is overwhelmingly likely to have an HNF of 5NF. Achieving the "higher" normal forms (above 3NF) does not usually require an extra expenditure of effort on the part of the designer, because 3NF tables usually need no modification to meet the requirements of these higher normal forms.

First normal form

The criteria for first normal form (1NF) are:

- A table must be guaranteed not to have any duplicate records; therefore it must have at least one candidate key.
- There must be no repeating groups, i.e. no attributes which occur a different number of times on different records. For example, suppose that an employee can have multiple skills: a possible representation of employees' skills is {Employee ID, Skill1, Skill2, Skill3, ...}, where {Employee ID} is the unique identifier for a record. This representation would not be in 1NF.

Note that all relations are in 1NF. The question of whether a given representation is in 1NF is equivalent to the question of whether it is a relation.

Second normal form

The criteria for second normal form (2NF) are:

- The table must be in 1NF.
- None of the non-prime attributes of the table are functionally dependent on a part (proper subset) of a candidate key; in other words, all functional dependencies of non-prime attributes on candidate keys are full functional dependencies. For example, consider a "Department Members" table whose attributes are Department ID, Employee ID, and Employee Date of Birth, and suppose that an employee works in one or more departments. The combination of Department ID and Employee ID uniquely identifies records within the table. Given that Employee Date of Birth depends on only one of those attributes, namely Employee ID, the table is not in 2NF.

Note that if none of a 1NF table's candidate keys are composite, i.e. every candidate key consists of just one attribute, then we can say immediately that the table is in 2NF.

Third normal form

The criteria for third normal form (3NF) are:

- The table must be in 2NF.
- There are no non-trivial functional dependencies between non-prime attributes. A violation of 3NF would mean that at least one non-prime attribute is only indirectly dependent (transitively dependent) on a candidate key, by virtue of being functionally dependent on another non-prime attribute. For example, consider a "Departments" table whose attributes are Department ID, Department Name, Manager ID, and Manager Hire Date, and suppose that each manager can manage one or more departments. {Department ID} is a candidate key. Although Manager Hire Date is functionally dependent on {Department ID}, it is also functionally dependent on the non-prime attribute Manager ID. This means the table is not in 3NF.

Boyce-Codd normal form

The criteria for Boyce-Codd normal form (BCNF) are:

- The table must be in 3NF.
- Every non-trivial functional dependency must be a dependency on a superkey.
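The 3NF violation in the "Departments" example can be observed in data: the same hire date necessarily repeats wherever the same manager appears. A sketch with hypothetical rows and a small dependency check:

```python
# The "Departments" example above: Manager Hire Date depends on the
# non-prime attribute Manager ID, so it is only transitively dependent
# on the candidate key {Department ID} -- the table is not in 3NF.
departments = [
    # (department_id, department_name, manager_id, manager_hire_date)
    (1, "Accounts", 100, "2001-04-01"),
    (2, "Shipping", 100, "2001-04-01"),   # same manager: hire date repeats
    (3, "Research", 200, "2003-09-15"),
]

def depends_on(rows, lhs_idx, rhs_idx):
    """True if each combination of lhs values maps to exactly one rhs value."""
    mapping = {}
    for row in rows:
        key = tuple(row[i] for i in lhs_idx)
        val = tuple(row[i] for i in rhs_idx)
        if mapping.setdefault(key, val) != val:
            return False
    return True

# Hire date is determined by the key AND by the non-prime Manager ID:
dep_on_key = depends_on(departments, [0], [3])
dep_on_manager = depends_on(departments, [2], [3])  # transitive path -> not 3NF
```

The fix, as usual, is decomposition: a Departments table {Department ID, Department Name, Manager ID} plus a Managers table {Manager ID, Manager Hire Date}.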

Fourth normal form

The criteria for fourth normal form (4NF) are:

- The table must be in BCNF.
- There must be no non-trivial multivalued dependencies on something other than a superkey. A BCNF table is said to be in 4NF if and only if all of its multivalued dependencies are functional dependencies.

Fifth normal form

The criteria for fifth normal form (5NF, also PJ/NF) are:

- The table must be in 4NF.
- There must be no non-trivial join dependencies that do not follow from the key constraints. A 4NF table is said to be in 5NF if and only if every join dependency in it is implied by the candidate keys.

Domain/key normal form

Domain/key normal form (DKNF) requires that a table not be subject to any constraints other than domain constraints and key constraints.

Sixth normal form

This normal form was, as of 2005, only recently proposed: the sixth normal form (6NF) was only defined when extending the relational model to take into account the temporal dimension. Unfortunately, most current SQL technologies as of 2005 do not take into account this work, and most temporal extensions to SQL are not relational. See work by Date, Darwen and Lorentzos[3] for a relational temporal extension, or see TSQL2 for a different approach.

Example Of The Process

The following example illustrates how a database designer might employ his knowledge of the normal forms to make progressive improvements to an initially unnormalized database design. The example is somewhat contrived: in practice, few designs lend themselves to being normalized in strict stages in which the HNF increases at each stage.

Starting Point

Information has been presented initially in a way that does not even meet 1NF. The database in the example captures information about the suppliers with which various companies' divisions have relationships; more specifically, it captures information about the types of parts which each division of each company sources from its suppliers. Every record is for a particular Company/Division combination: for each of these combinations, repeating groups of part- and supplier-related information occur. 1NF does not permit repeating groups.

Suppliers and Parts By Company Division (repeating groups shown within each record)

Allied Clock and Watch (Founder: Horace Washington, Logo: Sundial), Division: Clocks
  Spring            Tensile Globodynamics  USA         N. Amer.
  Pendulum          Tensile Globodynamics  USA         N. Amer.
  Spring            Pieza de Acero         Mexico      N. Amer.
  Toothed Wheel     Pieza de Acero         Mexico      N. Amer.
Allied Clock and Watch (Founder: Horace Washington, Logo: Sundial), Division: Watches
  Quartz Crystal    Microflux              Belgium     Europe
  Tuning Fork       Microflux              Belgium     Europe
  Battery           Dakota Electrics       USA         N. Amer.
Global Robot (Founder: Nils Neumann, Logo: Gearbox), Division: Industrial Robots
  Flywheel          Wheels 4 Less          USA         N. Amer.
  Axle              Wheels 4 Less          USA         N. Amer.
  Axle              TransEuropa            Italy       Europe
  Mechanical Arm    TransEuropa            Italy       Europe
Global Robot (Founder: Nils Neumann, Logo: Gearbox), Division: Domestic Robots
  Artificial Brain  Prometheus Labs        Luxembourg  Europe
  Artificial Brain  Frankenstein Labs      Germany     Europe
  Metal Housing     Pieza de Acero         Mexico      N. Amer.
  Backplate         Pieza de Acero         Mexico      N. Amer.

1NF

We eliminate the repeating groups by ensuring that each group appears on its own record. The unique identifier for a record is now {Company, Division, Part Type, Supplier}.

Suppliers and Parts By Company Division

Company                 Company Founder    Company Logo  Division           Part Type         Supplier               Supplier Country  Supplier Continent
Allied Clock and Watch  Horace Washington  Sundial       Clocks             Spring            Tensile Globodynamics  USA               N. Amer.
Allied Clock and Watch  Horace Washington  Sundial       Clocks             Pendulum          Tensile Globodynamics  USA               N. Amer.
Allied Clock and Watch  Horace Washington  Sundial       Clocks             Spring            Pieza de Acero         Mexico            N. Amer.
Allied Clock and Watch  Horace Washington  Sundial       Clocks             Toothed Wheel     Pieza de Acero         Mexico            N. Amer.
Allied Clock and Watch  Horace Washington  Sundial       Watches            Quartz Crystal    Microflux              Belgium           Europe
Allied Clock and Watch  Horace Washington  Sundial       Watches            Tuning Fork       Microflux              Belgium           Europe
Allied Clock and Watch  Horace Washington  Sundial       Watches            Battery           Dakota Electrics       USA               N. Amer.
Global Robot            Nils Neumann       Gearbox       Industrial Robots  Flywheel          Wheels 4 Less          USA               N. Amer.
Global Robot            Nils Neumann       Gearbox       Industrial Robots  Axle              Wheels 4 Less          USA               N. Amer.
Global Robot            Nils Neumann       Gearbox       Industrial Robots  Axle              TransEuropa            Italy             Europe
Global Robot            Nils Neumann       Gearbox       Industrial Robots  Mechanical Arm    TransEuropa            Italy             Europe
Global Robot            Nils Neumann       Gearbox       Domestic Robots    Artificial Brain  Prometheus Labs        Luxembourg        Europe
Global Robot            Nils Neumann       Gearbox       Domestic Robots    Artificial Brain  Frankenstein Labs      Germany           Europe
Global Robot            Nils Neumann       Gearbox       Domestic Robots    Metal Housing     Pieza de Acero         Mexico            N. Amer.
Global Robot            Nils Neumann       Gearbox       Domestic Robots    Backplate         Pieza de Acero         Mexico            N. Amer.

2NF

One problem with the design at this stage is that Company Founder and Company Logo details for a given company may appear redundantly on more than one record; so may Supplier Countries and Continents. These phenomena arise from the part-key dependencies of a) the Company Founder and Company Logo attributes on Company, and b) the Supplier Country and Supplier Continent attributes on Supplier. 2NF does not permit part-key dependencies. We correct the problem by splitting out the Company Founder and Company Logo details into their own table, called Companies, as well as splitting out the Supplier Country and Supplier Continent details into their own table, called Suppliers.

Suppliers and Parts By Company Division

Company                 Division           Part Type         Supplier
Allied Clock and Watch  Clocks             Spring            Tensile Globodynamics
Allied Clock and Watch  Clocks             Pendulum          Tensile Globodynamics
Allied Clock and Watch  Clocks             Spring            Pieza de Acero
Allied Clock and Watch  Clocks             Toothed Wheel     Pieza de Acero
Allied Clock and Watch  Watches            Quartz Crystal    Microflux
Allied Clock and Watch  Watches            Tuning Fork       Microflux
Allied Clock and Watch  Watches            Battery           Dakota Electrics
Global Robot            Industrial Robots  Flywheel          Wheels 4 Less
Global Robot            Industrial Robots  Axle              Wheels 4 Less
Global Robot            Industrial Robots  Axle              TransEuropa
Global Robot            Industrial Robots  Mechanical Arm    TransEuropa
Global Robot            Domestic Robots    Artificial Brain  Prometheus Labs
Global Robot            Domestic Robots    Artificial Brain  Frankenstein Labs
Global Robot            Domestic Robots    Metal Housing     Pieza de Acero
Global Robot            Domestic Robots    Backplate         Pieza de Acero

Companies

Company                 Company Founder    Company Logo
Allied Clock and Watch  Horace Washington  Sundial
Global Robot            Nils Neumann       Gearbox

Suppliers

Supplier               Supplier Country  Supplier Continent
Tensile Globodynamics  USA               N. Amer.
Pieza de Acero         Mexico            N. Amer.
Microflux              Belgium           Europe
Dakota Electrics       USA               N. Amer.
Wheels 4 Less          USA               N. Amer.
TransEuropa            Italy             Europe
Prometheus Labs        Luxembourg        Europe
Frankenstein Labs      Germany           Europe
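The 2NF split just shown is lossless: joining the three tables back together reproduces the original 1NF rows, while each fact about a company or a supplier is stored only once. A sketch using SQLite through Python's sqlite3 module, seeded with one row from each table above:

```python
import sqlite3

# The 2NF decomposition from the example: company details and supplier
# details live in their own tables; a join reconstructs a 1NF-style row.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE parts_by_division (company TEXT, division TEXT, part_type TEXT, supplier TEXT);
CREATE TABLE companies (company TEXT PRIMARY KEY, founder TEXT, logo TEXT);
CREATE TABLE suppliers (supplier TEXT PRIMARY KEY, country TEXT, continent TEXT);
""")
con.execute("INSERT INTO companies VALUES "
            "('Allied Clock and Watch', 'Horace Washington', 'Sundial')")
con.execute("INSERT INTO suppliers VALUES "
            "('Tensile Globodynamics', 'USA', 'N. Amer.')")
con.execute("INSERT INTO parts_by_division VALUES "
            "('Allied Clock and Watch', 'Clocks', 'Spring', 'Tensile Globodynamics')")

# Founder/logo and country/continent are stored once, not repeated per part.
row = con.execute("""
    SELECT p.company, c.founder, c.logo, p.division, p.part_type,
           p.supplier, s.country, s.continent
    FROM parts_by_division p
    JOIN companies c ON c.company = p.company
    JOIN suppliers s ON s.supplier = p.supplier
""").fetchone()
```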

3NF and BCNF There is still, however, redundancy in the design. The Supplier Continent for a given Supplier Country may appear redundantly on more than one record. This phenomenon arises from the dependency of non-key attribute Supplier Continent on non-key attribute Supplier Country, and means that the design does not conform to 3NF. To achieve 3NF (and, while we are at it, BCNF), we create a separate Countries table which tells us which continent a country belongs to.

Suppliers and Parts By Company Division

Company                 Division           Part Type         Supplier
Allied Clock and Watch  Clocks             Spring            Tensile Globodynamics
Allied Clock and Watch  Clocks             Pendulum          Tensile Globodynamics
Allied Clock and Watch  Clocks             Spring            Pieza de Acero
Allied Clock and Watch  Clocks             Toothed Wheel     Pieza de Acero
Allied Clock and Watch  Watches            Quartz Crystal    Microflux
Allied Clock and Watch  Watches            Tuning Fork       Microflux
Allied Clock and Watch  Watches            Battery           Dakota Electrics
Global Robot            Industrial Robots  Flywheel          Wheels 4 Less
Global Robot            Industrial Robots  Axle              Wheels 4 Less
Global Robot            Industrial Robots  Axle              TransEuropa
Global Robot            Industrial Robots  Mechanical Arm    TransEuropa
Global Robot            Domestic Robots    Artificial Brain  Prometheus Labs
Global Robot            Domestic Robots    Artificial Brain  Frankenstein Labs
Global Robot            Domestic Robots    Metal Housing     Pieza de Acero
Global Robot            Domestic Robots    Backplate         Pieza de Acero

Suppliers

Supplier               Supplier Country
Tensile Globodynamics  USA
Pieza de Acero         Mexico
Microflux              Belgium
Dakota Electrics       USA
Wheels 4 Less          USA
TransEuropa            Italy
Prometheus Labs        Luxembourg
Frankenstein Labs      Germany

Companies

Company                 Company Founder    Company Logo
Allied Clock and Watch  Horace Washington  Sundial
Global Robot            Nils Neumann       Gearbox

Countries

Country     Continent
USA         N. Amer.
Mexico      N. Amer.
Belgium     Europe
Italy       Europe
Luxembourg  Europe

4NF

What happens if a company has more than one founder or more than one logo? (Let us assume for the sake of the example that both of these things may happen.) One way of handling the situation would be to alter the primary key of our Companies table to {Company, Company Founder, Company Logo}. Representing multiple founders and multiple logos then becomes possible, but at the price of redundancy:

Companies

Company                 Company Founder    Company Logo
Allied Clock and Watch  Horace Washington  Sundial
Global Robot            Nils Neumann       Gearbox
International Broom     Gareth Patterson   Whirlwind
International Broom     Sandra Patterson   Whirlwind
International Broom     Gareth Patterson   Sweeper
International Broom     Sandra Patterson   Sweeper

This type of redundancy reflects the fact that the design does not conform to 4NF. We correct the design by separating facts about founders from facts about logos.

Suppliers and Parts By Company Division

Company                 Division           Part Type         Supplier
Allied Clock and Watch  Clocks             Spring            Tensile Globodynamics
Allied Clock and Watch  Clocks             Pendulum          Tensile Globodynamics
Allied Clock and Watch  Clocks             Spring            Pieza de Acero

Allied Clock and Watch  Clocks             Toothed Wheel     Pieza de Acero
Allied Clock and Watch  Watches            Quartz Crystal    Microflux
Allied Clock and Watch  Watches            Tuning Fork       Microflux
Allied Clock and Watch  Watches            Battery           Dakota Electrics
Global Robot            Industrial Robots  Flywheel          Wheels 4 Less
Global Robot            Industrial Robots  Axle              Wheels 4 Less
Global Robot            Industrial Robots  Axle              TransEuropa
Global Robot            Industrial Robots  Mechanical Arm    TransEuropa
Global Robot            Domestic Robots    Artificial Brain  Prometheus Labs
Global Robot            Domestic Robots    Artificial Brain  Frankenstein Labs
Global Robot            Domestic Robots    Metal Housing     Pieza de Acero
Global Robot            Domestic Robots    Backplate         Pieza de Acero

Companies

Company
Allied Clock and Watch
Global Robot
International Broom

Company Logos

Company                 Company Logo
Allied Clock and Watch  Sundial
Global Robot            Gearbox
International Broom     Whirlwind
International Broom     Sweeper

Company Founders

Company                 Company Founder
Allied Clock and Watch  Horace Washington
Global Robot            Nils Neumann
International Broom     Gareth Patterson
International Broom     Sandra Patterson

Suppliers

Supplier               Supplier Country
Tensile Globodynamics  USA
Pieza de Acero         Mexico
Microflux              Belgium
Dakota Electrics       USA
Wheels 4 Less          USA
TransEuropa            Italy
Prometheus Labs        Luxembourg
Frankenstein Labs      Germany
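The 4NF decomposition can be checked mechanically. The following is a minimal sketch using Python's sqlite3 module; the table and column names are my own rendering of the example above, not a schema given by the text:

```python
import sqlite3

# Sketch of the 4NF correction above: founders and logos are independent
# multi-valued facts about a company, so each gets its own table instead
# of sharing one composite-key table.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE company_founders (company TEXT, founder TEXT,
                                   PRIMARY KEY (company, founder));
    CREATE TABLE company_logos    (company TEXT, logo TEXT,
                                   PRIMARY KEY (company, logo));
""")
con.executemany("INSERT INTO company_founders VALUES (?, ?)",
                [("International Broom", "Gareth Patterson"),
                 ("International Broom", "Sandra Patterson")])
con.executemany("INSERT INTO company_logos VALUES (?, ?)",
                [("International Broom", "Whirlwind"),
                 ("International Broom", "Sweeper")])

# Joining the two tables back together regenerates the redundant
# four-row cross product shown in the non-4NF Companies table.
rows = con.execute("""
    SELECT f.company, f.founder, l.logo
    FROM company_founders AS f JOIN company_logos AS l
      ON f.company = l.company
""").fetchall()
print(len(rows))  # 4 rows: 2 founders x 2 logos
```

The join shows why the single-table design was redundant: every founder had to be paired with every logo, while the decomposed tables store each fact exactly once.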

Countries

Country     Continent
USA         N. Amer.
Mexico      N. Amer.
Belgium     Europe
Italy       Europe
Luxembourg  Europe

5NF

We know that the Clocks division of Allied Clock and Watch relies upon its suppliers to provide springs, pendulums, and toothed wheels. We also know that the Clocks division deals with the suppliers Tensile Globodynamics and Pieza de Acero. Let us suppose for the sake of the example that the following rule applies: if a supplier that a division deals with offers a part that the division needs, the division will always purchase it. If, for example, Tensile Globodynamics start producing toothed wheels, then Allied Clock and Watch will start purchasing them. This rule leads to redundancy in our design as it stands, causing it to fall short of 5NF. We correct the design by recording part-types-by-company-division separately from suppliers-by-company-division, and by adding a further table that records which suppliers offer which parts.

Part Types By Company Division

Company                 Division  Part Type
Allied Clock and Watch  Clocks    Spring
Allied Clock and Watch  Clocks    Pendulum
Allied Clock and Watch  Clocks    Toothed Wheel
Allied Clock and Watch  Watches   Quartz Crystal

Allied Clock and Watch  Watches            Tuning Fork
Allied Clock and Watch  Watches            Battery
Global Robot            Industrial Robots  Flywheel
Global Robot            Industrial Robots  Axle
Global Robot            Industrial Robots  Mechanical Arm
Global Robot            Domestic Robots    Artificial Brain
Global Robot            Domestic Robots    Metal Housing
Global Robot            Domestic Robots    Backplate

Suppliers By Company Division

Company                 Division           Supplier
Allied Clock and Watch  Clocks             Tensile Globodynamics
Allied Clock and Watch  Clocks             Pieza de Acero
Allied Clock and Watch  Watches            Microflux
Allied Clock and Watch  Watches            Dakota Electrics
Global Robot            Industrial Robots  Wheels 4 Less
Global Robot            Industrial Robots  TransEuropa
Global Robot            Domestic Robots    Prometheus Labs

Global Robot            Domestic Robots    Frankenstein Labs
Global Robot            Domestic Robots    Pieza de Acero

Parts By Supplier

Part Type         Supplier
Spring            Tensile Globodynamics
Pendulum          Tensile Globodynamics
Spring            Pieza de Acero
Toothed Wheel     Pieza de Acero
Quartz Crystal    Microflux
Tuning Fork       Microflux
Battery           Dakota Electrics
Flywheel          Wheels 4 Less
Axle              Wheels 4 Less
Axle              TransEuropa
Mechanical Arm    TransEuropa
Artificial Brain  Prometheus Labs

Artificial Brain  Frankenstein Labs
Metal Housing     Pieza de Acero
Backplate         Pieza de Acero

Companies

Company                 Company Logo
Allied Clock and Watch  Sundial
Global Robot            Gearbox

Company Founders

Company                 Company Founder
Allied Clock and Watch  Horace Washington
Global Robot            Nils Neumann
International Broom     Gareth Patterson
International Broom     Sandra Patterson

Suppliers

Supplier  Supplier Country

Tensile Globodynamics  USA
Pieza de Acero         Mexico
Microflux              Belgium
Dakota Electrics       USA
Wheels 4 Less          USA
TransEuropa            Italy
Prometheus Labs        Luxembourg
Frankenstein Labs      Germany

Countries

Country     Continent
USA         N. Amer.
Mexico      N. Amer.
Belgium     Europe
Italy       Europe
Luxembourg  Europe
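Under the stated purchasing rule, the original wide table is exactly the three-way join of its projections, which is what makes the 5NF decomposition lossless. A small sketch in pure Python, with the rows abridged to the Clocks division of the example:

```python
# 5NF sketch: under the rule "a division buys every part it needs from
# every supplier it deals with that offers the part", the big table is
# the three-way join of its projections, so storing only the
# projections loses nothing. Rows abridged from the Clocks example.
part_types = {("Allied Clock and Watch", "Clocks", "Spring"),
              ("Allied Clock and Watch", "Clocks", "Pendulum"),
              ("Allied Clock and Watch", "Clocks", "Toothed Wheel")}
suppliers  = {("Allied Clock and Watch", "Clocks", "Tensile Globodynamics"),
              ("Allied Clock and Watch", "Clocks", "Pieza de Acero")}
offers     = {("Spring", "Tensile Globodynamics"),
              ("Pendulum", "Tensile Globodynamics"),
              ("Spring", "Pieza de Acero"),
              ("Toothed Wheel", "Pieza de Acero")}

# Reconstruct "Suppliers and Parts By Company Division" as a join.
joined = {(co, div, part, sup)
          for (co, div, part) in part_types
          for (co2, div2, sup) in suppliers
          if (co, div) == (co2, div2) and (part, sup) in offers}
for row in sorted(joined):
    print(row)
```

Adding ("Toothed Wheel", "Tensile Globodynamics") to the offers set makes the join grow by one row automatically, mirroring the rule that Allied Clock and Watch would start purchasing toothed wheels from Tensile Globodynamics.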

Denormalization

Databases intended for Online Transaction Processing (OLTP) are typically more normalized than databases intended for On-Line Analytical Processing (OLAP). OLTP applications are characterized by a high volume of small transactions, such as updating a sales record at a supermarket checkout counter. The expectation is that each transaction will leave the database in a consistent state. By contrast, databases intended for OLAP operations are primarily "read only" databases. OLAP applications tend to extract historical data that has accumulated over a long period of time. For such databases, redundant or "denormalized" data may facilitate Business Intelligence applications. Specifically, dimensional tables in a star schema often contain denormalized data. The denormalized or redundant data must be carefully controlled during ETL processing, and users should not be permitted to see the data until it is in a consistent state. The normalized alternative to the star schema is the snowflake schema.

Denormalization is also used to improve performance on smaller computers, as in computerized cash registers. Since these use the data for look-up only (e.g. price lookups), no changes are to be made to the data and a swift response is crucial.

Non-first normal form (NF²)

In recognition that denormalization can be deliberate and useful, the non-first normal form is a definition of database designs which do not conform to first normal form, by allowing "sets and sets of sets to be attribute domains" (Schek 1982). This extension introduces hierarchies in relations. Consider the following table:

Non-First Normal Form

Person  Favorite Colors
Bob     blue, red
Jane    green, yellow, red

Assume a person has several favorite colors. Obviously, favorite colors consist of a set of colors modeled by the given table. To transform this NF² table into 1NF, an "unnest" operator is required which extends the relational algebra of the higher normal forms. The reverse operator is called "nest"; nest is not always the mathematical inverse of "unnest", although "unnest" is the mathematical inverse of "nest". A further constraint required is for the operators to be bijective, which is covered by the Partitioned Normal Form (PNF).

Further reading

- Litt's Tips: Normalization
- Date, C. J. An Introduction to Database Systems (8th ed.). Addison-Wesley Longman. ISBN 0-321-19784-4.
- Kent, W. (1983). "A Simple Guide to Five Normal Forms in Relational Database Theory". Communications of the ACM, vol. 26, pp. 120-125.
- Date, C. J., Darwen, H., & Lorentzos, N. (2002). Temporal Data & the Relational Model (1st ed.). Morgan Kaufmann. ISBN 1-55860-855-9.
- Date, C. J., Darwen, H., & Pascal, F. Database Debunkings
- H.-J. Schek, P. Pistor: Data Structures for an Integrated Data Base Management and Information Retrieval System

References

- ^ His term eliminate is misleading, as nothing is "lost" in normalization. He probably described eliminate in a mathematical sense, to mean elimination of complexity.
- ^ Codd, Edgar F. (June 1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (6): 377-387.
- ^ DBDebunk.com

See also

- Aspect (computer science)
- Cross-cutting concern
- Inheritance semantics
- Functional normalization
- Orthogonalization
- Refactoring

External links

- Database Normalization Basics by Mike Chapple (About.com)
- Database Normalization Intro, Part 2
- An Introduction to Database Normalization by Mike Hillyer
- Normalization by ITS, University of Texas
- Rules of Data Normalization by DataModel.org
- A tutorial on the first 3 normal forms by Fred Coulson
- Free PDF poster available by Marc Rettig
- Description of the database normalization basics by Microsoft

Database Normalization Basics

If you've been working with databases for a while, chances are you've heard the term normalization. Perhaps someone's asked you "Is that database normalized?" or "Is that in BCNF?" All too often, the reply is "Uh, yeah." Normalization is often brushed aside as a luxury that only academics have time for. However, knowing the principles of normalization and applying them to your daily database design tasks really isn't all that complicated, and it could drastically improve the performance of your DBMS.

So, what is normalization? Basically, it's the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminate redundant data (for example, storing the same data in more than one table) and ensure data dependencies make sense (only storing related data in a table). Both of these are worthy goals, as they reduce the amount of space a database consumes and ensure that data is logically stored.

The database community has developed a series of guidelines for ensuring that databases are normalized. These are referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first normal form or 1NF) through five (fifth normal form or 5NF). In practical applications, you'll often see 1NF, 2NF, and 3NF, along with the occasional 4NF. Fifth normal form is very rarely seen and won't be discussed in this article. In this article, we'll introduce the concept of normalization and take a brief look at the most common normal forms. Future articles will provide in-depth explorations of the normalization process.

Before we begin our discussion of the normal forms, it's important to point out that they are guidelines and guidelines only. Occasionally, it becomes necessary to stray from them to meet practical business requirements. However, when variations take place, it's extremely important to evaluate any possible ramifications they could have on your system and account for possible inconsistencies. That said, let's explore the normal forms.

First normal form (1NF) sets the very basic rules for an organized database:

- Eliminate duplicative columns from the same table.
- Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

Second normal form (2NF) further addresses the concept of removing duplicative data:

- Meet all the requirements of the first normal form.
- Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
- Create relationships between these new tables and their predecessors through the use of foreign keys.

Third normal form (3NF) goes one large step further:

- Meet all the requirements of the second normal form.
- Remove columns that are not dependent upon the primary key.

Finally, fourth normal form (4NF) has one additional requirement:

- Meet all the requirements of the third normal form.
- A relation is in 4NF if it has no multi-valued dependencies.

Remember, these normalization guidelines are cumulative. For a database to be in 2NF, it must first fulfill all the criteria of a 1NF database.

Network model

Database models

Common models

- Hierarchical
- Network
- Relational
- Object-relational
- Object

Other models

- Associative
- Concept-oriented
- Multi-dimensional

- Star schema
- XML database

The network model is a database model conceived as a flexible way of representing objects and their relationships. Its original inventor was Charles Bachman, and it was developed into a standard specification published in 1969 by the CODASYL Consortium. Where the hierarchical model structures data as a tree of records, with each record having one parent record and many children, the network model allows each record to have multiple parent and child records, forming a lattice structure.

The chief argument in favour of the network model, in comparison to the hierarchic model, was that it allowed a more natural modeling of relationships between entities. Although the model was widely implemented and used, it failed to become dominant for two main reasons. Firstly, IBM chose to stick to the hierarchical model with semi-network extensions in their established products such as IMS and DL/I. Secondly, it was eventually displaced by the relational model, which offered a higher-level, more declarative interface. Until the early 1980s the performance benefits of the low-level navigational interfaces offered by hierarchical and network databases were persuasive for many large-scale applications, but as hardware became faster, the extra productivity and flexibility of the relational model replaced the network model in corporate enterprise usage.

The navigational interface offered by the network model bears some resemblance to the hyperlink-based models that have become popular with the advent of the Internet and World Wide Web. However, the network model (like the relational model) assumes that the entire database has a centrally-managed schema, and as such it is not well suited to distributed, heterogeneous environments.

Contents

1 History
2 See also
3 References
4 External links

History

In 1969, the Conference on Data Systems Languages (CODASYL) established the first specification of the network database model. This was followed by a second publication in 1971, which became the basis for most implementations. Subsequent work continued into the early 1980s, culminating in an ISO specification, but this had little influence on products.

Hierarchical model

Database models

Common models

- Hierarchical
- Network
- Relational
- Object-relational
- Object

Other models

- Associative
- Concept-oriented
- Multi-dimensional
- Star schema
- XML database

In a hierarchical data model, data are organized into a tree-like structure. The structure allows repeating information using parent/child relationships: each parent can have many children, but each child has only one parent. All attributes of a specific record are listed under an entity type. In a database, an entity type is the equivalent of a table; each individual record is represented as a row, and an attribute as a column. Entity types are related to each other using 1:N mappings, also known as one-to-many relationships. If a one-to-many relationship is violated (e.g. a patient can have more than one physician), then the hierarchy becomes a network.

Hierarchical structures were widely used in the first mainframe database management systems.[1] The most common form of hierarchical model used currently is the LDAP model. Other than that, the hierarchical model is rare in modern databases. It is, however, common in many other means of storing information, ranging from file systems to the Windows registry to XML documents. Hierarchical relationships between different types of data can make it very easy to answer some questions, but very difficult to answer others.

An example of a hierarchical data model would be an organization that has records of employees in a table (entity type) called "Employees". In the table there would be attributes/columns such as First Name, Last Name, Job Name and Wage. The company also has data about the employees' children in a separate table called "Children", with attributes such as First Name, Last Name, and DOB. The Employee table represents a parent segment and the Children table represents a child segment. These two segments form a hierarchy where an employee may have many children, but each child may only have one parent.

Contents

1 Tree Data structure in Relational Model
2 Some Well-known Hierarchical Databases
3 References
4 External links

Tree Data structure in Relational Model

See Chapter 23, 'Logic-Based Databases', of An Introduction to Database Systems by C. J. Date, seventh edition.

In the relational database model, an example of hierarchical data could be displaying the hierarchy of departmental responsibility, or 'who reports to whom'. Consider the following table:

Employee_Table

EmpNo  Designation
10     Director
20     Senior Manager
30     Typist
40     Programmer

The hierarchy stating that EmpNo 10 is the boss of 20 and 30, and that 40 reports to 20, is represented by the following table:

WhoIsBoss_Table

BossEmpNo  ReportingEmpNo
10         20
10         30
20         40
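The WhoIsBoss_Table is an adjacency list, so the full reporting tree can be recovered from it by recursively following its rows. A minimal sketch in Python (the names follow the tables above; the function is my own illustration):

```python
# Sketch: WhoIsBoss_Table is an adjacency list, so the whole reporting
# tree can be rebuilt by recursively following (boss -> subordinate)
# rows, here starting from EmpNo 10 (the Director).
who_is_boss = [(10, 20), (10, 30), (20, 40)]  # (BossEmpNo, ReportingEmpNo)

def subordinates(boss, rows):
    """Return every employee below `boss`, direct or indirect."""
    direct = [emp for b, emp in rows if b == boss]
    result = []
    for emp in direct:
        result.append(emp)
        result.extend(subordinates(emp, rows))
    return result

print(subordinates(10, who_is_boss))  # [20, 40, 30]
```

SQL databases can express the same traversal with a recursive query, but the adjacency-list idea is identical.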

In the example above, if no person reports to two bosses, then the tree of hierarchy is of the type 'a child has only one parent'. Now let us see a hierarchy of the type 'a child with many parents'. A simple example is the bill of materials of an engineering assembly: a car engine could have two different assemblies, both containing similar parts. Consider the following table:

Engine_Part_Master

PartNum  Description
10       Crank Assembly
20       Head Assembly
30       Connecting Rod
40       Crank Shaft
90       3/4 Dia Bolt Assembly

The hierarchy is described in the following table:

Engine_Assembly

Parent_PartNum  Child_PartNum
10              30
10              90
20              40

20              90

PartNum 90 has two parents, as it is present in both assemblies.

Some Well-known Hierarchical Databases

- Adabas
- GT.M
- IMS
- MUMPS
- Caché (software)
- Metakit
- Multidimensional hierarchical toolkit
- Mumps compiler
- DMSII

Data dictionary

A data dictionary is a set of metadata that contains definitions and representations of data elements. Within the context of a DBMS, a data dictionary is a read-only set of tables and views. Amongst other things, a data dictionary holds the following information:

- Precise definition of data elements
- Usernames, roles and privileges
- Schema objects
- Integrity constraints
- Stored procedures and triggers
- General database structure
- Space allocations

One benefit of a well-prepared data dictionary is consistency between data items across different tables. For example, several tables may hold telephone numbers; using a data dictionary, the format of this telephone number field will be consistent.

When an organization builds an enterprise-wide data dictionary, it may include both semantics and representational definitions for data elements. The semantic components focus on creating precise meaning of data elements. Representation definitions include how data elements are stored in a computer structure such as an integer, string or date format (see data type). Data dictionaries are one step along a pathway of creating precise semantic definitions for an organization.
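A representation definition like the telephone-number format above can be enforced mechanically. The following is a toy sketch in Python; the element names and format patterns are invented for illustration, not taken from any particular DBMS:

```python
import re

# Toy data dictionary: each element records its meaning (semantics)
# and how it is represented (type/format), so every table that holds,
# say, a telephone number uses one consistent format.
data_dictionary = {
    "telephone": {"definition": "contact phone number",
                  "type": "string",
                  "format": re.compile(r"^\+?\d{7,15}$")},
    "hire_date": {"definition": "date employment began",
                  "type": "string",
                  "format": re.compile(r"^\d{4}-\d{2}-\d{2}$")},
}

def conforms(element, value):
    """Check a value against the dictionary's representation definition."""
    entry = data_dictionary[element]
    return bool(entry["format"].match(value))

print(conforms("telephone", "+15551234567"))  # True
print(conforms("telephone", "555-1234"))      # False: wrong representation
```

In a real DBMS this role is played by column types and constraints; the sketch only shows the idea of a single shared definition.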

Initially, data dictionaries are sometimes simply a collection of database columns and the definitions of what meaning and types the columns contain. Data dictionaries are more precise than glossaries (terms and definitions) because they frequently have one or more representations of how data is structured. Data dictionaries are usually separate from data models, since data models usually include complex relationships between data elements. Data dictionaries can evolve into a full ontology (see ontology (computer science)) when discrete logic has been added to data element definitions.

B+ tree

A simple B+ tree example linking the keys 1-7 to data values d1-d7. Note the linked list (red) allowing rapid in-order traversal.

In computer science, a B+ tree is a type of tree data structure. It represents sorted data in a way that allows for efficient insertion and removal of elements. It is a dynamic, multilevel index with maximum and minimum bounds on the number of keys in each node. A B+ tree is a variation on a B-tree. In a B+ tree, in contrast to a B-tree, all data is saved in the leaves. Internal nodes contain only keys and tree pointers; all leaves are at the same lowest level. Leaf nodes are also linked together as a linked list to make range queries easy.

The maximum number of pointers in a record is called the order of the B+ tree. The minimum number of keys per record is 1/2 of the maximum number of keys. For example, if the order of a B+ tree is n+1, each node (except for the root) must have between (n+1)/2 and n keys. If n is an odd number, the minimum number of keys can be either (n+1)/2 or (n-1)/2, but it must be the same in the whole tree.

B+ trees are used by, among others, the NTFS filesystem for Microsoft Windows, the ReiserFS filesystem for Linux, the XFS filesystem for IRIX and Linux, and the JFS2 filesystem for AIX, OS/2 and Linux.
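The linked leaf level is what makes range queries easy. The following is a hypothetical sketch in Python of just that part of the structure (it is not a full B+ tree: there are no internal nodes, insertions, or splits; the leaf layout and names are my own):

```python
from bisect import bisect_left

LEAF_SIZE = 2  # maximum keys per leaf in this toy example

def build_leaves(items, leaf_size=LEAF_SIZE):
    """Pack sorted (key, value) pairs into fixed-size linked leaves.

    Each leaf is a dict of sorted keys, their values, and the index of
    the next leaf (None for the last), i.e. the linked list the B+ tree
    keeps at its lowest level.
    """
    items = sorted(items)
    leaves = []
    for i in range(0, len(items), leaf_size):
        chunk = items[i:i + leaf_size]
        leaves.append({"keys": [k for k, _ in chunk],
                       "values": [v for _, v in chunk],
                       "next": None})
    for i in range(len(leaves) - 1):
        leaves[i]["next"] = i + 1
    return leaves

def range_query(leaves, lo, hi):
    """Return all (key, value) pairs with lo <= key <= hi.

    A real B+ tree would descend internal nodes to find the first leaf;
    here we bisect on each leaf's smallest key, then simply follow the
    leaf links -- the part the description above highlights.
    """
    if not leaves:
        return []
    i = bisect_left([leaf["keys"][0] for leaf in leaves], lo)
    if i > 0:
        i -= 1  # lo may fall inside the previous leaf
    out = []
    while i is not None:
        leaf = leaves[i]
        for k, v in zip(leaf["keys"], leaf["values"]):
            if k > hi:
                return out
            if k >= lo:
                out.append((k, v))
        i = leaf["next"]
    return out

# Keys 1-7 mapped to values d1-d7, as in the figure caption.
leaves = build_leaves([(k, "d%d" % k) for k in range(1, 8)])
print(range_query(leaves, 2, 5))
```

Once the starting leaf is found, the query never re-enters the tree; it just walks the linked list, which is why B+ trees serve range scans so well.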

Edward M. Similarly. implements the basic Von Neumann computer model used since the 1940s. The B+ tree was first described in the paper "Rudolf Bayer. McCreight: Organization and Maintenance of Large Ordered Indices. storage more commonly refers to mass storage . In contemporary usage. and coupled with a central processing unit (CPU). but of a more permanent nature. Acta Informatica 1: 173-189 (1972)". that of information retention. which has been blurred by the historical usage of the terms "main storage" (and sometimes "primary storage") for random access . 1 GiB of SDRAM mounted in a personal computer Computer storage. and other types of storage which are slower than RAM. they reflect an important and significant technical difference between memory and mass storage devices. As well. It is one of the fundamental components of all modern computers. memory usually refers to a form of solid state storage known as random access memory (RAM) and sometimes other forms of fast but temporary storage. devices and recording media that retain data for some interval of time. computer memory. Please help improve this article by adding citations to reliable sources. For a n-order B+ tree with a height of h:   maximum number of nodes is nh minimum number of keys is 2(n / 2)h − 1. get involved!) This article has been tagged since December 2006. Computer storage provides one of the core functions of the modern computer.optical discs. because they are also fundamental to the architecture of computers in general. Computer storage From Wikipedia. and often casually memory refer to computer components. the free encyclopedia This article or section does not cite its references or sources. An extension of a B+ tree is called a B# Trees which use the B+ Tree Structure and adds further restrictions.The number of keys that may be indexed using a B+ tree is a function of the order of the tree and its height. forms of magnetic storage like hard disks. (help. 
These contemporary distinctions are helpful.

Contents

1 Purposes of storage
  1.1 Primary storage
  1.2 Secondary and off-line storage
  1.3 Tertiary and database storage
  1.4 Network storage
2 Characteristics of storage
  2.1 Volatility of information
  2.2 Ability to access non-contiguous information
  2.3 Ability to change information
  2.4 Addressability of information
  2.5 Capacity and performance
3 Technologies, devices and media
  3.1 Magnetic storage
  3.2 Semiconductor storage
  3.3 Optical disc storage
    3.3.1 Magneto-optical disc storage
    3.3.2 Ultra Density Optical disc storage
    3.3.3 Optical Jukebox storage
  3.4 Other early methods
  3.5 Other proposed methods
  3.6 Primary storage topics
  3.7 Secondary, tertiary and off-line storage topics
  3.8 Data storage conferences
4 References

Purposes of storage

The fundamental components of a general-purpose computer are an arithmetic and logic unit, control circuitry, storage space, and input/output devices. If storage were removed, the device we had would be a simple digital signal processing device (e.g. a calculator or media player) instead of a computer. The ability to store instructions that form a computer program, and the information that the instructions manipulate, is what makes stored program architecture computers versatile.

A digital computer represents information using the binary numeral system. Text, numbers, pictures, audio, and nearly any other form of information can be converted into a string of bits, or binary digits, each of which has a value of 1 or 0. The most common unit of storage is the byte, equal to 8 bits. A piece of information can be manipulated by any computer whose storage space is large enough to accommodate the binary representation of the piece of information, or the corresponding data. For example, a computer with a storage space of eight million bits, or one megabyte, could be used to edit a small novel. This is explained in the following sections, in which the traditional "storage" terms are used as sub-headings for convenience.

Various forms of storage, divided according to their distance from the central processing unit, are shown in the diagram; common technology and capacity found in home computers of 2005 is indicated next to some items.

Various forms of storage, based on various natural phenomena, have been invented. So far, no practical universal storage medium exists, and all forms of storage have some drawbacks. Therefore a computer system usually contains several kinds of storage, each with an individual purpose.

Primary storage

Primary storage is directly connected to the central processing unit of the computer. It must be present for the CPU to function correctly, just as in a biological analogy the lungs must be present (for oxygen storage) for the heart to function (to pump and oxygenate the blood). As shown in the diagram, primary storage typically consists of three kinds of storage:

- Processor registers are internal to the central processing unit. Registers contain information that the arithmetic and logic unit needs to carry out the current instruction.

  They are technically the fastest of all forms of computer storage, being switching transistors integrated on the CPU's silicon chip and functioning as electronic "flip-flops".
- Main memory contains the programs that are currently being run and the data the programs are operating on. In modern computers, the main memory is the electronic solid-state random access memory. It is directly connected to the CPU via a "memory bus" (shown in the diagram) and a "data bus". The memory bus is also called an address bus or front side bus, and both busses are high-speed digital "superhighways". The arithmetic and logic unit can very quickly transfer information between a processor register and locations in main storage, also known as "memory addresses".
- Cache memory is a special type of internal memory used by many central processing units to increase their performance or "throughput". Some of the information in the main memory is duplicated in the cache memory, which is slightly slower but of much greater capacity than the processor registers, and faster but much smaller than main memory. Multi-level cache memory is also commonly used: "primary cache" being smallest, fastest and closest to the processing device, and "secondary cache" being larger and slower, but still faster and much smaller than main memory.

(Note that all memory sizes and storage capacities shown in the diagram will inevitably be exceeded with advances in technology over time.)

Secondary and off-line storage

Secondary storage requires the computer to use its input/output channels to access the information, and is used for long-term storage of persistent information. Secondary or mass storage is typically of much greater capacity than primary storage (main memory), but it is also much slower. In modern computers, hard disks are usually used for mass storage. The time taken to access a given byte of information stored in random access memory is measured in thousand-millionths of a second, or nanoseconds. By contrast, the time taken to access a given byte of information stored on a hard disk is typically a few thousandths of a second, or milliseconds. This illustrates the very significant speed difference which distinguishes solid-state memory from rotating magnetic storage devices: hard disks are typically about a million times slower than memory. Access methods and speed are two of the fundamental technical differences between memory and mass storage devices.

Rotating optical storage devices, such as CD and DVD drives, are typically even slower than hard disks, although their access speeds are likely to improve with advances in technology.

However, most computer operating systems also use secondary storage devices as virtual memory, to artificially increase the apparent amount of main memory in the computer. The main historical advantage of virtual memory was that it was much less expensive than real memory. That advantage is less relevant today, yet surprisingly most operating systems continue to implement it, despite the significant performance penalties. Virtual memory is implemented by many operating systems using terms like swap file or "cache file". The use of virtual memory, which is millions of times slower than "real" memory, significantly degrades the performance of any computer.

Secondary storage is also known as "mass storage". Off-line storage is a system where the storage medium can be easily removed from the storage device. Off-line storage is used for data transfer and archival purposes. In modern computers, CDs, DVDs, memory cards, flash memory devices including "USB drives", floppy disks, Zip disks and magnetic tapes are commonly used for off-line mass storage purposes. "Hot-pluggable" USB hard disks are also available. Off-line storage devices used in the past include punched cards, microforms, and removable Winchester disk drums.
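The "million times slower" figure follows directly from the quoted access times. A back-of-the-envelope check in Python, using assumed round figures of tens of nanoseconds for RAM and about ten milliseconds for a disk seek:

```python
# Sketch: the access-time figures quoted above, in nanoseconds, give
# the ratio that separates RAM from rotating disks. Both numbers are
# assumed round figures, not measurements.
ram_access_ns = 10            # tens of nanoseconds for RAM
disk_access_ns = 10_000_000   # about ten milliseconds for a disk seek
print(disk_access_ns // ram_access_ns)  # 1000000 -> "a million times slower"
```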

Tertiary and database storage

Tertiary storage is a system where a robotic arm will "mount" (connect) or "dismount" off-line mass storage media (see the next item) according to the computer operating system's demands. Tertiary storage is used in the realms of enterprise storage and scientific computing on large computer systems and business computer networks, and is something a typical personal computer user never sees firsthand. It involves packing and storing large numbers of storage devices throughout a series of shelves in a room, usually an office.

Database storage is a system where information in computers is stored in large databases, data banks, data warehouses, or data vaults. The information in database storage systems can be accessed by a supercomputer, mainframe computer, or personal computer. Databases, data banks, and data warehouses can only be accessed by authorized users.

Network storage

Network storage is any type of computer storage that involves accessing information over a computer network. Network storage arguably allows the information management in an organization to be centralized, and reduces the duplication of information. Network computers are computers that do not contain internal secondary storage devices; instead, documents and other data are stored on a network-attached storage. Network storage includes:

- Network-attached storage is secondary or tertiary storage attached to a computer which another computer can access at file level over a local-area network, a private wide-area network, or, in the case of online file storage, over the Internet.
- Storage area network provides other computers with storage capacity over a network. Confusingly, the crucial difference between network-attached storage (NAS) and storage area networks (SAN) is that the former presents and manages file systems to client computers, whilst a SAN provides access to disks at block addressing level, leaving it to attaching systems to manage data or file systems within the provided capacity. See SANs for a fuller description.

Confusingly, these terms are sometimes used differently. Primary storage can be used to refer to local random-access disk storage, which should properly be called secondary storage. If this type of storage is called primary storage, then the term secondary storage would refer to offline, sequential-access storage like tape media.

Characteristics of storage

The division into primary, secondary, tertiary and off-line storage is based on memory hierarchy, or distance from the central processing unit. There are also other ways to characterize various types of storage.

Volatility of information

- Volatile memory requires constant power to maintain the stored information. Volatile memory is typically used only for primary storage. (Primary storage is not necessarily

Volatile memory is typically used only for primary storage. (Primary storage is not necessarily volatile, even though today's most cost-effective primary storage technologies are.) Dynamic memory is volatile memory which also requires that stored information is periodically refreshed, or read and rewritten without modifications.
- Non-volatile memory will retain the stored information even if it is not constantly supplied with electric power. It is suitable for long-term storage of information, and is therefore used for secondary, tertiary and off-line storage. Non-volatile technologies have been widely used for primary storage in the past and may again be in the future.

Ability to access non-contiguous information
- Random access means that any location in storage can be accessed at any moment in the same, usually small, amount of time. This makes random access memory well suited for primary storage.
- Sequential access means that accessing a piece of information will take a varying amount of time, depending on which piece of information was accessed last. The device may need to seek (e.g. to position the read/write head correctly), or cycle (e.g. to wait for the correct location in a revolving medium to appear below the read/write head).

Ability to change information
- Read/write storage, or mutable storage, allows information to be overwritten at any time. A computer without some amount of read/write storage for primary storage purposes would be useless for many tasks. Modern computers typically use read/write storage also for secondary storage.
- Read only storage retains the information stored at the time of manufacture, and write once storage (WORM) allows the information to be written only once at some point after manufacture. These are called immutable storage. Immutable storage is used for tertiary and off-line storage. Examples include CD-R.
- Slow write, fast read storage is read/write storage which allows information to be overwritten multiple times, but with the write operation being much slower than the read operation. Examples include CD-RW.

Addressability of information
- In location-addressable storage, each individually accessible unit of information in storage is selected with its numerical memory address. In modern computers, location-addressable storage usually limits to primary storage, accessed internally by computer programs, since location-addressability is very efficient, but burdensome for humans.
- In file system storage, information is divided into files of variable length, and a particular file is selected with human-readable directory and file names. The underlying device is still location-addressable, but the operating system of a computer provides the file system abstraction to make the operation more understandable. In modern computers, secondary, tertiary and off-line storage use file systems.
- In content-addressable storage, each individually accessible unit of information is selected with a hash value, or a short identifier pertaining to the memory address the information is stored on. Content-addressable storage can be implemented using software (computer program) or hardware (computer device), with hardware being the faster but more expensive option.
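The contrast between location addressing and content addressing can be sketched in a few lines of Python. All names here are invented for illustration; the hash simply plays the role of the content-derived identifier described above.

```python
import hashlib

# Location-addressable: a unit is selected by its numerical address.
memory = [b"alpha", b"beta", b"gamma"]
assert memory[1] == b"beta"          # address 1 selects the unit

# Content-addressable: a unit is selected by a hash of its contents.
store = {}

def put(data):
    key = hashlib.sha256(data).hexdigest()  # the hash value is the identifier
    store[key] = data
    return key

key = put(b"beta")
assert store[key] == b"beta"
assert put(b"beta") == key           # identical content yields the same key
```

A side effect of content addressing, visible in the last assertion, is that storing the same content twice occupies only one slot.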

Capacity and performance
- Storage capacity is the total amount of stored information that a storage device or medium can hold. It is expressed as a quantity of bits or bytes (e.g. 10.4 megabytes).
- Storage density refers to the compactness of stored information. It is the storage capacity of a medium divided by a unit of length, area or volume (e.g. 1.2 megabytes per square centimeter).
- Latency is the time it takes to access a particular location in storage. The relevant unit of measurement is typically nanosecond for primary storage, millisecond for secondary storage, and second for tertiary storage. It may make sense to separate read latency and write latency, and in case of sequential access storage, minimum, maximum and average latency.
- Throughput is the rate at which information can be read from or written to the storage. In computer storage, throughput is usually expressed in terms of megabytes per second or MB/s, though bit rate may also be used. As with latency, read rate and write rate may need to be differentiated.

Technologies, devices and media

Magnetic storage

Magnetic storage uses different patterns of magnetization on a magnetically coated surface to store information. Magnetic storage is non-volatile. The information is accessed using one or more read/write heads. Since the read/write head only covers a part of the surface, magnetic storage is sequential access and must seek, cycle or both. In modern computers, the magnetic surface will take these forms:
- Magnetic disk
  - Floppy disk, used for off-line storage
  - Hard disk, used for secondary storage
- Magnetic tape data storage, used for tertiary and off-line storage

In early computers, magnetic storage was also used for primary storage in a form of magnetic drum, core memory, core rope memory, thin film memory, twistor memory or bubble memory. Also unlike today, magnetic tape was often used for secondary storage.

Semiconductor storage

Semiconductor memory uses semiconductor-based integrated circuits to store information. A semiconductor memory chip may contain millions of tiny transistors or capacitors. Both volatile and non-volatile forms of semiconductor memory exist. In modern computers, primary storage almost exclusively consists of dynamic volatile semiconductor memory or dynamic random access memory. Non-volatile semiconductor memory is also used for secondary storage in various advanced electronic devices and specialized computers. Since the turn of the century, a type of non-volatile semiconductor memory known as flash memory has steadily gained share as off-line storage for home computers.

Optical disc storage

Optical disc storage uses tiny pits etched on the surface of a circular disc to store information, and reads this information by illuminating the surface with a laser diode and observing the reflection.
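Capacity, density and throughput compose in the obvious way. A small worked example, with figures invented purely for illustration:

```python
# Time to stream a whole medium = capacity / throughput.
capacity_mb = 10.4        # storage capacity, in megabytes
throughput_mb_s = 1.3     # sustained read rate, in MB/s

read_time_s = capacity_mb / throughput_mb_s
assert abs(read_time_s - 8.0) < 1e-9   # 8 seconds to read the full medium

# Density = capacity / area: e.g. the same 10.4 MB spread over 8.0 cm^2.
density_mb_cm2 = capacity_mb / 8.0
assert abs(density_mb_cm2 - 1.3) < 1e-9
```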

Optical disc storage is non-volatile and sequential access. The following forms are currently in common use:
- CD, CD-ROM, DVD: Read only storage, used for mass distribution of digital information (music, video, computer programs)
- CD-R, DVD-R, DVD+R: Write once storage, used for tertiary and off-line storage
- CD-RW, DVD-RW, DVD+RW, DVD-RAM: Slow write, fast read storage, used for tertiary and off-line storage
- Blu-ray
- HD DVD

The following forms have also been proposed:
- Holographic Versatile Disc (HVD)
- Phase-change Dual

Magneto-optical disc storage

Magneto-optical disc storage is optical disc storage where the magnetic state on a ferromagnetic surface stores information. The information is read optically and written by combining magnetic and optical methods. Magneto-optical disc storage is non-volatile, sequential access, slow write, fast read storage used for tertiary and off-line storage. MOs use a 650 nm-wavelength red laser.

Ultra Density Optical disc storage

An Ultra Density Optical disc or UDO is a 5.25" ISO cartridge optical disc encased in a dust-proof caddy which can store up to 30 GB of data. It utilises a design based on a magneto-optical disc, but combines phase change technology with a blue-violet laser; because of the shorter wavelength (405 nm) of the blue-violet laser employed, a UDO disc can store substantially more data than a magneto-optical disc or MO. Because its beam width is narrower when burning to a disc than a red laser for MO, a blue-violet laser allows more information to be stored digitally in the same amount of space. Current generations of UDO store up to 30 GB, but 60 GB and 120 GB versions are in development and are expected to arrive sometime in 2007 and beyond, though up to 500 GB has been speculated as a possibility for UDO. [1]

Optical jukebox storage

Optical jukebox storage is a robotic storage device that utilizes optical disk drives and can automatically load and unload optical disks to provide terabytes of near-line information. The devices are often called optical disk libraries, robotic drives, or autochangers. Jukebox devices may have up to 1,000 slots for disks, and usually have a picking device that traverses the slots and drives. The arrangement of the slots and picking devices affects performance, depending on the space between a disk and the picking device. Seek times and transfer rates vary depending upon the optical technology. Jukeboxes are used in high-capacity archive storage environments such as imaging, medical, and video. HSM (hierarchical storage management) is a strategy that moves little-used or unused files from fast magnetic storage to optical jukebox devices in a process called migration. If the files are needed, they are migrated back to magnetic disk.

Other early methods

Paper tape and punch cards have been used to store information for automatic processing since the 1890s, long before general-purpose computers existed. Information was recorded by punching holes into the paper or cardboard medium, and was read by electrically (or, later, optically) sensing whether a particular location on the medium was solid or contained a hole.

Delay line memory used sound waves in a substance such as mercury to store information. Delay line memory was dynamic volatile, cycle sequential read/write storage, and was used for primary storage. Williams tube used a cathode ray tube, and Selectron tube used a large vacuum tube to store information. These primary storage devices were short-lived in the market, since Williams tube was unreliable and Selectron tube was expensive.

Other proposed methods

Phase-change memory uses different mechanical phases of phase change material to store information, and reads the information by observing the varying electric resistance of the material. Phase-change memory would be non-volatile, random access read/write storage, and might be used for primary, secondary and off-line storage.

Holographic storage stores information optically inside crystals or photopolymers. Holographic storage can utilize the whole volume of the storage medium, unlike optical disc storage which is limited to a small number of surface layers. Holographic storage would be non-volatile, sequential access, and either write once or read/write storage; it might be used for secondary and off-line storage.

Molecular memory stores information in polymers that can store electric charge. Molecular memory might be especially suited for primary storage.
Primary storage topics
- Memory management
- Virtual memory
- Physical memory
- Memory allocation
- Dynamic memory
- Memory leak
- Memory protection
- Flash memory
- Solid state disk
- Dynamic random access memory
- Static random access memory

Secondary, tertiary and off-line storage topics
- List of file formats
- Wait state
- Write protection
- Virtual Tape Library

Data storage conferences
- Storage Decisions
- Storage Networking World
- Storage World Conference

Flat file database

A simple diagram depicting conversion of a CSV-format flat file database table into a relational database table.

A flat file database describes any of various means to encode a data model (most commonly a table) as a plain text file.

Contents
1 Flat files
2 Implementation
2.1 Historical implementations
2.2 Contemporary implementations
3 Terms
4 Example database

5 Flat-file relational database storage model
5.1 Simple example name index on File1
6 See also

Flat files

A flat file generally records one record per line. Fields may simply have a fixed width with padding, or may be delimited by whitespace, tabs, commas (CSV) or other characters. Extra formatting may be needed to avoid delimiter collision. There are no structural relationships; the data are "flat", as in a sheet of paper, in contrast to more complex models such as a relational database.

The classic example of a flat file database is a basic name-and-address list, where the database consists of a small, fixed number of fields: Name, Address, and Phone Number. Another example is a simple HTML table, consisting of rows and columns. This type of database is routinely encountered, although often not expressly recognized as a database.

Implementation

It is possible to write out by hand, on a sheet of paper, a list of names, addresses, and phone numbers; this is a flat file database. This can also be done with any typewriter or word processor, padding each name as needed with spaces to make everyone's name the same length, so the database fields "line up" properly. But many pieces of computer software are designed to implement flat file databases.

Historical implementations

The first uses of computing machines were implementations of simple databases, the most typical of which were accounting functions, such as payroll. Herman Hollerith conceived the idea that any resident of the United States could be represented by a string of exactly 80 digits and letters: name, age, and so forth. He sold his concept, his machines, and the punched cards which both recorded and stored this data, to the US Census Bureau; thus, the Census of 1890 was the first ever computerized database, consisting, in essence, of thousands of boxes full of punched cards. Hollerith's enterprise grew into computer giant IBM, which dominated the market of the time.

Throughout the years following World War II, primitive electronic computers were run by governments and corporations, and these were very often used to implement flat file databases. Slightly modified from the original design, these early applications continued to use Hollerith cards. Amusingly enough, the rigidity of the fixed-length field, 80-column punch card driven database made the early computer a target of attack, and a mystery to the common man. Very quickly, these wealthy customers demanded more from their extremely expensive machines, which led to early relational databases.
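The two encodings described above, fixed-width fields with padding and comma-delimited fields, can be sketched in a few lines of Python. The file contents are invented for illustration:

```python
import csv, io

# Fixed width: each field occupies a set number of columns, padded with spaces.
fixed = "Amy       Blues\nBob       Reds \n"
records = [(line[:10].rstrip(), line[10:].rstrip()) for line in fixed.splitlines()]
assert records == [("Amy", "Blues"), ("Bob", "Reds")]

# Delimited (CSV): fields are separated by commas; quoting avoids
# delimiter collision when a field itself contains a comma.
data = '1,"Amy",Blues\n2,"Bob, Jr.",Reds\n'
rows = list(csv.reader(io.StringIO(data)))
assert rows[1] == ["2", "Bob, Jr.", "Reds"]
```

Note how the quoted field "Bob, Jr." survives intact; with naive splitting on commas it would have collided with the delimiter.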

Contemporary implementations

Today, there are few programs designed to allow novices to create and use general-purpose flat file databases. In the 1980s, configurable flat-file database computer applications were popular on DOS and the Macintosh, and were almost on par with word processors and spreadsheets in popularity. Examples of flat-file database products were early versions of FileMaker and the shareware PCFile. These programs were designed to make it easy for individuals to design and use their own databases. Some of these offered limited relational capabilities, allowing some data to be shared between files.

Over time, products like Borland's Paradox and Microsoft's Access started offering some relational capabilities, as well as built-in programming languages. Database Management Systems (DBMS) like MySQL or Oracle generally require programmers to build applications.

Many applications allow users to store and retrieve their own information from flat files using a pre-defined set of fields. Examples are programs to manage collections of books or appointments. Some small "contact" (name-and-address) database implementations essentially use flat files. This function is implemented in Microsoft Works (available only for some versions of Windows) and AppleWorks, sometimes named ClarisWorks (available for both Macintosh and Windows platforms).

XML is now a popular format for storing data in plain text files, but as XML allows very complex nested data structures to be represented and contains the definition of the data, it would be incorrect to describe this type of database as conforming to the flat-file model. Flat file databases are still used internally by many computer applications to store configuration data.

Terms

"Flat file database" may be defined very narrowly, or more broadly. Strictly, a flat file database should consist of nothing but data and delimiters; more broadly, the term refers to any database which exists in a single file in the form of rows and columns, with no relationships or links between records and fields except the table structure. The narrower interpretation is correct in database theory; the broader covers the term as generally used.

Terms used to describe different aspects of a database and its tools differ from one implementation to the next, but the concepts remain the same. FileMaker uses the term "Find", while MySQL uses the term "Query"; FileMaker "files" are equivalent to MySQL "tables"; and so forth. To avoid confusing the reader, one consistent set of terms is used throughout this article. However, the basic terms "record" and "field" are used in nearly every database implementation.

Example database

Consider a simple example database storing a person's name, a numeric id, and the team they support. We have decided Pico users will gang up into teams; some belong to the Reds and some to the Blues. The data, the information itself, has simply been written out in table form:

id name  team
1  Amy   Blues
2  Bob   Reds
3  Chuck Blues
4  Dick  Blues
5  Ethel Reds
6  Fred  Blues
7  Gilly Blues
8  Hank  Reds

Note that the data in the first column is "all the same": that is, they are all id numbers (serial numbers). Likewise, the data in the second column is "all the same": names. All the team designations are found in the third column only. These columns are called "fields". (Sometimes the word "field" refers to just one datum within the file, the intersection of a field and a record, although this is not strictly correct.) Also note that all the information in, say, the third line from the top "belongs to" one person: Bob. His id is "2", his name is "Bob" (no surprise!), and his team is the "Reds". Each line is called a "record"; records are delimited by a newline. The first line is not a record at all, but a row of "field labels": names that identify the contents of the fields which they head. Some databases omit this, in which case the question is left open: "What is in these fields?" The answer must be supplied elsewhere.

In this implementation, fields can be detected by the fact that they all "line up": each datum uses up the same number of characters as all other data in the same column, with extra spaces added to make them all the same length. This is a very primitive and brittle implementation, dating back to the days of punch cards. Other ways to implement the same database are:

"1","Amy","Blues"
"2","Bob","Reds"
"3","Chuck","Blues"
"4","Dick","Blues"
"5","Ethel","Reds"
"6","Fred","Blues"
"7","Gilly","Blues"
"8","Hank","Reds"

which is "comma-separated" (CSV) format. Today, the same effect is often achieved by delimiting fields with a tab character; this is "tab-separated" format. We could also write:

1-Amy-Blues/2-Bob-Reds/3-Chuck-Blues/4-Dick-Blues/5-Ethel-Reds/6-Fred-Blues/7-Gilly-Blues/8-Hank-Reds/
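That the comma-separated and tab-separated renderings carry identical records is easy to check with Python's csv module; only the delimiter changes. A minimal sketch using the first three rows:

```python
import csv, io

csv_text = '"1","Amy","Blues"\n"2","Bob","Reds"\n"3","Chuck","Blues"\n'
tab_text = "1\tAmy\tBlues\n2\tBob\tReds\n3\tChuck\tBlues\n"

csv_rows = list(csv.reader(io.StringIO(csv_text)))
tab_rows = list(csv.reader(io.StringIO(tab_text), delimiter="\t"))

# Both encodings decode to the same records.
assert csv_rows == tab_rows
assert csv_rows[1] == ["2", "Bob", "Reds"]
```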

All are equivalent databases. There is not much we can do with such a simple database, but, depending on the storage format, we can do a few things: we can look at it; we can edit it, if we can edit it at all; we can edit the contents of any field; we can add, delete, or edit records or individual units of data; and we can do a textual search for specific fields. We can also import the entire database into another tool. This is about the limit of what a simple flat file can do. Sometimes this is enough, but only for the most basic needs.

For more advanced applications, we turn to a tool designed for the task: a database management system. An advantage of a database tool is that it is specifically designed for database management. We can add new records to it, via an 'insert' or equivalent command. We can add additional fields, extending the structure. We can choose to control what kind of data may be stored in a given field; for instance, id is defined to hold only a serial number, which is assigned automatically when a new record is created. We can define certain processes to take place when records change. Beyond this, relational databases are usually used.

Flat-file relational database storage model

There is a difference between the concept of a single flat-file database as a flat database model, as used above, and multiple flat-file tables as a relational database model.

File1
file-offset id name  team
0x00        8  Hank  Reds
0x13        1  Amy   Blues
0x27        3  Chuck Blues
0x3B        4  Dick  Blues
0x4F        5  Ethel Reds
0x62        7  Gilly Blues
0x76        6  Fred  Blues
0x8A        2  Bob   Reds

The file-offset isn't actually part of the database; rather, it is only there for clarification.

File2
team  arena
Blues le Grand Bleu
Reds  Super Smirnoff Stadium

In this setting, flat files simply act as the data store of a modern relational database. In order for flat files to be part of a relational database, the RDBMS must be able to recognize various foreign key relationships between multiple flat-file tables. Beyond that, all that is needed to be a modern database are separate files supplied by the RDBMS for storing indexes, constraints, triggers, foreign key relationships, replication plans, fragmentation plans, and other modern distributed relational database concepts.

Simple example name index on File1
0x00000013
0x0000008A
0x00000027
0x0000003B
0x0000004F
0x00000076
0x00000062
0x00000000

Index (database)

A database index is a data structure that improves the speed of operations in a table. Indexes can be created using one or more columns. The disk space required to store the index is typically less than the storage of the table[citation needed].

Contents
1 Architecture
2 Column order
3 Applications and limitations
4 See also

Architecture

Index architectures are classified as clustered or non-clustered. Clustered indexes are indexes that are built based on the same key by which the data is ordered on disk; the data in the table is sorted as per the index. Due to the fact that the clustered index corresponds (at the leaf level) to the actual data, only one clustered index can exist in a given table, whereas many non-clustered indexes can exist (limited by the particular RDBMS vendor).[citation needed] In some relational database management systems such as Microsoft SQL Server, the leaf node of the clustered index corresponds to the actual data, not simply a pointer to data that resides elsewhere.

Indexes are defined as unique or non-unique. A unique index acts as a constraint on the table by preventing identical rows in the index and thus in the original columns. Some databases extend the power of indexes even further by allowing indexes to be created on functions or expressions. For example, an index could be created on upper(last_name), which would only store the uppercase versions of the last_name field in the index[citation needed].
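The simple name index on File1 shown earlier can be sketched as a single sequential pass that records each record's byte offset keyed by name; a lookup then becomes one seek instead of a full scan. The sample data below is a shortened, invented stand-in for File1, so its offsets differ from the table above:

```python
import io

# A flat file with one fixed-length 14-byte record per line (id, name, team).
f = io.BytesIO(b"8 Hank  Reds \n"
               b"1 Amy   Blues\n"
               b"3 Chuck Blues\n"
               b"2 Bob   Reds \n")

# Build a name -> file-offset index in one sequential pass.
index = {}
offset = 0
for line in f.getvalue().splitlines(keepends=True):
    name = line[2:8].decode().strip()   # the name field occupies columns 2-7
    index[name] = offset
    offset += len(line)

# The index turns a lookup into a single seek plus one read.
f.seek(index["Bob"])
assert f.readline().startswith(b"2 Bob")
```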

Unclustered indexes are indexes that are built on any key. Each relation can have a single clustered index and many unclustered indexes.[citation needed] Clustered indexes usually store the actual records within the data structure ('intrinsic' might be a better adjective than 'clustered', indicating that the index is an integral part of the data structure storing the table) and as a result can be much faster than unclustered indexes, which are forced to store only record IDs and require at least one additional I/O operation to retrieve the actual record.[citation needed] Indexes can be implemented using a variety of data structures; popular indices include balanced trees, B+ trees and hashes.

Column order

The order in which columns are listed in the index definition is important. It is possible to retrieve a set of row identifiers using only the first indexed column; however, it is not possible or efficient (on most databases) to retrieve the set of row identifiers using only the second or greater indexed column. For example, imagine a phone book that is organized by city first, then by last name, and then by first name. If given the city, you can easily extract the list of all phone numbers for that city. However, in this phone book it would be very tedious to find all the phone numbers for a given last name: you would have to look within each city's section for the entries with that last name.

Applications and limitations

Indexes are useful for many applications but come with some limitations. Consider the following SQL statement: SELECT first_name FROM people WHERE last_name = 'Finkelstein'. To process this statement without an index the database software must look at the last_name column on every row in the table (this is known as a full table scan). With an index, the database simply follows the b-tree data structure until the Finkelstein entry has been found; this is much less computationally expensive than a full table scan.

Consider this SQL statement: SELECT email_address FROM customers WHERE email_address LIKE '%@yahoo.com'. This query would yield an email address for every customer whose email address ends with "@yahoo.com", but even if the email_address column has been indexed the database still must perform a full table scan. This is because the index is built with the assumption that words go from left to right; with a wildcard at the beginning of the search-term, the database software is unable to use the underlying b-tree data structure. This problem can be solved through the addition of another index created on reverse(email_address) and a SQL query like this: select email_address from customers where reverse(email_address) like reverse('%@yahoo.com'). This puts the wild-card at the right-most part of the query (now moc.oohay@%), which the index on reverse(email_address) can satisfy. Some databases can do this; others just won't use the index.

File system
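Both behaviours can be reproduced with an in-memory SQLite database, used here purely as a convenient stand-in for "the database software". SQLite has no built-in reverse() function, so this sketch stores a pre-reversed column and indexes that, rather than indexing an expression as in the text above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first_name TEXT, last_name TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [("Ann", "Finkelstein"), ("Bo", "Marsh")])
conn.execute("CREATE INDEX idx_last ON people(last_name)")

# Equality on the indexed column lets the planner use the b-tree
# instead of a full table scan:
plan = conn.execute("EXPLAIN QUERY PLAN SELECT first_name FROM people "
                    "WHERE last_name = 'Finkelstein'").fetchall()
assert any("idx_last" in row[-1] for row in plan)

# A leading wildcard defeats the index; storing the reversed text turns
# the suffix match '%@yahoo.com' into the prefix match 'moc.oohay@%':
conn.execute("CREATE TABLE customers (email TEXT, email_rev TEXT)")
conn.execute("CREATE INDEX idx_rev ON customers(email_rev)")
conn.execute("INSERT INTO customers VALUES (?, ?)",
             ("ann@yahoo.com", "ann@yahoo.com"[::-1]))
hit = conn.execute("SELECT email FROM customers "
                   "WHERE email_rev LIKE 'moc.oohay@%'").fetchone()
assert hit == ("ann@yahoo.com",)
```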

In computing, a file system (often also written as filesystem) is a method for storing and organizing computer files and the data they contain to make it easy to find and access them. File systems may use a storage device such as a hard disk or CD-ROM and involve maintaining the physical location of the files; they might provide access to data on a file server by acting as clients for a network protocol (e.g., NFS, SMB, or 9P clients); or they may be virtual and exist only as an access method for virtual data (e.g., procfs).

More formally, a file system is a set of abstract data types that are implemented for the storage, hierarchical organization, manipulation, navigation, access, and retrieval of data. File systems share much in common with database technology, but it is debatable whether a file system can be classified as a special-purpose database (DBMS).

Contents
1 Aspects of file systems
2 Types of file systems
2.1 Disk file systems
2.2 Database file systems
2.3 Transactional file systems
2.4 Network file systems
2.5 Special purpose file systems
3 File systems and operating systems
3.1 Flat file systems
3.2 File systems under Unix and Unix-like systems
3.3 File systems under Plan 9 from Bell Labs
3.4 File systems under Microsoft Windows
3.5 File systems under Mac OS X
3.6 File systems under OpenVMS
3.7 File systems under MVS [IBM Mainframe]
4 See also
5 References
5.1 Further reading
6 External links

Aspects of file systems

The most familiar file systems make use of an underlying data storage device that offers access to an array of fixed-size blocks, sometimes called sectors, generally 512 bytes each. The file system software is responsible for organizing these sectors into files and directories, and for keeping track of which sectors belong to which file and which are not being used. However, file systems need not make use of a storage device at all; a file system can be used to organize and represent access to any data, whether it be stored or dynamically generated (e.g. from a network connection).

Whether the file system has an underlying storage device or not, file systems typically have directories which associate file names with files, usually by connecting the file name to an index into a file allocation table of some sort, such as the FAT in an MS-DOS file system, or an inode in a Unix-like file system. Directory structures may be flat, or allow hierarchies where directories may contain subdirectories. In some file systems, file names are structured, with special syntax for filename extensions and version numbers; in others, file names are simple strings, and per-file metadata is stored elsewhere.

Other bookkeeping information is typically associated with each file within a file system. The length of the data contained in a file may be stored as the number of blocks allocated for the file or as an exact byte count. The time that the file was last modified may be stored as the file's timestamp. Some file systems also store the file creation time, the time it was last accessed, and the time that the file's meta-data was changed. (Note that many early PC operating systems did not keep track of file times.) Other information can include the file's device type (e.g. block, character, socket, subdirectory, etc.), its owner user-ID and group-ID, and its access permission settings (e.g. whether the file is read-only, executable, etc.).

Arbitrary attributes can be associated on advanced file systems, such as XFS, ext2/ext3, some versions of UFS, and HFS+, using extended file attributes. This feature is implemented in the kernels of the Linux, FreeBSD and Mac OS X operating systems, and allows metadata to be associated with the file at the file system level. This, for example, could be the author of a document, the character encoding of a plain-text document, or a checksum.

The hierarchical file system was an early research interest of Dennis Ritchie of Unix fame; previous implementations were restricted to only a few levels, notably the IBM implementations, even of their early databases like IMS. After the success of Unix, Ritchie extended the file system concept to every object in his later operating system developments, such as Plan 9 and Inferno.

Traditional file systems offer facilities to create, move and delete both files and directories. They lack facilities to create additional links to a directory (hard links in Unix), rename parent links (".." in Unix-like OS), and create bidirectional links to files. Traditional file systems also offer facilities to truncate, append to, create, move, delete and in-place modify files. They do not offer facilities to prepend to or truncate from the beginning of a file, let alone arbitrary insertion into or deletion from a file. The operations provided are highly asymmetric and lack the generality to be useful in unexpected contexts. For example, interprocess pipes in Unix have to be implemented outside of the file system because the pipes concept does not offer truncation from the beginning of files.

Secure access to basic file system operations can be based on a scheme of access control lists or capabilities. Research has shown access control lists to be difficult to secure properly, which is why research operating systems tend to use capabilities. Commercial file systems still use access control lists. See: secure computing.

Types of file systems

File system types can be classified into disk file systems, network file systems and special purpose file systems.

Disk file systems

A disk file system is a file system designed for the storage of files on a data storage device, most commonly a disk drive, which might be directly or indirectly connected to the computer. Examples of disk file systems include FAT, NTFS, HFS and HFS+, ext2, ext3, ISO 9660, ODS-5, and UDF. Some disk file systems are journaling file systems or versioning file systems.

Database file systems

A new concept for file management is the concept of a database-based file system. Instead of, or in addition to, hierarchical structured management, files are identified by their characteristics, like type of file, topic, author, or similar metadata.

Transactional file systems

This is a special kind of file system in that it logs events or transactions to files. Each operation that you do may involve changes to a number of different files and disk structures. In many cases, these changes are related, meaning that it is important that they all be executed at the same time. Take for example a bank sending another bank some money electronically. The bank's computer will "send" the transfer instruction to the other bank and also update its own records to indicate the transfer has occurred. If for some reason the computer crashes before it has had a chance to update its own records, then on reset, there will be no record of the transfer but the bank will be missing some money. A transactional system can rebuild the actions by resynchronizing the "transactions" on both ends to correct the failure. All transactions can be saved, providing a complete record of what was done and where. This type of file system is designed and intended to be fault tolerant and, necessarily, incurs a high degree of overhead.

Network file systems

A network file system is a file system that acts as a client for a remote file access protocol, providing access to files on a server. Examples of network file systems include clients for the NFS, SMB, AFP, and 9P protocols, and file-system-like clients for FTP and WebDAV.

Special purpose file systems

A special purpose file system is basically any file system that is not a disk file system or network file system. This includes systems where the files are arranged dynamically by software, intended for such purposes as communication between computer processes or temporary file space. Special purpose file systems are most commonly used by file-centric operating systems such as Unix. Examples include the procfs (/proc) file system used by some Unix variants, which grants access to information about processes and other operating system features.

Deep space science exploration craft, like Voyager I and II, used digital tape based special file systems. Most modern space exploration craft like Cassini-Huygens used Real-time operating system file systems or RTOS influenced file systems. The Mars Rovers are one such example of an RTOS file system, important in this case because they are implemented in flash memory.

File systems and operating systems

Most operating systems provide a file system, as a file system is an integral part of any modern operating system. Early microcomputer operating systems' only real task was file management, a fact reflected in their names (see DOS and QDOS). Some early operating systems had a separate component for handling file systems, which was called a disk operating system; on some microcomputers, the disk operating system was loaded separately from the rest of the operating system. On early operating systems, there was usually support for only one, native, unnamed file system; for example, CP/M supports only its own file system, which might be called "CP/M file system" if needed, but which didn't bear any official name at all.

Because of this, there needs to be an interface provided by the operating system software between the user and the file system. This interface can be textual (such as provided by a command line interface, such as the Unix shell, or OpenVMS DCL) or graphical (such as provided by a graphical user interface, such as file browsers). If graphical, the metaphor of the folder, containing documents, is often used.

Flat file systems
In a flat file system, there are no directories — everything is stored at the same (root) level on the media, be it a hard disk, floppy disk, etc. While simple, this system rapidly becomes inefficient as the number of files grows, and it makes it difficult for users to organise data into related groups. Like many small systems before it, the original Apple Macintosh featured a flat file system, called the Macintosh File System (MFS). Its version of Mac OS was unusual in that the file management software (Macintosh Finder) created the illusion of a partially hierarchical filing system on top of MFS. MFS was quickly replaced with the Hierarchical File System, which supported real directories.

File systems under Unix and Unix-like systems
Unix and Unix-like operating systems assign a device name to each device, but this is not how the files on that device are accessed. Instead, Unix creates a virtual file system, which makes all the files on all the devices appear to exist under one hierarchy. This means that, in Unix, there is one root directory, and every file existing on the system is located under it somewhere. Furthermore, the Unix root directory does not have to be in any physical place. It might not be on your first hard drive — it might not even be on your computer. Unix can use a network shared resource as its root directory.

To gain access to files on another device, you must first inform the operating system where in the directory tree you would like those files to appear. To access the files on a CD-ROM, for example, one must tell the operating system "Take the file system from this CD-ROM and make it appear under thus-and-such a directory". This process is called mounting a file system. The directory given to the operating system is called the mount point; it might, for example, be /mnt. The /mnt directory exists on many Unix-like systems (as specified in the Filesystem Hierarchy Standard) and is intended specifically for use as a mount point for temporary media like floppy disks or CDs. It may be empty, or it may contain subdirectories for mounting individual devices. Generally, only the administrator (i.e. the root user) may authorize the mounting of file systems.

In many situations, file systems other than the root need to be available as soon as the operating system has booted. All Unix-like systems therefore provide a facility for mounting file systems at boot time. System administrators define these file systems in the configuration file fstab, which also indicates options and mount points.

In some situations, there is no need to mount certain file systems at boot time, although their use may be desired thereafter. There are some utilities for Unix-like systems that allow the mounting of predefined file systems upon demand. Removable media, which allow programs and data to be transferred between machines without a physical connection, have become very common with microcomputer platforms; two common examples are CD-ROMs and DVDs. Utilities have therefore been developed to detect the presence and availability of a medium and then mount that medium without any user intervention. Unix-like operating systems often include software and tools that assist in the mounting process and provide it new functionality; some of these strategies have been coined "auto-mounting" as a reflection of their purpose.
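The fstab file mentioned above uses a simple line format with six whitespace-separated fields: device, mount point, file system type, mount options, dump frequency, and fsck pass number. A minimal parser sketch over invented sample lines (this reads a string, not a real system file):

```python
# Sketch: parse fstab-style lines into dicts; sample data is invented.
def parse_fstab(text):
    """Parse fstab-format lines, skipping comments and blank lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # The six standard fstab fields: device, mount point, type, options, dump, pass
        device, mount_point, fs_type, options = fields[:4]
        dump = int(fields[4]) if len(fields) > 4 else 0
        passno = int(fields[5]) if len(fields) > 5 else 0
        entries.append({
            "device": device, "mount_point": mount_point, "type": fs_type,
            "options": options.split(","), "dump": dump, "pass": passno,
        })
    return entries

sample = """
# device        mount point  type     options    dump pass
/dev/hda1       /            ext2     defaults   1    1
/dev/cdrom      /mnt/cdrom   iso9660  ro,noauto  0    0
"""
entries = parse_fstab(sample)
```

The `noauto` option in the second entry is what tells the system not to mount that file system at boot time.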

Progressive Unix-like systems have also introduced a concept called supermounting; see, for example, the Linux supermount-ng project. A floppy disk that has been supermounted can be physically removed from the system. Under normal circumstances, the disk should have been synchronised and then unmounted before its removal. Provided synchronisation has occurred, a different disk can be inserted into the drive. The system automatically notices that the disk has changed and updates the mount point contents to reflect the new medium. Similar functionality is found on standard Windows machines.

A similar innovation preferred by some users is the use of autofs, a system that, like supermounting, eliminates the need for manual mounting commands. The difference from supermount, other than compatibility with an apparently greater range of applications, such as access to file systems on network servers, is that devices are mounted transparently when requests to their file systems are made, as would be appropriate for file systems on network servers, rather than relying on events such as the insertion of media, as would be appropriate for removable media.

File systems under Mac OS X
Mac OS X uses a file system that it inherited from Mac OS, called HFS Plus. HFS Plus is a metadata-rich and case-preserving file system. It uses Unicode to store filenames, which can be up to 255 characters. On Mac OS X, the filetype can come from the type code stored in the file's metadata or from the filename. Due to the Unix roots of Mac OS X, Unix permissions were added to HFS Plus. Later versions of HFS Plus added journaling to prevent corruption of the file system structure, and introduced a number of optimizations to the allocation algorithms in an attempt to defragment files automatically without requiring an external defragmenter. HFS Plus has three kinds of links: Unix-style hard links, Unix-style symbolic links, and aliases. Aliases are designed to maintain a link to their original file even if they are moved or renamed; they are not interpreted by the file system itself, but by the File Manager code in userland.

File systems under Plan 9 from Bell Labs
Plan 9 from Bell Labs was originally designed to extend some of Unix's good points and to introduce some new ideas of its own, while fixing the shortcomings of Unix. With respect to file systems, the Unix idea of treating many things as files was continued, but in Plan 9 everything is treated as a file and accessed as a file would be (i.e., no ioctl or mmap). Perhaps surprisingly, the underlying 9P protocol was used to remove the difference between local and remote files (except for a possible difference in latency). This has the advantage that a device or devices, represented by files, on a remote computer can be used as though they were the local computer's own devices. This means that under Plan 9, multiple file servers provide access to devices, classing them as file systems. Servers for "synthetic" file systems can also run in user space, bringing many of the advantages of microkernel systems while maintaining the simplicity of the system. Moreover, while the file interface is made universal, it is also simplified considerably: symlinks, hard links and suid are made obsolete, and an atomic create/open operation is introduced. More importantly, the set of file operations becomes well defined, and subversions of it, like ioctl, are eliminated.

Everything on a Plan 9 system has an abstraction as a file: networking, graphics, debugging, authentication, capabilities, encryption, and other services are accessed via I/O operations on file descriptors. These file systems are organized with the help of private, per-process namespaces, allowing each process to have a different view of the many file systems that provide resources in a distributed system. The Inferno operating system shares these concepts with Plan 9.

For example, a Plan 9 application receives FTP service by opening an FTP site. The ftpfs server handles the open by essentially mounting the remote FTP site as part of the local file system. With ftpfs as an intermediary, the application can then use the usual file-system operations to access the FTP site as if it were part of the local file system; this allows the use of the IP stack of a gateway machine without need of NAT. A further example is the mail system, which uses file servers that synthesize virtual files and directories to represent a user mailbox as /mail/fs/mbox, and wikifs, which provides a file system interface to a wiki. This approach provides, for example, a network-transparent window system without the need of any extra code.

File systems under Microsoft Windows
Microsoft Windows developed out of an earlier operating system (MS-DOS, which in turn was based on QDOS, and that on CP/M-80, which took many ideas from still earlier operating systems, notably several from DEC), and has added file systems from several other sources since its first release (e.g. Unix). As such, Windows makes use of the FAT (File Allocation Table) and NTFS (New Technology File System) file systems.

Older versions of the FAT file system (FAT12 and FAT16) had file name length limits, a limit on the number of entries in the root directory of the file system, and restrictions on the maximum size of FAT-formatted disks or partitions. Specifically, FAT12 and FAT16 allowed only 8 characters for the file name and 3 characters for the extension (commonly referred to as the 8.3 limit). VFAT, an extension to FAT12 and FAT16 introduced in Windows NT 3.5 and subsequently included in Windows 95, allowed for long file names (LFN). FAT32 also addressed many of the limits in FAT12 and FAT16, but remains limited compared to NTFS. NTFS, introduced with the Windows NT operating system, allowed ACL-based permission control. Hard links, multiple file streams, attribute indexing, quota tracking, compression, encryption, and mount points for other file systems (called "junctions") are also supported, though not all of these features are well documented.

Unlike many other operating systems, Windows uses a drive letter abstraction at the user level to distinguish one disk or partition from another. For example, the path C:\WINDOWS\ represents the directory WINDOWS on the partition represented by the letter C. The C drive is most commonly used for the primary hard disk partition, on which Windows is installed and from which it boots. This "tradition" has become so firmly ingrained that bugs came about in older versions of Windows which made assumptions that the drive the operating system was installed on was C. The tradition of using "C" for the drive letter can be traced to MS-DOS, where the letters A and B were reserved for up to two floppy disk drives; in a common configuration, A would be the 3½-inch floppy drive and B the 5¼-inch one. Network drives may also be mapped to drive letters. Since Windows primarily interacts with the user via a graphical user interface, its documentation refers to a directory as a folder, which contains files and is represented graphically with a folder icon.
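The 8.3 limit described above is easy to check mechanically. A sketch, with the caveat that the character-set rules here are simplified to alphanumerics and underscore, which is stricter than real FAT:

```python
import re

# Sketch: validate a name against the FAT 8.3 limit (up to 8 name
# characters, plus an optional extension of up to 3). FAT is case-
# insensitive, so names are upper-cased before matching.
EIGHT_THREE = re.compile(r"^[A-Z0-9_]{1,8}(\.[A-Z0-9_]{1,3})?$")

def is_8_3(name):
    return bool(EIGHT_THREE.match(name.upper()))
```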

Database transaction
A database transaction is a unit of interaction with a database management system or similar system that is treated in a coherent and reliable way independent of other transactions, and that must be either entirely completed or aborted. Ideally, a database system will guarantee all of the ACID properties for each transaction. In practice, these properties are often relaxed somewhat to provide better performance.

Contents
1 Purpose of transaction
2 Transactional databases
3 Transactional filesystems
4 See also
5 External links

Purpose of transaction
In database products, the ability to handle transactions allows the user to ensure that the integrity of a database is maintained. A single transaction might require several queries, each reading and/or writing information in the database. When this happens, it is usually important to be sure that the database is not left with only some of the queries carried out. For example, when doing a money transfer, if the money was debited from one account, it is important that it also be credited to the depositing account. Also, transactions should not interfere with each other. For more information about desirable transaction properties, see ACID.

Transactional databases
Databases that support transactions are called transactional databases. In some systems, transactions are also called LUWs, for Logical Units of Work. Most modern relational database management systems fall into this category.

A simple transaction is usually issued to the database system in a language like SQL in this form:
1. Begin the transaction
2. Execute several queries (although any updates to the database aren't actually visible to the outside world yet)
3. Commit the transaction (updates become visible if the transaction is successful)

If one of the queries fails, the database system may roll back either the entire transaction or just the failed query; this behaviour is dependent on the DBMS in use and how it is set up. The transaction can also be rolled back manually at any time before the commit.
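The three steps above can be sketched with Python's standard sqlite3 module; the accounts table, names, and amounts are invented for illustration:

```python
import sqlite3

# Illustrative setup: an in-memory database with two accounts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('savings', 1000), ('checking', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        # 1. Begin the transaction (sqlite3 opens one implicitly on the first write)
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        # 2. A query inside the transaction; raise to simulate a failure
        (bal,) = conn.execute("SELECT balance FROM accounts WHERE name = ?",
                              (src,)).fetchone()
        if bal < 0:
            raise ValueError("insufficient funds")
        # 3. Commit: the updates become visible to other connections
        conn.commit()
    except Exception:
        conn.rollback()  # undo every statement since the transaction began
        raise
```

A successful call commits both updates together; a failed call rolls both back, so the database is never left with only one half of the transfer.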

Transactional filesystems
The Namesys Reiser4 filesystem for Linux [1] and the newest version of the Microsoft NTFS filesystem both support transactions [2].

See also
Distributed transaction
Nested transaction
ACID properties
Atomic transaction
Software transactional memory
Long running transaction

Transaction processing
In computer science, transaction processing is information processing that is divided into individual, indivisible operations, called transactions. Each transaction must succeed or fail as a complete unit; it cannot remain in an intermediate state.

Contents
1 Description
2 Methodology
2.1 Rollback
2.2 Rollforward
2.3 Deadlocks
3 ACID criteria
4 Implementations
5 See also
6 Books

Description
Transaction processing is designed to maintain a database in a known, consistent state, by ensuring that any operations carried out on the database that are interdependent are either all completed successfully or all cancelled successfully.

For example, consider a typical banking transaction that involves moving £500 from a customer's savings account to a customer's checking account. This transaction is a single operation in the eyes of the bank, but it involves at least two separate operations in computer terms: debiting the savings account by £500, and crediting the checking account by £500. If the debit operation succeeds but the credit does not (or vice versa), the books of the bank will not balance at the end of the day. There must therefore be a way to ensure that either both operations succeed or both fail, so that there is never any inconsistency in the bank's database as a whole. Transaction processing is designed to provide this.

Transaction processing allows multiple individual operations on a database to be linked together automatically as a single, indivisible transaction. The transaction-processing system ensures that either all operations in a transaction are completed without error, or none of them are. If some of the operations are completed but errors occur when the others are attempted, the transaction-processing system "rolls back" all of the operations of the transaction (including the successful ones), thereby erasing all traces of the transaction and restoring the database to the consistent, known state that it was in before processing of the transaction began. If all operations of a transaction are completed successfully, the transaction is "committed" by the system, and all changes to the database are made permanent; the transaction cannot be rolled back once this is done.

Transaction processing guards against hardware and software errors that might leave a transaction partially completed, with the database left in an unknown, inconsistent state. If the computer system crashes in the middle of a transaction, the transaction-processing system guarantees that all operations in any uncommitted (i.e. not completely processed) transactions are cancelled.

Methodology
The basic principles of all transaction-processing systems are the same. However, the terminology may vary from one transaction-processing system to another, and the terms used below are not necessarily universal.

Rollback
Transaction-processing systems ensure database integrity by recording intermediate states of the database as it is modified, then using these records to restore the database to a known state if a transaction cannot be committed. For example, copies of information on the database prior to its modification by a transaction are set aside by the system before the transaction can make any modifications (this is sometimes called a before image). If any part of the transaction fails before it is committed, these copies are used to restore the database to the state it was in before the transaction began (rollback).

Rollforward
It is also possible to keep a separate journal of all modifications to a database (sometimes called after images). This is not required for rollback of failed transactions, but it is useful for updating the database in the event of a database failure, so some transaction-processing systems provide it. If the database fails entirely, it must be restored from the most recent back-up, and the back-up will not reflect transactions committed since the back-up was made. However, once the database is restored, the journal of after images can be applied to the database (rollforward) to bring the database up to date.

Transactions are processed in a strict chronological order. Before any transaction is committed, all other transactions affecting the same part of the database must also be committed; there can be no "holes" in the sequence of preceding transactions. If transaction n+1 touches the same portion of the database as transaction n, transaction n+1 does not begin until transaction n is committed.
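The before-image and after-image mechanisms described above can be sketched with a toy key-value store. Everything here (the class name, method names, the representation of images) is invented for illustration; a real DBMS logs physical pages or records, not Python dicts:

```python
# Toy store: before images support rollback, after images support rollforward.
class MiniStore:
    def __init__(self):
        self.data = {}
        self.before = None   # before images for the active transaction
        self.journal = []    # after images of committed transactions

    def begin(self):
        self.before = {}

    def write(self, key, value):
        # Save the before image only once per key per transaction.
        # (Simplification: a previous value of None is treated as "absent".)
        if key not in self.before:
            self.before[key] = self.data.get(key)
        self.data[key] = value

    def rollback(self):
        # Restore every modified key to its before image.
        for key, old in self.before.items():
            if old is None:
                self.data.pop(key, None)
            else:
                self.data[key] = old
        self.before = None

    def commit(self):
        # Record after images so the transaction can be replayed later.
        self.journal.append({k: self.data[k] for k in self.before})
        self.before = None

    def rollforward(self, backup):
        # Replay committed after images on top of a restored back-up.
        restored = dict(backup)
        for images in self.journal:
            restored.update(images)
        return restored
```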

Any transactions in progress at the time of the failure can then be rolled back. The result is a database in a consistent, known state that includes the results of all transactions committed up to the moment of failure.

Deadlocks
In some cases, two transactions may, in the course of their processing, attempt to access the same portion of a database at the same time, in a way that prevents them from proceeding. For example, transaction A may access portion X of the database, and transaction B may access portion Y. If, at that point, transaction A then tries to access portion Y while transaction B tries to access portion X, a deadlock occurs, and neither transaction can move forward. Transaction-processing systems are designed to detect these deadlocks when they occur. Typically both transactions will be cancelled and rolled back, and then they will be started again automatically in a different order, so that the deadlock doesn't occur again.

ACID criteria
There are many minor variations on the exact methods used to protect database consistency in a transaction-processing system, but the basic principles remain the same. All transaction-processing systems support these functions, which are referred to as the ACID properties: atomicity, consistency, isolation, and durability.

Implementations
Standard transaction-processing software, notably IBM's Information Management System, was first developed in the 1960s, and was often closely coupled to particular proprietary database management systems. Client-server computing implemented similar principles in the 1980s with mixed success. In more recent years, however, the distributed client-server model has become considerably more difficult to maintain: as the number of transactions grew in response to various online services (especially the Web), a single distributed database was not a practical solution, and most online systems consist of a whole suite of programs operating together, as opposed to a strict client-server model where a single server could handle the transaction processing. Today a number of transaction processing systems are available that work at the inter-program level and scale to large systems, including mainframes. An important open industry standard is the X/Open Distributed Transaction Processing (DTP) model (see JTA).

Distributed transaction
A distributed transaction is an operations bundle in which two or more network hosts are involved. Usually, hosts provide transactional resources, while a transaction manager is responsible for creating and managing a global transaction that encompasses all operations against such resources.
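A common way for a transaction manager to coordinate such resources is a two-phase protocol: first ask every resource to prepare (vote), then commit only if all vote yes. A toy, in-memory sketch (the Resource class and its voting flag are invented for illustration; real resource managers survive crashes between the phases):

```python
# Toy sketch of a transaction manager coordinating resources.
class Resource:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):          # phase 1: vote yes/no
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):           # phase 2, all voted yes
        self.state = "committed"

    def abort(self):            # phase 2, at least one voted no
        self.state = "aborted"

def two_phase_commit(resources):
    if all(r.prepare() for r in resources):
        for r in resources:
            r.commit()
        return "committed"
    for r in resources:
        r.abort()
    return "aborted"
```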

Distributed transactions, like any other transactions, must have all four ACID properties, where atomicity guarantees all-or-nothing outcomes for the unit of work (the operations bundle). The Open Group, a vendor consortium, proposed the X/Open Distributed Transaction Processing (DTP) Model, which became a de facto standard for the behavior of transaction model components.

Databases are common transactional resources, and often a transaction spans several such databases. In this case, a distributed transaction can be seen as a database transaction that must be synchronized (or provide ACID properties) among multiple participating databases which are distributed among different physical locations. The isolation property poses a special challenge for multi-database transactions, since the (global) serializability property could be violated even if each database provides it (see also commitment ordering for multi-databases). In practice most commercial database systems use strict two-phase locking for concurrency control, which ensures global serializability if all the participating databases employ it.

A common algorithm for ensuring correct completion of a distributed transaction is the two-phase commit. This algorithm is usually applied for updates able to commit in a short period of time, ranging from a couple of milliseconds to a couple of minutes. There are also long-lived distributed transactions, for example a transaction to book a trip, which consists of booking a flight, a rental car and a hotel. Since booking the flight might take up to a day to get a confirmation, two-phase commit is not applicable here: it would lock the resources for that long. In this case more sophisticated techniques that involve multiple undo levels are used; the way you can undo the hotel booking is by calling a desk and cancelling the reservation. Usually these transactions utilize the principles of compensating transactions, optimism, and isolation without locking. The X/Open standard does not cover long-lived DTP; in practice, long-lived distributed transactions are implemented in systems based on Web Services. A couple of modern technologies, including Enterprise Java Beans (EJBs) and Microsoft Transaction Server (MTS), fully support distributed transaction standards.

Strict two-phase locking
In computer science, strict two-phase locking (Strict 2PL) is a locking method used in concurrent systems. The two rules of Strict 2PL are:
1. If a transaction T wants to read or write an object, it must request a shared or exclusive lock, respectively, on the object.
2. All exclusive locks held by transaction T are released when T commits (and not before).
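The two Strict 2PL rules above can be sketched as a minimal lock table. To keep the sketch short, a conflicting request raises an error instead of blocking, and the class name is invented for illustration:

```python
# Minimal lock table: shared ("S") locks may be held by several
# transactions at once; an exclusive ("X") lock excludes everyone else.
class LockManager:
    def __init__(self):
        self.locks = {}  # object -> (mode, set of transaction ids)

    def acquire(self, txn, obj, mode):
        # rule 1: a lock must be requested before reading/writing
        held_mode, holders = self.locks.get(obj, ("S", set()))
        if not holders or holders == {txn}:
            self.locks[obj] = (mode, holders | {txn})
        elif mode == "S" and held_mode == "S":
            holders.add(txn)
        else:
            raise RuntimeError(f"{txn} must wait for lock on {obj}")

    def release_all(self, txn):
        # rule 2: locks are released together, at commit time
        for obj in list(self.locks):
            mode, holders = self.locks[obj]
            holders.discard(txn)
            if not holders:
                del self.locks[obj]
```

Because exclusive locks are only dropped by `release_all` (commit), no other transaction can ever read a value written by an uncommitted transaction, which is exactly why Strict 2PL prevents cascading rollbacks.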

Here is an example of Strict 2PL in action with interleaved actions, in text form:

T1: S(A), R(A), X(B), R(B), W(B), X(C), R(C), W(C), Commit
T2: S(A), R(A), Commit

where
• S(O) is a shared lock action on an object O
• X(O) is an exclusive lock action on an object O
• R(O) is a read action on an object O
• W(O) is a write action on an object O

Strict 2PL prevents transactions from reading uncommitted data, overwriting uncommitted data, and making unrepeatable reads. Thus it prevents cascading rollbacks, since exclusive locks (for write privileges) must be held until a transaction commits.

Strict 2PL does not guarantee a deadlock-free schedule
Avoiding deadlocks can be important in real-time systems or fault-tolerant systems with multiple redundancy, and may additionally be difficult to enforce in distributed databases. A deadlocked schedule allowed under Strict 2PL:

T1: X(A)
T2: X(B)
T1: X(B)
T2: X(A)

T1 is waiting for T2's lock on B to be released, while T2 is waiting for T1's lock on A to be released. Neither transaction can proceed; both are deadlocked. There is no general solution to the problem of deadlocks in computing systems, so they must be anticipated and dealt with accordingly. Nonetheless, several solutions, such as the Banker's algorithm or the imposition of a partial ordering on lock acquisition, exist for avoiding deadlocks under certain conditions.

Even more strict than strict two-phase locking is rigorous two-phase locking, in which transactions can be serialized by the order in which they commit: under rigorous 2PL, all locks (shared and exclusive) must be held until a transaction commits. Most database systems use strict 2PL.

Concurrency control
In computer science — more specifically, in the field of databases and database theory — concurrency control is a method used to ensure that database transactions are executed in a safe manner (i.e. without data loss). Concurrency control is especially applicable to database management systems, which must ensure that transactions are executed safely and that they follow the ACID rules. The DBMS must be able to ensure that only serializable, recoverable schedules are allowed, and that no actions of committed transactions are lost while undoing aborted transactions.

In computer science more broadly — in the field of concurrent programming (see also parallel programming and parallel computing on multiprocessor machines) — concurrency control is a method used to ensure that correct results are generated, while getting those results as quickly as possible. Several algorithms can be used for either type of concurrency control (e.g. with in-RAM data structures on systems that have no database, or with on-disk databases), as described in the following section.
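The deadlock in the example above (T1 holds A and waits for B, while T2 holds B and waits for A) can be detected as a cycle in a wait-for graph. A minimal sketch, with the graph represented as a plain dict:

```python
# waits_for maps a transaction to the transactions it is waiting on.
def has_deadlock(waits_for):
    """Return True if the wait-for graph contains a cycle."""
    def cyclic(node, stack):
        if node in stack:
            return True
        stack.add(node)
        found = any(cyclic(n, stack) for n in waits_for.get(node, ()))
        stack.discard(node)
        return found
    return any(cyclic(t, set()) for t in waits_for)
```

A system that finds a cycle typically picks a victim transaction in it, rolls it back, and restarts it, as described above.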

Contents
1 Transaction ACID rules
2 Concurrency control mechanism
3 See also
4 External links

Transaction ACID rules
• Atomicity - Either all or none of the operations are completed; in other words, to the outside world the transaction appears to happen indivisibly. (Undo)
• Consistency - All transactions must leave the database in a consistent state.
• Isolation - Transactions cannot interfere with each other.
• Durability - Successful transactions must persist through crashes. (Redo)

Concurrency control mechanism
The main categories of concurrency control mechanisms are:
• Optimistic - Delay the synchronization of transactions until their operations are performed (i.e. reading, writing, aborting, committing). Conflicts are less likely, but won't be known until they happen.
• Pessimistic - The potentially concurrent executions of transactions are synchronized early in their execution life cycle. Blocking is thus more likely, but will be known earlier.

There are many methods for concurrency control, the majority of which use Strict 2PL locking:
• Strict two-phase locking
• Non-strict two-phase locking
• Conservative two-phase locking
• Index locking
• Multiple granularity locking

Locks are bookkeeping objects associated with a database object. There are also non-lock concurrency control methods. All the currently implemented lock-based, and almost all the non-lock-based, concurrency controls guarantee that the resultant schedule is conflict serializable. However, there are many academic texts encouraging view-serializable schedules for environments where the gains from improved concurrency outstrip the overheads of generating schedule plans.

Schedule (computer science)
In the field of databases, a schedule is a list of actions (i.e. reading, writing, aborting, committing) from a set of transactions. Here is a sample schedule:

Schedule D is a set of three transactions T1, T2, T3, and describes the actions of the transactions as seen by the DBMS. T1 reads and writes object X, then T2 reads and writes object Y, and finally T3 reads and writes object Z. This is an example of a serial schedule, because the actions of the three transactions are not interleaved.

Contents
1 Types of schedule
1.1 Serial
1.2 Serializable
1.2.1 Conflicting actions
1.2.2 Conflict equivalence
1.2.3 Conflict-serializable
1.2.4 Commitment-ordered
1.2.5 View equivalence
1.2.6 View-serializable
1.3 Recoverable
1.3.1 Unrecoverable
1.3.2 Avoids cascading aborts (rollbacks)
1.3.3 Strict
2 Hierarchical relationship between serializability classes
3 Practical implementations
4 See also

Types of schedule
Serial
The transactions are executed one by one, non-interleaved.
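A serial schedule like D can be recognized mechanically. The tuple encoding of actions, (transaction, operation, object), is invented for illustration:

```python
# Serial means each transaction's actions form one contiguous run.
def is_serial(schedule):
    seen = []
    for txn, _, _ in schedule:
        if txn in seen and seen[-1] != txn:
            return False  # txn resumed after another transaction ran
        if txn not in seen:
            seen.append(txn)
    return True

# Schedule D from the text: T1, then T2, then T3, not interleaved.
d = [("T1", "R", "X"), ("T1", "W", "X"),
     ("T2", "R", "Y"), ("T2", "W", "Y"),
     ("T3", "R", "Z"), ("T3", "W", "Z")]
```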

Serializable
A schedule that is equivalent to a serial schedule has the serializability property. In schedule E, the order in which the actions of the transactions are executed is not the same as in D, but in the end, E gives the same result as D.

Conflicting actions
Two or more actions are said to be in conflict if:
1. The actions belong to different transactions.
2. At least one of the actions is a write operation.
3. The actions access the same object (read or write).

The following set of actions is conflicting:
• T1:R(X), T2:W(X), T3:W(X)

while the following sets of actions are not:
• T1:R(X), T2:R(X), T3:R(X)
• T1:R(X), T2:W(Y), T3:R(X)

Conflict equivalence
The schedules S1 and S2 are said to be conflict-equivalent if the following conditions are satisfied:
1. Both schedules S1 and S2 involve the same set of actions of the same set of transactions (informally speaking, both schedules contain and work on the same things).
2. The order of each pair of conflicting actions in S1 and S2 is the same.

Conflict-serializable
A schedule is said to be conflict-serializable when the schedule is conflict-equivalent to one or more serial schedules.

Another definition for conflict-serializability is that a schedule is conflict-serializable if and only if its precedence graph (serializability graph) is acyclic. The schedule above is conflict-equivalent to the serial schedule <T1, T2>.

Commitment-ordered
A schedule is said to be commitment-ordered, or commitment-order-serializable, if it obeys the commitment ordering (commit-order-serializability) property: it is conflict-serializable, and the precedence (partial) order of the transactions' commitment events is identical to the precedence order of the respective transactions, as induced by the schedule's acyclic precedence graph.

View equivalence
Two schedules S1 and S2 are said to be view-equivalent when the following conditions are satisfied:
1. If the transaction Ti in S1 reads an initial value for object X, so does the transaction Ti in S2.
2. If the transaction Ti in S1 reads the value written by transaction Tj in S1 for object X, so does the transaction Ti in S2.
3. If the transaction Ti in S1 is the final transaction to write the value for an object X, so is the transaction Ti in S2.

View-serializable
A schedule is said to be view-serializable if it is view-equivalent to some serial schedule. Note that, by definition, all conflict-serializable schedules are view-serializable.
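The acyclic-precedence-graph test for conflict-serializability can be sketched directly. The tuple encoding of actions, (transaction, operation, object), is an assumption for illustration:

```python
def conflicts(a1, a2):
    """Two actions conflict: different txns, same object, at least one write."""
    (t1, op1, o1), (t2, op2, o2) = a1, a2
    return t1 != t2 and o1 == o2 and "W" in (op1, op2)

def precedence_graph(schedule):
    """Edge Ti -> Tj for each conflicting pair where Ti's action comes first."""
    edges = set()
    for i, a in enumerate(schedule):
        for b in schedule[i + 1:]:
            if conflicts(a, b):
                edges.add((a[0], b[0]))
    return edges

def is_conflict_serializable(schedule):
    graph = {}
    for u, v in precedence_graph(schedule):
        graph.setdefault(u, set()).add(v)
    def cyclic(node, stack):
        if node in stack:
            return True
        stack.add(node)
        hit = any(cyclic(n, stack) for n in graph.get(node, ()))
        stack.discard(node)
        return hit
    return not any(cyclic(n, set()) for n in graph)
```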

Notice that the above example (which is the same as the example in the discussion of conflict-serializability) is both view-serializable and conflict-serializable. There are, however, view-serializable schedules that are not conflict-serializable: those schedules with a transaction performing a blind write.

The above example is not conflict-serializable, but it is view-serializable, since it has a view-equivalent serial schedule <T1, T2, T3>. Since determining whether a schedule is view-serializable is NP-complete, view-serializability has little practical interest.

Recoverable
Transactions commit only after all transactions whose changes they read commit.

These schedules are recoverable. F is recoverable because T1 commits before T2, which makes the value read by T2 correct; then T2 can commit itself. In F2, if T1 aborted, T2 would have to abort as well, because the value of A it read would be incorrect. In both cases, the database is left in a consistent state.

Unrecoverable
If a transaction T1 aborts, and a transaction T2 commits, but T2 relied on T1, we have an unrecoverable schedule.

In this example, G is unrecoverable, because T2 read the value of A written by T1 and then committed. T1 later aborted, so the value read by T2 is wrong; but since T2 has committed, the schedule is unrecoverable.

Avoids cascading aborts (rollbacks)
Also named cascadeless. A single transaction abort can lead to a series of transaction rollbacks. The strategy to prevent cascading aborts is to disallow a transaction from reading uncommitted changes from another transaction in the same schedule. The following examples are the same as the ones from the discussion on recoverability:
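The recoverability rule above (a transaction may commit only after every transaction whose writes it read has committed) can be checked mechanically. A minimal sketch, assuming an invented (transaction, action, object) schedule format:

```python
# A sketch of the recoverability rule: a schedule is recoverable if every
# transaction commits only after all transactions whose uncommitted writes
# it read have committed. Actions are 'R' (read), 'W' (write), 'C' (commit).

def is_recoverable(schedule):
    last_writer = {}   # object -> transaction that last wrote it
    reads_from = {}    # txn -> set of txns it read uncommitted data from
    committed = set()
    for txn, action, obj in schedule:
        if action == 'W':
            last_writer[obj] = txn
        elif action == 'R':
            writer = last_writer.get(obj)
            if writer is not None and writer != txn and writer not in committed:
                reads_from.setdefault(txn, set()).add(writer)
        elif action == 'C':
            if any(w not in committed for w in reads_from.get(txn, ())):
                return False   # committed before a transaction it read from
            committed.add(txn)
    return True

# F-style schedule: T1 writes A and commits before T2 reads A and commits.
F = [('T1', 'W', 'A'), ('T1', 'C', None), ('T2', 'R', 'A'), ('T2', 'C', None)]
# G-style schedule: T2 reads T1's uncommitted write and commits first.
G = [('T1', 'W', 'A'), ('T2', 'R', 'A'), ('T2', 'C', None), ('T1', 'C', None)]
print(is_recoverable(F))  # True
print(is_recoverable(G))  # False
```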

In this example, although F2 is recoverable, it does not avoid cascading aborts: if T1 aborts, T2 must be aborted too in order to maintain the correctness of the schedule, as T2 has already read the uncommitted value written by T1. The following is a recoverable schedule which avoids cascading aborts. Note, however, that the update of A by T1 is always lost.

Avoiding cascading aborts is sufficient, but not necessary, for a schedule to be recoverable.

Strict
A schedule is strict if, for any two transactions T1 and T2, whenever a write operation of T1 precedes a conflicting operation of T2 (either read or write), the commit event of T1 also precedes that conflicting operation of T2. Any strict schedule is cascadeless, but not the converse.

Hierarchical relationship between serializability classes
The following subclass clauses illustrate the hierarchical relationships between the serializability classes:
• Serial ⊂ commitment-ordered ⊂ conflict-serializable ⊂ view-serializable ⊂ all schedules
• Serial ⊂ strict ⊂ avoids cascading aborts ⊂ recoverable ⊂ all schedules

The Venn diagram illustrates the above clauses graphically.

Venn diagram for serializability classes

Practical implementations
In practice, most businesses aim for conflict-serializable and recoverable (primarily strict) schedules.

Serializability
In databases and transaction processing, serializability is the property of a schedule (history) being serializable: equivalent in its outcome (the resulting database state, the values of the database's data) to a serial schedule, i.e. one with no overlap in two transactions' execution time intervals (consecutive transaction execution). Serializability relates to the isolation property of a transaction and plays an essential role in concurrency control. Transactions are usually executed concurrently, since serial executions are typically extremely inefficient and thus impractical.

Contents
1 Correctness - Serializability
2 Correctness - Recoverability
3 Relaxing serializability
4 View serializability and Conflict serializability
5 Testing conflict serializability
6 Common mechanism - (Strong) Strict Two Phase Locking
7 Global serializability - Commitment ordering

Correctness - Serializability
Serializability is the major criterion for the correctness of concurrent transactions' executions (i.e., transactions that have overlapping execution time intervals, and possibly access the same shared resources), and a major goal of concurrency control. As such it is supported in all general purpose database systems. The rationale behind it is the following: if each transaction is correct by itself, then any serial execution of these transactions is correct. As a result, any execution that is equivalent (in its outcome) to a serial execution is correct. Schedules that are not serializable are likely to generate erroneous outcomes. Well known examples involve transactions that debit and credit accounts with money: if the related schedules are not serializable, then the total sum of money may not be preserved. Money could disappear, or be generated from nowhere. This is caused by one transaction writing, and "stepping on" and erasing, what has been written by another transaction before it has become permanent in the database. It does not happen if serializability is maintained.

Correctness - Recoverability
In systems where transactions can abort (virtually all real systems), serializability by itself is not sufficient for correctness. Schedules also need to possess the Recoverability property. Recoverability means that committed transactions have not read data written by aborted transactions (whose effects do not exist in the resulting database states). While serializability can be compromised in many applications, compromising recoverability always violates the database's integrity.

Relaxing serializability
In many applications, unlike with finances, absolute correctness is not needed. For example, when retrieving a list of products according to a specification, in most cases it does not matter much if a product whose data was updated a short time ago does not appear in the list, even if it meets the specification. It will typically appear in such a list when the retrieval is tried again a short time later. Commercial databases provide concurrency control with a whole range of (controlled) serializability violations (see isolation levels) in order to achieve higher performance, but only when the application can tolerate such violations. Higher performance means a better transaction execution rate and shorter transaction response time (transaction duration).

View serializability and Conflict serializability
Two major types of serializability exist: View serializability, and Conflict serializability. View serializability of a schedule is defined by equivalence to a serial schedule with the same transactions, such that respective transactions in the two schedules read and write the same data values ("view" the same data values). Conflict serializability is defined by equivalence to a serial schedule with the same transactions, such that both schedules have the same sets of respective ordered (by time) pairs of conflicting operations (same precedence relations of respective conflicting operations). Two operations (read or write) are conflicting if they are of different transactions, upon the same data item, and at least one of them is a write. Any schedule with the latter property also has the first property. However, conflict serializability is easier to achieve, and is widely utilized.

A more general definition of conflicting operations (also for complex operations, which may each consist of several "simple" read/write operations) requires that they are noncommutative (changing their order also changes their combined result). Each such operation needs to be atomic by itself (by proper system support) in order to be considered an operation for a commutativity check. For example, the operations increment and decrement of a counter are both write operations, but do not need to be considered conflicting, since they are commutative.

Testing conflict serializability
Schedule compliance with conflict serializability can be tested as follows: the conflict graph (serializability graph) of the schedule for committed transactions (the directed graph representing precedence of transactions in the schedule, as reflected by precedence of conflicting operations; transactions are nodes, precedence relations are directed edges) needs to be acyclic. This means that when a cycle of committed transactions is generated, serializability is violated. Thus conflict serializability mechanisms prevent cycles of committed transactions by aborting an undecided (neither committed nor aborted) transaction on each such cycle (one is sufficient, in order to break the cycle). The probability of cycle generation is typically low, but nevertheless, since correctness is involved, such a situation is carefully handled. Many mechanisms do not maintain a conflict graph as a data structure, but rather prevent or break cycles implicitly (e.g., see SS2PL below). Transactions aborted due to serializability violation prevention are executed again.

Common mechanism - (Strong) Strict Two Phase Locking
(Strong) Strict Two Phase Locking (SS2PL) is a common mechanism (and schedule property) utilized in database systems to enforce both conflict serializability and Strictness, a special case of recoverability. The related schedule property is also referred to as Rigorousness. In this mechanism each data item is locked by a transaction before being accessed (for any read or modify operation): the item is marked by a lock of a certain type, depending on the operation (and the specific implementation; various models with different lock types exist). Access by another transaction may be blocked, depending on lock type and the other transaction's access operation type.
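The acyclicity test just described can be sketched as follows; the schedule format (a time-ordered list of (transaction, operation, object) triples) and the use of Kahn's algorithm for the cycle check are illustrative choices, not prescribed by the text:

```python
# A sketch of the conflict-graph test: build a directed graph with an edge
# Ti -> Tj whenever an operation of Ti conflicts with a later operation of
# Tj, then check that the graph is acyclic.

def is_conflict_serializable(schedule):
    edges = set()
    txns = set()
    for i, (t1, op1, x1) in enumerate(schedule):
        txns.add(t1)
        for t2, op2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and 'W' in (op1, op2):
                edges.add((t1, t2))   # t1's conflicting op precedes t2's

    # Kahn's algorithm: the graph is acyclic iff every node can be removed.
    indeg = {t: 0 for t in txns}
    for _, t2 in edges:
        indeg[t2] += 1
    ready = [t for t, d in indeg.items() if d == 0]
    seen = 0
    while ready:
        t = ready.pop()
        seen += 1
        for a, b in edges:
            if a == t:
                indeg[b] -= 1
                if indeg[b] == 0:
                    ready.append(b)
    return seen == len(txns)

# Conflict-serializable: equivalent to the serial schedule <T1, T2>.
print(is_conflict_serializable([('T1', 'R', 'X'), ('T1', 'W', 'X'),
                                ('T2', 'R', 'X'), ('T2', 'W', 'X')]))  # True
# Not conflict-serializable: T1 -> T2 on X and T2 -> T1 on Y form a cycle.
print(is_conflict_serializable([('T1', 'R', 'X'), ('T2', 'R', 'Y'),
                                ('T1', 'W', 'Y'), ('T2', 'W', 'X')]))  # False
```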

All data locked on behalf of a transaction are released only after the transaction has ended (either committed or aborted). Mutual blocking of two or more transactions results in a deadlock, where execution of these transactions is stalled and no completion can be reached. A deadlock is a reflection of a potential cycle in the conflict graph that would occur without the blocking. Deadlocks are resolved by aborting a transaction involved with such a potential cycle (aborting one transaction per cycle is sufficient). Transactions aborted due to deadlock resolution are executed again. SS2PL implies Commitment ordering.

Global serializability - Commitment ordering
Enforcing global serializability in a multidatabase system (typically distributed), where transactions span multiple databases (two or more), is problematic: even if each database enforces serializability, the global schedule of all the databases is not necessarily serializable, and the communication between databases needed to reach conflict serializability using conflict information is excessive and unfeasible. An effective way to enforce conflict serializability globally in such a system is to enforce the Commitment ordering (CO, or Commit-order-serializability) property in each database. With the Commitment ordering property, the precedence (partial) order of transactions' commitment events is identical to the precedence (partial) order of the respective transactions as determined by their schedule's (acyclic) conflict graph. CO is a broad special case of conflict serializability: any conflict serializable schedule can be made CO compliant, without aborting any transaction in the schedule, by delaying commitment events to comply with the needed partial order. CO by itself is not sufficient as a concurrency control mechanism, since it lacks the recoverability property, which should be supported as well.

If CO is enforced locally in each database, the global schedule also possesses this property. The only communication needed between the databases for this purpose is the (unmodified) messages of an atomic commitment protocol (e.g., the two phase commit protocol), already needed by each distributed transaction to reach atomicity. The commitment event of a distributed transaction is always generated by some atomic commitment protocol (utilized to reach consensus among the transaction's components on whether to commit or abort it; this procedure is always carried out for distributed transactions, independently of concurrency control and CO). The atomic commitment protocol plays a central role in the distributed CO algorithm, which enforces CO globally: in case of incompatible local commitment orders in two or more databases, which implies a global cycle (a cycle that spans two or more databases) in the global conflict graph, the atomic commitment protocol breaks that cycle by aborting a transaction on the cycle. SS2PL implies Commitment ordering, and any SS2PL compliant database can participate in multidatabase systems that utilize the CO solution without any modification or addition of a CO algorithm component. As such, CO provides a general, fully distributed solution (no central processing component or central data structure is needed) for guaranteeing global serializability in heterogeneous environments with different database system types and other multiple transactional objects (objects with states accessed and modified only by transactions) that may employ different serializability mechanisms. An effective local (to any single database) CO algorithm can run beside any local concurrency control mechanism (serializability enforcing mechanism) without interfering with its resource access scheduling strategy.

Deadlock

A deadlock is a situation wherein two or more competing actions are waiting for the other to finish, and thus neither ever does. It is often seen in a paradox like 'the chicken or the egg'.

In the computing world, deadlock refers to a specific condition in which two or more processes are each waiting for another to release a resource, or in which more than two processes are waiting for resources in a circular chain (see Necessary conditions). Deadlock is a common problem in multiprocessing, where many processes share a specific type of mutually exclusive resource known as a software lock, or soft lock. Computers intended for the time-sharing and/or real-time markets are often equipped with a hardware lock (or hard lock) which guarantees exclusive access to processes, forcing serialization. Deadlocks are particularly troubling because there is no general solution to avoid (soft) deadlocks.

This situation may be likened to two people who are drawing diagrams, with only one pencil and one ruler between them. If one person takes the pencil and the other takes the ruler, a deadlock occurs when the person with the pencil needs the ruler and the person with the ruler needs the pencil. Both requests can't be satisfied, so a deadlock occurs.

Contents
1 Necessary conditions
2 Examples of deadlock conditions
3 Deadlock avoidance
4 Deadlock prevention
5 Deadlock detection
6 Distributed deadlocks
7 Livelock
8 See also
9 External links

Necessary conditions
There are four necessary conditions for a deadlock to occur, known as the Coffman conditions from their first description in a 1971 article by E. G. Coffman.

1. Mutual exclusion condition: a resource is either assigned to one process or it is available.
2. Hold and wait condition: processes already holding resources may request new resources.
3. No preemption condition: only a process holding a resource may release it.
4. Circular wait condition: two or more processes form a circular chain where each process waits for a resource that the next process in the chain holds.
Deadlock only occurs in systems where all four conditions hold.

Examples of deadlock conditions
An example of a deadlock which may occur in database products is the following. Client applications using the database may require exclusive access to a table, and in order to gain exclusive access they ask for a lock. If one client application holds a lock on a table and attempts to obtain a lock on a second table that is already held by a second client application, this may lead to deadlock if the second application then attempts to obtain the lock that is held by the first application. This type of deadlock is sometimes referred to as a deadly embrace (properly used only when exactly two applications are involved) or starvation. (But this particular type of deadlock is easily prevented, e.g., by using an all-or-none resource allocation algorithm.)

Another example might be a text formatting program that accepts text sent to it to be processed and then returns the results, but does so only after receiving "enough" text to work on (e.g. 1KB). A text editor program is written that sends the formatter some text and then waits for the results. In this case a deadlock may occur on the last block of text: since the formatter may not have sufficient text for processing, it will suspend itself while waiting for additional text, which will never arrive since the text editor has already sent it all of the text it has. Meanwhile, the text editor is itself suspended waiting for the last output from the formatter. This situation, too, is easily prevented, by having the text editor send a forcing message (e.g. EOF) with its last (partial) block of text, which forces the formatter to return the last (partial) block after formatting rather than wait for additional text.

Nevertheless, since there is no general solution for deadlock prevention, each type of deadlock must be anticipated and specially prevented. But general algorithms can be implemented within the operating system so that if one or more applications becomes blocked, it will usually be terminated after a time (and, in the meantime, rolled back to a state prior to the resources being obtained by the application).

Deadlock avoidance
Deadlock can be avoided if certain information about processes is available in advance of resource allocation. For every resource request, the system sees if granting the request will mean that the system will enter an unsafe state, meaning a state that could result in deadlock. The system then only grants requests that will lead to safe states. In order for the system to be able to figure out whether the next state will be safe or unsafe, it must know in advance at any time the number and type of all resources in existence, available, and requested. One known algorithm that is used for deadlock avoidance is the Banker's algorithm, which requires resource usage limits to be known in advance.
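The safe-state test at the heart of the Banker's algorithm mentioned above can be sketched as follows; the matrices and process counts are invented for illustration:

```python
# A sketch of the Banker's-algorithm safe-state test: each process
# declares its maximum demand, and a state is safe when some order exists
# in which every process can obtain its remaining need and finish.

def is_safe(available, allocation, maximum):
    need = [[m - a for m, a in zip(mrow, arow)]
            for mrow, arow in zip(maximum, allocation)]
    work = list(available)
    done = [False] * len(allocation)
    progress = True
    while progress:
        progress = False
        for i, finished in enumerate(done):
            if not finished and all(n <= w for n, w in zip(need[i], work)):
                # Process i can run to completion and release its allocation.
                work = [w + a for w, a in zip(work, allocation[i])]
                done[i] = True
                progress = True
    return all(done)

# Three processes, one resource type: 3 units free, current holdings, maxima.
print(is_safe([3], [[1], [4], [2]], [[4], [6], [8]]))  # True
# Only 1 unit free and every process still needs 5 more: unsafe.
print(is_safe([1], [[2], [4], [3]], [[7], [9], [8]]))  # False
```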

However, for many systems it is impossible to know in advance what every process will request, so deadlock avoidance is often impossible. Two other avoidance algorithms are Wait/Die and Wound/Wait. In both of these algorithms there exists an older process (O) and a younger process (Y). Process age can be determined by a time stamp at process creation time: smaller time stamps are older processes, while larger timestamps represent younger processes.

                                         Wait/Die    Wound/Wait
O is waiting for a resource held by Y    O waits     Y dies
Y is waiting for a resource held by O    Y dies      Y waits

Deadlock prevention
Deadlocks can be prevented by ensuring that at least one of the four necessary conditions never holds:
• Removing the mutual exclusion condition means that no process may have exclusive access to a resource. This proves impossible for resources that cannot be spooled, and even with spooled resources deadlock could still occur. Algorithms that avoid mutual exclusion are called non-blocking synchronization algorithms.
• The "hold and wait" condition may be removed by requiring processes to request all the resources they will need before starting up (or before embarking upon a particular set of operations); this advance knowledge is frequently difficult to satisfy and, in any case, is an inefficient use of resources. Another way is to require processes to release all their resources before requesting all the resources they will need. This too is often impractical. (Such algorithms, such as serializing tokens, are known as the all-or-none algorithms.)
• A "no preemption" (lockout) condition may be difficult or impossible to avoid, as a process has to be able to hold a resource for a certain amount of time, or the processing outcome may be inconsistent or thrashing may occur. However, inability to enforce preemption may interfere with a priority algorithm. (Note: preemption of a "locked out" resource generally implies a rollback, and is to be avoided, since it is very costly in overhead.) Algorithms that allow preemption include lock-free and wait-free algorithms and optimistic concurrency control.
• The circular wait condition: algorithms that avoid circular waits include "disable interrupts during critical sections", "use a hierarchy to determine a partial ordering of resources" (where no obvious hierarchy exists, even the memory address of resources has been used to determine ordering) and Dijkstra's solution.

Deadlock detection
Often neither deadlock avoidance nor deadlock prevention may be used. Instead, deadlock detection and process restart are used, by employing an algorithm that tracks resource allocation and process states, and rolls back and restarts one or more of the processes in order to remove the deadlock. Detecting a deadlock that has already occurred is easily possible, since the resources that each process has locked and/or currently requested are known to the resource scheduler or OS. Detecting the possibility of a deadlock before it occurs is much more difficult and is, in fact, generally undecidable, because the halting problem can be rephrased as a deadlock scenario. However, in specific environments, using specific means of locking resources, deadlock detection may be decidable. In the general case, it is not possible to distinguish between algorithms that are merely waiting for a very unlikely set of circumstances to occur and algorithms that will never finish because of deadlock.
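The Wait/Die and Wound/Wait rules tabulated above reduce to two small decision functions; a sketch (timestamp values are illustrative):

```python
# A sketch of the two timestamp schemes. Ages are process creation
# timestamps (smaller = older). In Wait/Die an older requester waits and a
# younger one dies; in Wound/Wait an older requester wounds (preempts) the
# younger holder, and a younger requester waits.

def wait_die(requester_ts, holder_ts):
    return 'wait' if requester_ts < holder_ts else 'die'

def wound_wait(requester_ts, holder_ts):
    return 'wound holder' if requester_ts < holder_ts else 'wait'

# Older process O (ts=1) requests a resource held by younger Y (ts=5):
print(wait_die(1, 5), '/', wound_wait(1, 5))    # wait / wound holder
# Younger Y (ts=5) requests a resource held by older O (ts=1):
print(wait_die(5, 1), '/', wound_wait(5, 1))    # die / wait
```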

Distributed deadlocks
Distributed deadlocks can occur in distributed systems when distributed transactions or concurrency control is being used. Distributed deadlocks can be detected either by constructing a global wait-for graph from local wait-for graphs at a deadlock detector, or by a distributed algorithm like edge chasing. Phantom deadlocks are deadlocks that are detected in a distributed system but don't actually exist: they have either been already resolved or no longer exist due to transactions aborting.

Livelock
A livelock is similar to a deadlock, except that the states of the processes involved in the livelock constantly change with regard to one another, none progressing. Livelock is a special case of resource starvation; the general definition only states that a specific process is not progressing. [1]

As a real-world example, livelock occurs when two people meet in a narrow corridor, and each tries to be polite by moving aside to let the other pass, but they end up swaying from side to side without making any progress because they always both move the same way at the same time. [2]

Livelock is a risk with some algorithms that detect and recover from deadlock. If more than one process takes action, the deadlock detection algorithm can be repeatedly triggered. This can be avoided by ensuring that only one process (chosen randomly or by priority) takes action. [3]

The SPIN model checker can be used to formally verify that a system will never enter a deadlock.

See also
• Banker's algorithm
• Computer bought the farm
• Deadlock provision
• Dining philosophers problem
• Gridlock (in vehicular traffic)
• Hang
• Infinite loop
• Mamihlapinatapai
• Race condition
• Sleeping barber problem
• Stalemate
• Synchronization
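The global wait-for-graph detection described under Distributed deadlocks above can be sketched as follows; the site names and edges are invented:

```python
# A sketch of deadlock detection by a global wait-for graph: each site
# reports its local "process A waits for process B" edges, the detector
# merges them, and a cycle in the merged graph indicates a (possibly
# phantom) deadlock.

def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}
    def visit(node):
        color[node] = GRAY              # node is on the current DFS path
        for nxt in graph.get(node, ()):
            c = color.get(nxt, WHITE)
            if c == GRAY or (c == WHITE and visit(nxt)):
                return True             # back edge found: cycle
        color[node] = BLACK
        return False
    return any(visit(n) for n in graph if color.get(n, WHITE) == WHITE)

site_1 = [('P1', 'P2')]          # P1 waits for P2, reported by site 1
site_2 = [('P2', 'P3')]
site_3 = [('P3', 'P1')]          # closes the global cycle P1->P2->P3->P1
print(has_cycle(site_1 + site_2))            # False: no cycle yet
print(has_cycle(site_1 + site_2 + site_3))   # True: global deadlock
```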

Atomicity

In database systems, atomicity is one of the ACID transaction properties. An atomic transaction is a series of database operations which either all occur, or all do not occur ("fail", although failure is not considered catastrophic). A guarantee of atomicity prevents updates to the database occurring only partially, which can cause greater problems than rejecting the whole series outright.

One example of atomicity is in ordering airline tickets. Tickets must either be paid for and reserved on a flight, or neither paid for nor reserved. It is not acceptable for customers to pay for tickets without securing their requested flight, or to reserve tickets without payment succeeding. One atomic transaction might include the booking not only of flights, but of hotels and transport, in exchange for the right money at the exact current exchange rate; whatever happens, the principle of atomicity (i.e. complete success or complete failure) remains.

Orthogonality
Atomicity is not completely orthogonal to the other ACID properties of transactions. For example, isolation relies on atomicity to roll back changes in the event of isolation failures such as deadlock; consistency also relies on rollback in the event of a consistency violation by an illegal transaction. Finally, atomicity itself relies on durability to ensure transactions are atomic even in the face of external failures. As a result, failure to detect errors and manually roll back the enclosing transaction may cause isolation and consistency failures.

Implementation
Typically, atomicity is implemented by providing some mechanism to indicate which transactions have been started and finished, or by keeping a copy of the data before any changes were made. Several filesystems have developed methods for avoiding the need to keep multiple copies of data, using journaling (see journaling file system). Many databases also support a commit-rollback mechanism aiding in the implementation of atomic transactions. These are usually also implemented using some form of logging/journaling to track changes; the logs (often the metadata) are synchronized as necessary once the actual changes have successfully been made, and unrecorded entries are simply ignored on crash recovery. Although implementations vary depending on factors such as concurrency issues, ultimately any application-level implementation relies on operating-system functionality, which in turn makes use of specialized hardware, to guarantee that an operation is non-interruptible by either software attempting to re-divert system resources (see pre-emptive multitasking) or resource unavailability (e.g. power outages). For example, POSIX-compliant systems provide the open(2) system call, which allows applications to atomically open a file.
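A toy sketch of the commit-rollback idea described above (not a real database engine): keep the old value of every key in an undo log, and on rollback restore the log in reverse, so the update is all-or-nothing.

```python
# An illustrative undo-log transaction over a plain dict. The class and
# key names are invented for this sketch.

class TinyTransaction:
    def __init__(self, store):
        self.store = store
        self.undo_log = []

    def write(self, key, value):
        self.undo_log.append((key, self.store.get(key)))  # remember old value
        self.store[key] = value

    def commit(self):
        self.undo_log.clear()          # changes become permanent

    def rollback(self):
        for key, old in reversed(self.undo_log):
            if old is None:
                self.store.pop(key, None)   # key did not exist before
            else:
                self.store[key] = old
        self.undo_log.clear()

db = {'seat': 'free'}
t = TinyTransaction(db)
t.write('seat', 'reserved')
t.write('payment', 'charged')
t.rollback()                 # payment failed: neither change survives
print(db)  # {'seat': 'free'}
```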

Other popular system calls that may assist in achieving atomic operations from userspace include mkdir(2), fcntl(2), flock(2), sem_wait(2), sem_post(2), semop(2), fdatasync(2), fsync(2) and rename(2). At the hardware level, atomic operations such as test-and-set (TAS) and/or atomic increment/decrement operations are needed. These low-level operations are often implemented in machine language or assembly language; when necessary, raising the interrupt level to disable all possible interrupts (of hardware and software origin) may be used to implement the atomic synchronization function primitives.

Atomic operation

An atomic operation in computer science refers to a set of operations that can be combined so that they appear to the rest of the system to be a single operation with only two possible outcomes: success or failure. The etymology of the phrase originates in the Classical Greek concept of a fundamental and indivisible component; see atom.

Contents
1 Conditions
2 Example
2.1 One process
2.2 Two processes
3 Locking
4 See also

Conditions
To accomplish this, two conditions must be met:
1. Until the entire set of operations completes, no other process can know about the changes being made; and
2. If any of the operations fail, then the entire set of operations fails, and the state of the system is restored to the state it was in before any of the operations began.
No in-between state is accessible.
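The rename(2) call listed above enables a common write-then-rename idiom: write the new contents to a temporary file, then replace the target in a single step, so readers see either the old file or the new one, never a half-written file. A sketch (the file name is illustrative; `os.replace` requires both paths on the same filesystem):

```python
# A sketch of atomic file replacement built on the rename(2) semantics
# exposed by os.replace: the final rename is the single atomic step.

import os
import tempfile

def atomic_write(path, data):
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)   # temp file on same filesystem
    try:
        with os.fdopen(fd, 'w') as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())       # push the bytes to disk first
        os.replace(tmp, path)          # the atomic step
    except BaseException:
        os.unlink(tmp)                 # clean up the half-written temp file
        raise

atomic_write('config.txt', 'mode=safe\n')
with open('config.txt') as f:
    print(f.read())
```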

Even without the complications of multiple processing units, atomicity can be non-trivial to implement: as long as there is the possibility of a change in the flow of control, without atomicity there is the possibility that the system can enter an invalid state (invalid as defined by the program, a so-called invariant).

Example

One process
For example, imagine a single process is running on a computer incrementing a memory location. To increment that memory location:
1. the process reads the value in the memory location;
2. the process adds one to the value;
3. the process writes the new value back into the memory location.

Two processes
Now, imagine two processes are running, incrementing a single, shared memory location:
1. the first process reads the value in the memory location;
2. the first process adds one to the value;
but before it can write the new value back to the memory location, it is suspended, and the second process is allowed to run:
1. the second process reads the value in the memory location, the same value that the first process read;
2. the second process adds one to the value;
3. the second process writes the new value into the memory location.
The second process is suspended and the first process is allowed to run again:
1. unaware that the other process has already updated the value in the memory location, the first process writes a now-wrong value into the memory location.

This is a trivial example. In a real system, the operations can be more complex and the errors introduced extremely subtle. For example, reading a 64-bit value from memory may actually be implemented as two sequential reads of two 32-bit memory locations. If a process has only read the first 32 bits, and before it reads the second 32 bits the value in memory gets changed, it will have neither the original value nor the new value but a mixed-up garbage value. Furthermore, the specific order in which the processes run can change the results, making such an error difficult to detect and debug.
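The interleaving above can be reproduced deterministically by modelling the registers explicitly; a sketch:

```python
# A deterministic replay of the lost-update interleaving: each "process"
# reads the shared value into a local register, increments the register,
# and writes it back. Running the steps in the interleaved order from the
# text loses one of the two increments.

memory = {'x': 0}

def read(reg):      reg['value'] = memory['x']        # load into register
def increment(reg): reg['value'] += 1                 # add one in register
def write(reg):     memory['x'] = reg['value']        # store back

p1, p2 = {}, {}
read(p1)        # P1 sees 0
increment(p1)   # P1's register holds 1; P1 is then suspended
read(p2)        # P2 also sees 0
increment(p2)
write(p2)       # memory becomes 1
write(p1)       # P1 overwrites with its stale 1: one increment is lost
print(memory['x'])  # 1, not the expected 2
```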

Locking
A clever programmer might suggest that a lock should be placed around this "critical section". However, without hardware support in the processor, a lock is nothing more than a memory location which must itself be read, inspected, and written. Algorithms, such as spin locking, have been devised that implement software-only locking, but these can be inefficient. Most modern processors have some facility which can be used to implement locking, such as an atomic test-and-set or compare-and-swap operation, or the ability to temporarily turn off interrupts, ensuring that the currently running process cannot be suspended.

See also
• Race condition

Race condition

A race condition or race hazard is a flaw in a system or process whereby the output of the process is unexpectedly and critically dependent on the sequence or timing of other events. The term originates with the idea of two signals racing each other to influence the output first. Race conditions can occur in poorly-designed electronics systems, especially logic circuits, but they can and often do also arise in computer software.

Contents
1 Electronics
1.1 Types
2 Computing
2.1 Real life examples
2.1.1 File systems
2.1.2 Networking
2.1.3 Life-critical systems
2.2 Computer security
2.3 Asynchronous finite state machines
3 See also
4 External links

Electronics
A typical example of a race condition may occur in a system of logic gates, where inputs vary. If a particular output depends on the state of the inputs, it may only be defined for steady-state signals. As the inputs change state, a finite delay will occur before the output changes, due to the physical nature of the electronic system. For a brief period, the output may change to an unwanted state before settling back to the designed state. Certain systems can tolerate such glitches, but if, for example, the output signal functions as a clock for further systems that contain memory, the system can rapidly depart from its designed behaviour (in effect, the temporary glitch becomes permanent).

3. and so the gate's output will also be true. 4. 5. 2. Integer i = 0. T1 reads the value of i from memory into a register : 0 T1 increments the value of i in the register: (register contents) + 1 = 1 T1 stores the value of the register in memory : 1 T2 reads the value of i from memory into a register : 1 . on input B. if changes in the value of X take longer to propagate to input B than to input A then when X changes from false to true.g. logic gates can enter metastable states. For example. Sometimes they are cured using inductive delay-line elements to effectively increase the time duration of an input signal. Dynamic race conditions These result in multiple transitions when only one is intended. Computing Race conditions may arise in software. They are due to interaction between gates (Dynamic race conditions can be eliminated by using not more than two levels of gating). the output (X AND NOT X) should never be high. the following sequence of operations would take place: 1. the Karnaugh map article includes a concrete example of a race condition and how to eliminate it) encourage designers to recognise and eliminate race conditions before they cause problems.physical nature of the electronic system. Certain systems can tolerate such glitches. Ideally. the system can rapidly depart from its designed behaviour (in effect. especially when communicating between separate processes or threads of execution. a brief period will ensue during which both inputs are true. Types Static race conditions These are caused when a signal and its complement are combined together. consider a two input AND gate fed with a logic signal X on input A and its negation. the output may change to an unwanted state before settling back to the designed state. NOT X. the temporary glitch becomes permanent). Proper design techniques (e. but if for example this output signal functions as a clock for further systems that contain memory. 
Essential race conditions These are caused when an input has two transitions in less than the total feedback propagation time. Karnaugh maps—note. which create further problems for circuit designers. For a brief period. Here is a simple example: Let us assume that two threads T1 and T2 each want to increment the value of a global integer by one. See critical race and non-critical race for more information on specific types of race conditions. However. As well as these problems. In theory.

1. Integer i = 0 (memory)
2. T1 reads the value of i from memory into a register: 0
3. T1 increments the value of i in the register: (register contents) + 1 = 1
4. T1 stores the value of the register in memory: 1
5. T2 reads the value of i from memory into a register: 1
6. T2 increments the value of i in the register: (register contents) + 1 = 2
7. T2 stores the value of the register in memory: 2
8. Integer i = 2 (memory)

In the case shown above, the final value of i is 2, as expected. However, if the two threads run simultaneously without locking or synchronization, the outcome of the operation could be wrong. The alternative sequence of operations below demonstrates this scenario:

1. Integer i = 0 (memory)
2. T1 reads the value of i from memory into a register: 0
3. T2 reads the value of i from memory into a register: 0
4. T1 increments the value of i in the register: (register contents) + 1 = 1
5. T2 increments the value of i in the register: (register contents) + 1 = 1
6. T1 stores the value of the register in memory: 1
7. T2 stores the value of the register in memory: 1
8. Integer i = 1 (memory)

The final value of i is 1 instead of the expected result of 2. This occurs because the increment operations in the second case are non-atomic. Atomic operations are those that cannot be interrupted while accessing some resource, such as a memory location. In the first case, T1 was not interrupted while accessing the variable i, so its operation was atomic.

For another example, consider the following two tasks, in pseudocode:

    global integer A = 0;

    // increments the value of A and prints "RX"
    // activated whenever an interrupt is received from the serial controller
    task Received() {
        A = A + 1;
        print "RX";
    }

    // prints out only the even numbers
    // is activated every second
    task Timeout() {
        if (A is divisible by 2) {
            print A;
        }
    }

Output would look something like:

0
0
0
RX
RX
2
RX
RX

4
4

Now consider this chain of events, which might occur next:

1. timeout occurs, activating task Timeout
2. task Timeout evaluates A and finds it is divisible by 2, so elects to execute the "print A" next
3. data is received on the serial port, causing an interrupt and a switch to task Received
4. task Received runs to completion, incrementing A and printing "RX"
5. control returns to task Timeout
6. task Timeout executes print A, using the current value of A, which is 5

Mutexes are used to address this problem in concurrent programming.

Real life examples

File systems
In file systems, two or more programs may "collide" in their attempts to modify or access a file, which could result in data corruption. File locking provides a commonly used solution. A more cumbersome remedy involves reorganizing the system in such a way that one unique process (running a daemon or the like) has exclusive access to the file, and all other processes that need to access the data in that file do so only via interprocess communication with that one process (which of course requires synchronization at the process level).
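The file-locking approach mentioned above can be sketched with POSIX advisory locks, which Python exposes through the standard fcntl module (POSIX-only; the file path is illustrative):

```python
import fcntl

def append_line(path: str, line: str) -> None:
    """Append a line to a shared file while holding an exclusive advisory
    lock, so cooperating programs cannot interleave their writes."""
    with open(path, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX)      # block until we hold the lock
        try:
            f.write(line + "\n")
            f.flush()                      # make the write visible on disk
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)  # release for other processes

append_line("/tmp/shared.log", "entry from this process")
```

Advisory locks only work if every program accessing the file uses them; a program that ignores the lock can still cause the collisions described above.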
A different form of race hazard exists in file systems where unrelated programs may affect each other by suddenly using up available resources such as disk space (or memory, or processor cycles). Software not carefully designed to anticipate and handle this rare situation may then become quite fragile and unpredictable. Such a risk may be overlooked for a long time in a system that seems very reliable. But eventually enough data may accumulate or enough other software may be added to critically destabilize many parts of a system. Probably the best known example of this occurred with the near-loss of the Mars Rover "Spirit" not long after landing. A solution is for software to request and reserve all the resources it will need before beginning a task; if this request fails, then the task is postponed, avoiding the many points where failure could have occurred. (Alternately, each of those points can be equipped with error handling, or the success of the entire task can be verified before proceeding afterwards.) A more common but incorrect approach is to simply verify that enough disk space (for example) is available before starting a task; this is not adequate, because in complex systems the actions of other running programs can be unpredictable.

Networking
In networking, consider a distributed chat network like IRC, where a user acquires channel-operator privileges in any channel he starts. If two users on different servers, on different ends of the same network, try to start the same-named channel at the same time, each user's respective server will grant channel-operator privileges to each user, since neither server will yet have received the other server's signal that it has allocated that channel. (Note that this problem has been largely solved by various IRC server implementations.)

In this case of a race condition, the concept of the "shared resource" covers the state of the network (what channels exist, as well as what users started them and therefore have what privileges), which each server can freely change as long as it signals the other servers on the network about the changes, so that they can update their conception of the state of the network. However, the latency across the network makes possible the kind of race condition described. In this case, heading off race conditions by imposing a form of control over access to the shared resource (say, appointing one server to control who holds what privileges) would mean turning the distributed network into a centralized one, at least for that one part of the network operation. Where users find such a solution unacceptable, a pragmatic solution can have the system 1) recognize when a race condition has occurred, and 2) repair the ill effects.

Life-critical systems
Software flaws in life-critical systems can be disastrous. Race conditions were among the flaws in the Therac-25 radiation therapy machine, which led to the death of five patients and injuries to several more. Another example is the Energy Management System provided by GE Energy and used by Ohio-based FirstEnergy Corp (and by many other power facilities as well). A race condition existed in the alarm subsystem: when three sagging power lines were tripped simultaneously, the condition prevented alerts from being raised to the monitoring technicians, delaying their awareness of the problem. This software flaw eventually led to the North American Blackout of 2003. (GE Energy later developed a software patch to correct the previously undiscovered error.)

Computer security
A specific kind of race condition involves checking for a predicate (e.g. for authentication), then acting on the predicate, while the state can change between the time of check and the time of use. When this kind of bug exists in security-conscious code, a security vulnerability called a time-of-check-to-time-of-use (TOCTTOU) bug is created.

Asynchronous finite state machines
Even after ensuring that single-bit transitions occur between states, the asynchronous machine will fail if multiple inputs change at the same time. The solution to this is to design a machine so that each state is sensitive to only one input change.
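The time-of-check-to-time-of-use pattern described under Computer security above can be illustrated with ordinary file handling. A minimal Python sketch; the function names are illustrative, and a robust defence against symlink attacks requires platform-specific measures beyond this:

```python
import os

def read_config_racy(path: str) -> str:
    """Check-then-act: the file can change, disappear, or be replaced
    by a symlink between the exists() check and the open() call."""
    if os.path.exists(path):        # time of check
        with open(path) as f:       # time of use: path may have changed
            return f.read()
    return ""

def read_config_safer(path: str) -> str:
    """Narrower window: skip the separate check and instead handle
    failure of the single open() call."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""
```

Collapsing the check and the use into one operation (and handling its failure) shrinks the window in which the state can change, which is the general remedy for this class of bug.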
