
On-Line Analytical Processing (OLAP) is an advanced data analysis environment that supports
decision making, business modeling, and operations research activities. OLAP systems are
designed to use both operational and data warehouse data.
A geographic information system (GIS) is a system that creates, manages, analyzes, and maps
all types of data. A GIS connects data to a map, integrating location data (where things are) with
all types of descriptive information (what things are like there). This provides a foundation for
mapping and analysis that is used in science and almost every industry. GIS helps users
understand patterns, relationships, and geographic context. Its benefits include improved
communication and efficiency as well as better management and decision making.
Spatial OLAP refers to OLAP operations performed in a spatial data warehouse, such as drilling
down into detailed spatial locations (e.g., finding detailed fact summaries for a city district) or
slicing a data cube with a set of constraints (e.g., finding the average midnight temperature
distribution in May in the city of Chicago).
World Wide Web Consortium (W3C) is an international organization committed to improving
the web. It is made up of several hundred member organizations from a variety of related IT
industries. W3C sets standards for the World Wide Web (WWW) to facilitate interoperability
and cooperation among all web stakeholders. It was established in 1994 by the creator of the
WWW, Tim Berners-Lee.
Resource Description Framework (RDF) is a standard model for data interchange on the Web.
RDF has features that facilitate data merging even if the underlying schemas differ, and it
specifically supports the evolution of schemas over time without requiring all the data
consumers to be changed.
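As a rough sketch of why RDF makes merging easy, the Python below (plain sets, not an RDF library) treats each statement as a (subject, predicate, object) triple. Merging two sources is then the union of their triples, even though the sources use different properties. The example.org URIs and the helpers `merge` and `describe` are hypothetical, and the sketch ignores blank nodes, which a real RDF merge must rename apart.

```python
# Minimal sketch: RDF statements modelled as (subject, predicate, object) triples.
# The example.org URIs are hypothetical placeholders, not a real vocabulary.
Triple = tuple[str, str, str]

LAPTOP = "http://example.org/product/1"

# Source A describes the product with its own schema...
shop_a: set[Triple] = {
    (LAPTOP, "http://example.org/a/name", "Laptop"),
    (LAPTOP, "http://example.org/a/price", "999"),
}
# ...and source B uses a different schema for the same resource.
shop_b: set[Triple] = {
    (LAPTOP, "http://example.org/b/weightKg", "1.4"),
}


def merge(*graphs: set[Triple]) -> set[Triple]:
    """Merge graphs (assumed to have no blank nodes) by taking the union of their triples."""
    merged: set[Triple] = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph: set[Triple], subject: str) -> list[tuple[str, str]]:
    """List every (predicate, object) pair stated about one subject, sorted."""
    return sorted((p, o) for s, p, o in graph if s == subject)


combined = merge(shop_a, shop_b)
print(len(combined))  # 3
for predicate, obj in describe(combined, LAPTOP):
    print(predicate, obj)
```

Neither source had to change its schema; a consumer of the merged data simply sees more properties for the same subject.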
Web Ontology Language (OWL) is a Semantic Web language designed to represent rich and
complex knowledge about things, groups of things, and relations between things. OWL is a
computational logic-based language such that knowledge expressed in OWL can be exploited by
computer programs, e.g., to verify the consistency of that knowledge or to make implicit
knowledge explicit. OWL documents, known as ontologies, can be published on the World Wide
Web and may refer to or be referred to from other OWL ontologies. OWL is part of the W3C’s
Semantic Web technology stack, which also includes RDF, RDF Schema (RDFS), and SPARQL.
ETL (Extract, Transform, Load) is a data integration process that encompasses three steps:
extraction, transformation, and loading. In a nutshell, ETL systems take large volumes of raw
data from multiple sources, convert it for analysis, and load it into a data warehouse.
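The three steps can be sketched in Python with only the standard library. This is a minimal illustration, not a production pipeline: the CSV text, the `sales` table, and the cleaning rules are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system: inconsistent casing and spacing,
# amounts stored as text, and one row with a missing amount.
RAW_CSV = """order_id,country,amount
1,  belgium ,100.50
2,France,20
3,BELGIUM,
4,france,30.25
"""


def extract(text: str) -> list[dict[str, str]]:
    """Extract: read raw rows from the source (here, an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict[str, str]]) -> list[tuple[int, str, float]]:
    """Transform: drop incomplete rows, normalize text, and convert types."""
    clean = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip rows without an amount
        clean.append((int(row["order_id"]),
                      row["country"].strip().title(),
                      float(row["amount"])))
    return clean


def load(conn: sqlite3.Connection, rows: list[tuple[int, str, float]]) -> None:
    """Load: insert the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales "
                 "(order_id INTEGER PRIMARY KEY, country TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")  # stands in for a real data warehouse
load(warehouse, transform(extract(RAW_CSV)))
for country, total in warehouse.execute(
        "SELECT country, SUM(amount) FROM sales GROUP BY country ORDER BY country"):
    print(country, total)
```

Keeping the three steps as separate functions mirrors the E, T, and L stages, so each can be tested or replaced on its own.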
A database management system (DBMS) is a software package designed to define, manipulate,
retrieve, and manage data in a database. A DBMS generally manages the data itself, the data
format, field names, record structure, and file structure. It also defines rules to validate and
manipulate this data. Database management systems are built on specific data-handling
concepts, which have evolved along with the practice of database administration. The earliest
databases handled only individual pieces of specially formatted data; today’s more evolved
systems can handle different kinds of less formatted data and tie them together in more
elaborate ways.

Microsoft SQL Server is a relational database management system (RDBMS) that supports a
wide variety of transaction processing, business intelligence, and analytics applications in
corporate IT environments. It is commonly cited, along with Oracle Database and IBM's DB2, as
one of the three market-leading database technologies.
Rectangles: entity sets
Ellipses: attributes
Diamonds: relationship sets
Lines: link attributes to entity sets and entity sets to relationship sets
Double ellipses: multivalued attributes
Dashed ellipses: derived attributes
Double rectangles: weak entity sets
Double lines: total participation of an entity in a relationship set
The Entity–Relationship model (ER model) describes the structure of a database with the help of
a diagram, known as an Entity–Relationship diagram (ER diagram).
An ER model is a design or blueprint of a database that can later be implemented as a database.
The main components of an ER model are entity sets and relationship sets.
An ER diagram shows the relationships among entity sets. An entity set is a group of similar
entities, and these entities can have attributes. In DBMS terms, an entity set typically
corresponds to a table and its attributes to the table's columns, so by showing the relationships
among tables and their attributes, an ER diagram shows the complete logical structure of a
database.
An ER diagram has three main components:
1. Entity
2. Attribute
3. Relationship
ENTITY
An entity is an object or component of data. An entity is represented as a rectangle in an
ER diagram.
Weak Entity:
An entity that cannot be uniquely identified by its own attributes and relies on its relationship
with another entity is called a weak entity. A weak entity is represented by a double rectangle.
For example, a bank account cannot be uniquely identified without knowing the bank to which
the account belongs, so a bank account is a weak entity.
ATTRIBUTE
An attribute describes a property of an entity. An attribute is represented as an oval in
an ER diagram. There are four types of attributes:

1. Key attribute
2. Composite attribute
3. Multivalued attribute
4. Derived attribute

KEY ATTRIBUTE A key attribute can uniquely identify an entity within an entity set. For example,
a student's roll number can uniquely identify a student from a set of students. A key attribute is
represented by an oval like other attributes, but its text is underlined.
COMPOSITE ATTRIBUTE An attribute that is a combination of other attributes is known as a
composite attribute. For example, in a student entity, the student address is a composite
attribute, as an address is composed of other attributes such as pin code, state, and country.
MULTIVALUED ATTRIBUTE An attribute that can hold multiple values is known as a multivalued
attribute. It is represented with a double oval in an ER diagram. For example, a person can
have more than one phone number, so the phone number attribute is multivalued.

DERIVED ATTRIBUTE A derived attribute is one whose value is dynamic and derived from
another attribute. It is represented by a dashed oval in an ER diagram. For example, a person's
age is a derived attribute, as it changes over time and can be derived from another attribute
(date of birth).
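Because a derived attribute is computed rather than stored, it can be sketched as a function of the stored attribute. The function name `age_on` is hypothetical; it computes age in whole years from the stored date of birth.

```python
from datetime import date


def age_on(date_of_birth: date, today: date) -> int:
    """Derive the age attribute (whole years) from the stored date-of-birth attribute."""
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)


dob = date(2000, 5, 10)                # the stored attribute
print(age_on(dob, date(2024, 5, 9)))   # 23
print(age_on(dob, date(2024, 5, 10)))  # 24
```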

RELATIONSHIP
A relationship is represented by a diamond shape in an ER diagram; it shows the relationship
among entities. There are four types of relationships:
1. One to One
2. One to Many
3. Many to One
4. Many to Many

ONE TO ONE RELATIONSHIP
When a single instance of an entity is associated with a single instance of another entity, it is
called a one to one relationship. For example, a person has only one passport and a passport is
given to only one person.

ONE TO MANY RELATIONSHIP
When a single instance of an entity is associated with more than one instance of
another entity, it is called a one to many relationship. For example, a customer can place
many orders, but an order cannot be placed by many customers.

MANY TO ONE RELATIONSHIP
When more than one instance of an entity is associated with a single instance of
another entity, it is called a many to one relationship. For example, many students can
study in a single college, but a student cannot study in many colleges at the same time.

MANY TO MANY RELATIONSHIP
When more than one instance of an entity is associated with more than one
instance of another entity, it is called a many to many relationship. For example, a student can
be assigned to many projects and a project can be assigned to many students.
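One common way (not the only one) to implement these relationship types in a relational database is sketched below with Python's built-in sqlite3 module: a one to many relationship becomes a foreign key on the "many" side, and a many to many relationship becomes a separate junction table. All table and column names are hypothetical; the orders table is not named `order` because ORDER is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
-- One to many: each order row stores the key of the one customer who placed it.
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);

-- Many to many: a junction table holds one row per (student, project) pair.
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE project (project_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE assignment (
    student_id INTEGER REFERENCES student(student_id),
    project_id INTEGER REFERENCES project(project_id),
    PRIMARY KEY (student_id, project_id)
);

INSERT INTO customer VALUES (1, 'Ana');
INSERT INTO orders VALUES (10, 1), (11, 1);
INSERT INTO student VALUES (1, 'Bo'), (2, 'Cy');
INSERT INTO project VALUES (1, 'Web'), (2, 'Data');
INSERT INTO assignment VALUES (1, 1), (1, 2), (2, 1);
""")

orders_of_ana = conn.execute("SELECT COUNT(*) FROM orders WHERE customer_id = 1").fetchone()[0]
students_per_project = conn.execute(
    "SELECT project_id, COUNT(*) FROM assignment GROUP BY project_id ORDER BY project_id"
).fetchall()
print(orders_of_ana)          # 2
print(students_per_project)   # [(1, 2), (2, 1)]
```

A one to one relationship can be handled in a similar way, with a foreign key that is also declared UNIQUE.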

GENERALIZATION (the relationship between a supertype and its subtypes) has three essential
characteristics:

1. POPULATION INCLUSION, meaning that every instance of the subtype is also an instance
of the supertype. In the Northwind example, where temporary employees are a subtype of
employees, this means that every temporary employee is also an employee of the company.

2. INHERITANCE, meaning that all characteristics of the supertype (e.g., attributes and
roles) are inherited by the subtype. Thus, in the same example, temporary employees also
have, for instance, a name and a title.

3. SUBSTITUTABILITY, meaning that each time an instance of the supertype is required
(e.g., in an operation or in a query), an instance of the subtype can be used instead.
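Object-oriented inheritance is only an analogy for ER generalization, but it makes the three characteristics easy to see. In this Python sketch the class names, the `salutation` function, and the `contract_end` attribute are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    """Supertype."""
    name: str
    title: str


@dataclass
class TemporaryEmployee(Employee):
    """Subtype; contract_end is a hypothetical subtype-specific attribute."""
    contract_end: str


def salutation(e: Employee) -> str:
    """An operation written for the supertype."""
    return f"{e.title} {e.name}"


temp = TemporaryEmployee(name="Smith", title="Ms.", contract_end="2024-12-31")
print(isinstance(temp, Employee))  # True       (population inclusion)
print(temp.name, temp.title)       # Smith Ms.  (inheritance)
print(salutation(temp))            # Ms. Smith  (substitutability)
```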
In the RELATIONAL MODEL, data and relationships are represented by a collection of
inter-related tables. Each table is a group of columns and rows, where columns
represent the attributes of an entity and rows represent records.

A database trigger is procedural code that is automatically executed in response to certain
events on a particular table or view in a database. Triggers are mostly used to maintain the
integrity of the information in the database.
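As an illustration of a trigger that protects integrity, the sketch below uses SQLite through Python's sqlite3 module. The `account` table and the rule that balances may not become negative are hypothetical; trigger syntax differs between DBMSs, and `RAISE(ABORT, ...)` is SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL NOT NULL);

-- Runs automatically before any update of balance and aborts the statement
-- if the new value would be negative.
CREATE TRIGGER no_negative_balance
BEFORE UPDATE OF balance ON account
FOR EACH ROW
WHEN NEW.balance < 0
BEGIN
    SELECT RAISE(ABORT, 'balance cannot be negative');
END;

INSERT INTO account VALUES (1, 100.0);
""")

try:
    conn.execute("UPDATE account SET balance = balance - 150 WHERE id = 1")
except sqlite3.IntegrityError as err:
    print("rejected:", err)  # the trigger aborted the UPDATE

balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
print(balance)  # 100.0
```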

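A functional dependency is a constraint between two sets of attributes in a relation. A
functional dependency X → Y holds if any two tuples that agree on the attributes in X also agree
on the attributes in Y.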
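A functional dependency is declared on the schema, so a particular table instance can only show that a dependency is violated, never prove that it holds in general. With that caveat, the hypothetical helper `fd_holds` below checks whether the tuples of one instance satisfy X → Y.

```python
def fd_holds(rows: list[dict], lhs: tuple[str, ...], rhs: tuple[str, ...]) -> bool:
    """Return True if every pair of rows that agrees on lhs also agrees on rhs."""
    seen: dict[tuple, tuple] = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # The first rhs value seen for this lhs value must match all later ones.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical relation instance Enrolment(student, course, instructor).
enrolment = [
    {"student": "S1", "course": "DB", "instructor": "Lee"},
    {"student": "S2", "course": "DB", "instructor": "Lee"},
    {"student": "S1", "course": "AI", "instructor": "Kim"},
]
print(fd_holds(enrolment, ("course",), ("instructor",)))  # True
print(fd_holds(enrolment, ("student",), ("course",)))     # False
```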


A normal form is an integrity constraint aimed at guaranteeing that a relational schema
satisfies particular properties.

Relational Query Languages
Data stored in a relational database can be queried using different formalisms. Two
kinds of query languages are typically defined.
In a procedural language, a query is specified by indicating the operations needed to retrieve the
desired result.
In a declarative language, the user only indicates what she wants to retrieve, leaving to the
DBMS the task of determining the equivalent procedural query to be executed.
Relational Algebra
The relational algebra is a collection of operations for manipulating relations. These
operations can be of two kinds: unary, which take one relation as argument and return another
relation, or binary, which take two relations as arguments and return a relation.
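To make the unary/binary distinction concrete, here is a small Python sketch of selection and projection (unary) and natural join (binary) over relations represented as lists of dictionaries. These helpers are hypothetical teaching code, not a real library, and the query is written as an explicit sequence of operations, in the procedural style described above.

```python
from typing import Callable

Row = dict[str, object]
Relation = list[Row]


def select(r: Relation, predicate: Callable[[Row], bool]) -> Relation:
    """Unary selection (sigma): keep the tuples that satisfy the predicate."""
    return [t for t in r if predicate(t)]


def project(r: Relation, attrs: list[str]) -> Relation:
    """Unary projection (pi): keep only attrs and drop duplicate tuples."""
    result: Relation = []
    for t in r:
        p = {a: t[a] for a in attrs}
        if p not in result:
            result.append(p)
    return result


def natural_join(r: Relation, s: Relation) -> Relation:
    """Binary natural join: combine tuples that agree on all shared attributes."""
    out: Relation = []
    for t in r:
        for u in s:
            if all(t[a] == u[a] for a in t.keys() & u.keys()):
                out.append({**t, **u})
    return out


employee = [{"eid": 1, "name": "Ana", "dept": "D1"}, {"eid": 2, "name": "Bo", "dept": "D2"}]
department = [{"dept": "D1", "dname": "Sales"}, {"dept": "D2", "dname": "IT"}]

# pi_name(sigma_{dname = 'Sales'}(employee JOIN department)), evaluated step by step
result = project(select(natural_join(employee, department),
                        lambda t: t["dname"] == "Sales"), ["name"])
print(result)  # [{'name': 'Ana'}]
```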
SQL (structured query language) is the most common language for creating, manipulating, and
retrieving data from relational DBMSs. SQL is composed of several sublanguages.
The data definition language (DDL) is used to define the schema of a database.
The data manipulation language (DML) is used to query a database and to modify its content
(i.e., to add, update, and delete data in a database).
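The split between DDL and DML can be seen in a few statements run through Python's sqlite3 module. The `product` table and its rows are hypothetical, and SQLite's dialect differs in details from other DBMSs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the schema of the database.
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL, price REAL)")

# DML: add, update, and delete data...
conn.executemany("INSERT INTO product (name, price) VALUES (?, ?)",
                 [("Chai", 18.0), ("Chang", 19.0), ("Tofu", 23.25)])
conn.execute("UPDATE product SET price = 20.0 WHERE name = 'Chai'")
conn.execute("DELETE FROM product WHERE name = 'Tofu'")

# ...and query it.
rows = conn.execute("SELECT name, price FROM product ORDER BY name").fetchall()
print(rows)  # [('Chai', 20.0), ('Chang', 19.0)]
```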
Subqueries
A subquery (or nested query) is an SQL query used within a SELECT, FROM, or WHERE clause of
another query. The external query is called the outer query.
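A minimal example of a subquery in a WHERE clause, again using an in-memory SQLite database with a hypothetical `product` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, price REAL)")
conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [("Chai", 18.0), ("Chang", 19.0), ("Tofu", 23.25)])

# The inner query (inside WHERE) computes the average price;
# the outer query keeps the products priced above that average.
rows = conn.execute("""
    SELECT name FROM product
    WHERE price > (SELECT AVG(price) FROM product)
    ORDER BY name
""").fetchall()
print(rows)  # [('Tofu',)]
```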
Common Table Expression (CTE) is a temporary table defined within an SQL statement. Such
temporary tables can be seen as views within the scope of the statement. A CTE is typically
used when a user does not have the necessary privileges for creating a view.
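A CTE is introduced with the WITH keyword. In this sketch (SQLite via Python, hypothetical `sales` table) the CTE `region_total` is defined once and referenced twice in the same statement, without any CREATE VIEW statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("North", 100.0), ("North", 50.0), ("South", 80.0)])

# region_total exists only while this statement runs, like a statement-local view.
rows = conn.execute("""
    WITH region_total AS (
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region
    )
    SELECT region, total FROM region_total
    WHERE total > (SELECT AVG(total) FROM region_total)
""").fetchall()
print(rows)  # [('North', 150.0)]
```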

Normalization is the process of organizing the data in a database to avoid data redundancy and
insertion, update, and deletion anomalies.
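A small, hypothetical example shows the kind of redundancy normalization removes. If course determines instructor, storing the instructor in every enrolment row invites an update anomaly; decomposing the table so that each course's instructor is stored once avoids it. The relation and attribute names are illustrative only.

```python
# Hypothetical unnormalized relation Enrolment(student, course, instructor):
# each course's instructor is repeated in every enrolment row.
enrolment = [
    ("S1", "DB", "Lee"),
    ("S2", "DB", "Lee"),
    ("S1", "AI", "Kim"),
]

# Update anomaly: changing DB's instructor in only one row leaves two answers.
partially_updated = [
    (s, c, "Park" if (s, c) == ("S1", "DB") else i) for s, c, i in enrolment
]
print(sorted({i for _, c, i in partially_updated if c == "DB"}))  # ['Lee', 'Park']

# Decomposition: Takes(student, course) and Teaches(course, instructor).
takes = sorted({(s, c) for s, c, _ in enrolment})
teaches = {c: i for _, c, i in enrolment}  # each course's instructor stored once


def rejoin() -> list[tuple[str, str, str]]:
    """Natural join of Takes and Teaches on course."""
    return sorted((s, c, teaches[c]) for s, c in takes)


print(rejoin() == sorted(enrolment))  # True: the decomposition loses no information

teaches["DB"] = "Park"  # the update now changes a single stored fact
print(rejoin())
```

The decomposition is lossless here because the shared attribute, course, is a key of Teaches.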

Physical database design specifies how database records are stored, accessed, and related in
order to ensure adequate performance of a database application. Physical database design is
related to query processing, physical data organization, indexing, transaction processing, and
concurrency management, among other aspects.

Transaction throughput is the number of transactions that can be processed in a given time
interval. In some systems, such as electronic payment systems, a high transaction throughput is
critical.

Response time is the elapsed time for the completion of a single transaction. Minimizing
response time is essential from the user’s point of view.

Disk storage is the amount of disk space required to store the database files. These factors
usually conflict, so a compromise has to be made among them. From a general perspective, this
compromise involves the following trade-offs:

• Space-time trade-off: It is often possible to reduce the time taken to perform an
operation by using more space, and vice versa.
• Query-update trade-off: Access to data can be made more efficient by imposing some
structure upon it. However, the more elaborate the structure, the more time is taken to
build it and to maintain it when its contents change.
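Both trade-offs can be seen in a few lines of Python. In this hypothetical sketch, a dictionary plays the role of an index: it costs extra memory and must be updated on every insert, but it answers lookups without scanning all records.

```python
# Hypothetical customer records: (customer_id, city).
customers: list[tuple[int, str]] = [
    (i, "Brussels" if i % 3 == 0 else "Paris") for i in range(9)
]

# Extra space: an index from each city to the ids of its customers.
city_index: dict[str, list[int]] = {}
for cid, city in customers:
    city_index.setdefault(city, []).append(cid)


def find_by_scan(city: str) -> list[int]:
    """No extra space, but every query examines all records (linear time)."""
    return [cid for cid, c in customers if c == city]


def find_by_index(city: str) -> list[int]:
    """One dictionary lookup, paid for with the memory used by city_index."""
    return list(city_index.get(city, []))


def add_customer(cid: int, city: str) -> None:
    """Query-update trade-off: every insert must also maintain the index."""
    customers.append((cid, city))
    city_index.setdefault(city, []).append(cid)


add_customer(9, "Brussels")
print(find_by_scan("Brussels"), find_by_index("Brussels"))  # [0, 3, 6, 9] [0, 3, 6, 9]
```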
Data on a computer disk are stored in disk blocks (or pages), whose size is set by the operating
system during disk formatting.

File organization is the physical arrangement of data in a file into records and blocks on
secondary storage. There are three main types of file organization. In a heap (or unordered) file
organization, records are placed in the file in the order in which they are inserted, which makes
insertion very efficient.
Sequential (or ordered) files have their records sorted on the values of one or more fields,
called ordering fields. Ordered files allow fast retrieval of records, provided that the search
condition is based on the sorting attribute. However, inserting and deleting records in a
sequential file is problematic, since the order must be maintained.
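The sketch below mimics the two organizations with Python lists, which is a simplification: a real DBMS works with disk blocks rather than in-memory lists. It shows why heap files make insertion cheap and why ordered files make searches on the ordering field fast.

```python
import bisect

keys = [42, 7, 19, 3]  # hypothetical record keys, in arrival order

# Heap (unordered) file: new records are simply appended.
heap: list[int] = []
for k in keys:
    heap.append(k)  # cheap insertion


def heap_search(key: int) -> bool:
    """Without an order, a search must scan every record."""
    return any(k == key for k in heap)


# Sequential (ordered) file: records stay sorted on the ordering field.
ordered: list[int] = []
for k in keys:
    bisect.insort(ordered, k)  # insertion shifts later records to keep the order


def ordered_search(key: int) -> bool:
    """Binary search works because the search is on the ordering field."""
    i = bisect.bisect_left(ordered, key)
    return i < len(ordered) and ordered[i] == key


print(heap)     # [42, 7, 19, 3]
print(ordered)  # [3, 7, 19, 42]
print(heap_search(19), ordered_search(19), ordered_search(20))  # True True False
```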
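Online transaction processing (OLTP) systems, also called operational databases, support the
day-to-day operations of an organization through a large number of short transactions. The
OLAP paradigm, in contrast, is focused on queries, in particular analytical queries, and
OLAP-oriented databases should support a heavy query load.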

A data cube is defined by dimensions and facts.

Dimensions are perspectives used to analyze the data.

Dimensions usually have hierarchies, which allow data to be analyzed at several levels of detail
by defining a sequence of mappings relating lower-level, detailed concepts to higher-level, more
general concepts. Given two related levels in a hierarchy, the lower level is called the child and
the higher level is called the parent. The hierarchical structure of a dimension is called the
dimension schema, while a dimension instance comprises the members at all levels in a
dimension.
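A roll-up along a dimension hierarchy amounts to mapping each child member to its parent and aggregating. This Python sketch assumes a hypothetical Geography hierarchy (City → Country) and a sales measure aggregated with SUM; the dictionary maps each city to exactly one country.

```python
from collections import defaultdict

# Hypothetical Geography hierarchy: City (child level) -> Country (parent level).
city_to_country = {"Brussels": "Belgium", "Antwerp": "Belgium", "Paris": "France"}

# Sales measure at the City level.
sales_by_city = {"Brussels": 100, "Antwerp": 40, "Paris": 70}


def roll_up(facts: dict[str, int], parent_of: dict[str, str]) -> dict[str, int]:
    """Aggregate child-level measure values to the parent level with SUM."""
    totals: dict[str, int] = defaultdict(int)
    for child, value in facts.items():
        totals[parent_of[child]] += value
    return dict(totals)


print(roll_up(sales_by_city, city_to_country))  # {'Belgium': 140, 'France': 70}
```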
Summarizability refers to the correct aggregation of cube measures along dimension
hierarchies, in order to obtain consistent aggregation results. To ensure summarizability, a set
of conditions must hold. Below, we list some of these conditions:

• Disjointness of instances: The grouping of instances in a level with respect to their
parent in the next level must result in disjoint subsets.
• Completeness: All instances must be included in the hierarchy, and each instance must
be related to one parent in the next level.
• Correctness: It refers to the correct use of the aggregation functions. As explained next,
measures can be of various types, and this determines the kind of aggregation function
that can be applied to them.
• Additive measures can be meaningfully summarized along all the dimensions using
addition. These are the most common type of measures (e.g., sales amount).
• Semiadditive measures can be meaningfully summarized using addition along some, but
not all, dimensions (e.g., inventory quantities, which cannot be added along time).
• Nonadditive measures cannot be meaningfully summarized using addition across any
dimension (e.g., unit price).
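Inventory is the classic semiadditive case. In this hypothetical Python sketch, stock levels can be summed across stores, but summing them across months gives a meaningless number, so another function (here, the last or the average value) is used along time.

```python
# Hypothetical inventory snapshots: quantity on hand per (store, month).
inventory = {
    ("StoreA", "Jan"): 10, ("StoreA", "Feb"): 12,
    ("StoreB", "Jan"): 5,  ("StoreB", "Feb"): 7,
}

# Along the Store dimension, addition is meaningful: total stock in February.
feb_total = sum(q for (store, month), q in inventory.items() if month == "Feb")
print(feb_total)  # 19

# Along the Time dimension it is not: 10 + 12 = 22 items were never in StoreA at once.
# Instead, take the last value of the period, or the average over it.
months = ["Jan", "Feb"]
storeA_last = inventory[("StoreA", months[-1])]
storeA_avg = sum(inventory[("StoreA", m)] for m in months) / len(months)
print(storeA_last, storeA_avg)  # 12 11.0
```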
Other Classification of Measures:

• Distributive measures are defined by an aggregation function that can be computed in a
distributed way: applying the function to n aggregate values (each computed over one
partition of the data) gives the same result as applying it to all the data without partitioning.
E.g., count(), sum(), min(), max()
• Algebraic measures are defined by an aggregation function that can be expressed as a
scalar function of distributive ones, that is, an algebraic function with M arguments
(where M is a bounded integer), each of which is obtained by applying a distributive
aggregate function.
E.g., avg(), min_N(), standard_deviation()
• Holistic measures are measures that cannot be computed from other subaggregates,
because there is no constant bound on the storage size needed to describe a subaggregate.
E.g., median(), mode(), rank()
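The three classes can be checked on a hypothetical set of values split into partitions, as if each partition were a separately aggregated subcube. Note that partial counts are combined by adding them.

```python
import statistics

# Hypothetical measure values split into partitions (e.g., one per store).
partitions = [[4, 8, 15], [16, 23], [42]]
all_values = [v for part in partitions for v in part]

# Distributive: aggregate each partition, then combine the partial results.
total = sum(sum(part) for part in partitions)    # sum of partial sums
count = sum(len(part) for part in partitions)    # partial counts are added
maximum = max(max(part) for part in partitions)  # max of partial maxima
print(total == sum(all_values), count == len(all_values), maximum == max(all_values))  # True True True

# Algebraic: avg is a scalar function of M = 2 distributive aggregates (sum, count).
print(total / count)  # 18.0

# Holistic: the median of the partition medians is, in general, not the overall median.
median_of_medians = statistics.median([statistics.median(part) for part in partitions])
print(median_of_medians, statistics.median(all_values))  # 19.5 15.5
```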

Alternate key: all candidate keys not chosen as the primary key

Candidate key: a simple or composite key that is unique (no two rows in a table may have the
same value) and minimal (every column is necessary)

Characteristic entities: entities that provide more information about another table

Composite attributes: attributes that consist of a hierarchy of attributes

Composite key: composed of two or more attributes, but it must be minimal

Dependent entities: these entities depend on other tables for their meaning

Derived attributes: attributes that contain values calculated from other attributes

Derived entities: see dependent entities

EID: employee identification (ID)

Entity: a thing or object in the real world with an independent existence that can be differentiated from
other objects
Entity relationship (ER) data model: also called an ER schema; represented by ER diagrams, which
are well suited to data modelling for use with databases.

Entity relationship schema: see entity relationship data model

Entity set: a collection of entities of an entity type at a point in time

Entity type: a collection of similar entities

Foreign key (FK): an attribute in a table that references the primary key in another table, or is
null

Independent entity: as the building blocks of a database, these entities are what other tables are based
on

Kernel: see independent entity

Key: an attribute or group of attributes whose values can be used to uniquely identify an individual
entity in an entity set

Multivalued attributes: attributes that have a set of values for each entity

N-ary: multiple tables in a relationship

null: a special symbol, independent of data type, which means either unknown or inapplicable; it does
not mean zero or blank

Recursive relationship: see unary relationship

Relationships: the associations or interactions between entities; used to connect related information
between tables

Relationship strength: based on how the primary key of a related entity is defined

Secondary key: an attribute used strictly for retrieval purposes

Simple attributes: drawn from the atomic value domains

SIN: social insurance number

Single-valued attributes: see simple attributes

Stored attribute: saved physically to the database

Ternary relationship: a relationship type that involves many to many relationships between three
tables.

Unary relationship: one in which a relationship exists between occurrences of the same entity set.
