
Active database

1. What is this database application?

An active database is a database that includes an event-driven architecture which can respond to conditions both inside and outside the database. Possible uses include security monitoring, alerting, statistics gathering, and authorization. Most modern relational databases include active database features in the form of database triggers.

2. What is the architectural design of this database?

Active databases endow conventional database functionality with event-based rule processing. The behavior of an active database is accomplished through a set of ECA rules (Event-Condition-Action rules) associated with the database. Once a certain event is detected, the relevant rules are triggered. The triggering of rules involves procedures such as evaluating a condition of the database and executing the corresponding action. An active database gains its power from the array of events it can respond to and the types of actions it can perform in response. Active databases support the creation of triggers which fire when certain operations occur on the database. Most industrial relational databases today include active database capabilities (e.g. Microsoft SQL Server, Oracle, Postgres, Sybase, and Teradata), and all support SQL triggers.

3. How does it run?

Active database systems support mechanisms that enable them to respond automatically to events taking place either inside or outside the database system itself. Considerable effort has been directed towards improving the understanding of such systems in recent years, and many different proposals have been made and applications suggested. This high level of activity has not yielded a single agreed-upon standard approach to the integration of active functionality with conventional database systems, but it has led to an improved understanding of active behavior description languages, execution models, and architectures. This survey presents the fundamental characteristics of active database systems, describes a collection of representative systems within a common framework, considers the consequences for implementations of certain design decisions, and discusses tools for developing active applications.

4. Who are the people involved in this database application? What are their functions?

There is no doubt that computers have changed the lives of people. Nowadays almost every kind of task is performed on a computer of one kind or another, and their popularity keeps increasing, as does the development of new and modern technology. There are many things about technology that can make one's life easier, and one of them is the database. Databases have been designed with the aim of helping people keep their data organized, giving them the opportunity to select only the data that they need without losing any information. There are different types of database software, but here one can learn more about active databases.

5. When can it be implemented?

An active database system, in contrast, is a database system that monitors situations of interest and, when they occur, triggers an appropriate response in a timely manner.
The desired behavior is expressed in production rules (also called event-condition-action rules), which are defined and stored in the database. This has the benefit that the rules can be shared by many application programs, and the database system can optimize their implementation. The production rule paradigm originated in the field of Artificial Intelligence (AI) with expert-system rule languages such as OPS5.

6. Why is it needed?

Active database systems aim at the representation of more real-world semantics in the database by supporting event-condition-action rules (ECA rules). ECA rules can be interpreted as "when the specified event occurs and the condition holds, execute the action." An event indicates the point in time when some sort of reaction is required from the DBMS. For primitive events, this point in time can be specified by an occurrence in the database, by an occurrence in the DBMS, or by an occurrence in the database environment. For composite events, the point in time is defined on the basis of other points in time which represent other primitive and/or composite events (called component events). These components are combined by means of event constructors such as negation, conjunction, disjunction, sequence, etc. The action describes what to do when a specific event happens and the condition holds. The potential uses of reactive behavior are significant: active rules support data derivations, integrity maintenance, workflow management, replication management, and more.

7. What are the examples of this database application?

- SQL'99
- CS561
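The ECA rules described in this section can be made concrete with SQLite's built-in trigger support. The sketch below is illustrative only: the tables, columns, and rules are hypothetical, not drawn from any particular system. The first trigger implements an auditing rule (event: update of a balance; condition: the value actually changed; action: log it), and the second an integrity-maintenance rule that aborts an update driving stock negative.

```python
import sqlite3

# In-memory database for illustration; table and trigger names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL);
CREATE TABLE audit_log (account_id INTEGER, old_balance REAL, new_balance REAL);
CREATE TABLE stock (item TEXT PRIMARY KEY, quantity INTEGER);

-- ECA rule 1: Event = UPDATE of balance, Condition = value changed,
-- Action = record the change in audit_log.
CREATE TRIGGER log_balance_change
AFTER UPDATE OF balance ON accounts
WHEN OLD.balance <> NEW.balance
BEGIN
    INSERT INTO audit_log VALUES (OLD.id, OLD.balance, NEW.balance);
END;

-- ECA rule 2 (integrity maintenance): Event = UPDATE of quantity,
-- Condition = new value negative, Action = abort the statement.
CREATE TRIGGER no_negative_stock
BEFORE UPDATE OF quantity ON stock
WHEN NEW.quantity < 0
BEGIN
    SELECT RAISE(ABORT, 'quantity cannot be negative');
END;
""")

conn.execute("INSERT INTO accounts VALUES (1, 100.0)")
conn.execute("UPDATE accounts SET balance = 250.0 WHERE id = 1")
rows = conn.execute("SELECT * FROM audit_log").fetchall()

conn.execute("INSERT INTO stock VALUES ('widget', 5)")
try:
    conn.execute("UPDATE stock SET quantity = -3 WHERE item = 'widget'")
    blocked = False
except sqlite3.DatabaseError:
    blocked = True
qty = conn.execute("SELECT quantity FROM stock").fetchone()[0]
```

Both rules fire automatically on the triggering events; the application code that issues the updates needs no knowledge of them, which is exactly the point of moving reactive behavior into the database.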

Cloud database

1. What is this database application?

A cloud database is a database that relies on cloud technology. Both the database and most of its DBMS reside remotely, "in the cloud," while its applications are developed by programmers and later maintained and used by end users through a Web browser and open APIs. More and more such database products are emerging, both from new vendors and from virtually all established database vendors.

2. What is the architectural design of this database?

Most database services offer web-based consoles, which the end user can use to provision and configure database instances. For example, the Amazon Web Services web console enables users to launch database instances, create snapshots (similar to backups) of databases, and monitor database statistics.

Database services consist of a database manager component, which controls the underlying database instances using a service API. The service API is exposed to the end user, and permits users to perform maintenance and scaling operations on their database instances. For example, the Amazon Relational Database Service's API enables creating a database instance, modifying the resources available to a database instance, deleting a database instance, creating a snapshot (similar to a backup) of a database, and restoring a database from a snapshot. Database services take care of scalability and high availability of the database. Scalability features differ between vendors: some offer auto-scaling, while others let the user scale up through an API but do not scale automatically. There is typically a commitment to a certain level of high availability (e.g. 99.9% or 99.99%).

3. How does it run?

There are two primary methods to run a database on the cloud:

Virtual machine image: cloud platforms allow users to purchase virtual machine instances for a limited time, and it is possible to run a database on these virtual machines. Users can either upload their own machine image with a database installed on it, or use ready-made machine images that already include an optimized installation of a database. For example, Oracle provides a ready-made machine image with an installation of Oracle Database 11g Enterprise Edition on Amazon EC2.

Database as a service: some cloud platforms offer options for using a database as a service, without physically launching a virtual machine instance for the database. In this configuration, application owners do not have to install and maintain the database on their own. Instead, the database service provider takes responsibility for installing and maintaining the database, and application owners pay according to their usage.
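The shape of such a service API (create instance, snapshot, restore) can be sketched as a toy in-memory model. All class, method, and instance names below are hypothetical; real providers such as Amazon RDS expose comparable calls over HTTPS rather than as local Python methods.

```python
import copy

class DatabaseService:
    """Toy model of a cloud database service API: instances can be
    created, snapshotted, and restored. Names are hypothetical."""

    def __init__(self):
        self.instances = {}   # instance name -> dict standing in for its data
        self.snapshots = {}   # snapshot name -> frozen copy of instance data

    def create_instance(self, name):
        self.instances[name] = {}

    def create_snapshot(self, instance, snapshot_name):
        # Deep-copy so later writes to the instance don't alter the snapshot.
        self.snapshots[snapshot_name] = copy.deepcopy(self.instances[instance])

    def restore_from_snapshot(self, snapshot_name, new_instance):
        self.instances[new_instance] = copy.deepcopy(self.snapshots[snapshot_name])

svc = DatabaseService()
svc.create_instance("prod")
svc.instances["prod"]["users"] = 3
svc.create_snapshot("prod", "nightly")
svc.instances["prod"]["users"] = 99            # writes after the snapshot...
svc.restore_from_snapshot("nightly", "prod2")  # ...are absent from the restore
```

The point of the sketch is the division of labor: the application owner calls a small management API, while provisioning, storage, and availability are the provider's concern.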
For example, Amazon Web Services provides two database services as part of its cloud offering: SimpleDB, which is a NoSQL key-value store, and Amazon Relational Database Service, which is an SQL-based database service with a MySQL interface. A third option is managed database hosting on the cloud, where the database is not offered as a service, but the cloud provider hosts the database and manages it on the application owner's behalf. For example, cloud provider Rackspace offers managed hosting for MySQL databases.

4. Who are the people involved in this database application? What are their functions?

To many of us today, the cloud seems like a bit of magic. We often simply use the services of a cloud-based system without really thinking about where the cloud is located, or who keeps it running. Ultimately, behind every cloud there are real people managing real machines. What is marketed as a cloud is really a rack of machines, with a very real person who has to keep them running. To that person, the administrator, the cloud isn't "in the cloud"; it is in his own data center! The administrator must put together a set of machines, software, and administrative tools that enable everything to be viewed in a completely hands-off way by the users, so that they think of it as a cloud. The challenge vendors face when trying to market and sell something for the cloud is that the definition of the cloud is so broad and varied. What passes as the cloud to one person is simply a set of machines to another. Our new SQL Anywhere OnDemand Edition (code-named "Fuji"), currently in beta test, is one such product designed to help the administrator of those machines

to create a data cloud. While the administrator certainly will know what machines are in use, where the database servers are running, and where the databases are located, the end user will simply see the database as being "in the cloud." The administrator of the cloud system will use one of the primary components of Fuji: the administrative console. The console is designed to enable an administrator to easily keep track of the various host machines that are part of the system, the SQL Anywhere servers that are running on each host, and the databases being served by each database server. The console also provides access to all the various tasks that an administrator might want to execute on a running cloud, including starting and stopping database servers, adding a new database into the cloud, setting up high availability for a database, and backup/restore operations. The console is completely web-driven, so it can be accessed using a standard Flash-enabled web browser.

5. Where can it be implemented?

SQL databases, such as Oracle Database, Microsoft SQL Server, and MySQL, are one type of database which can be run on the cloud (either as a virtual machine image or as a service, depending on the vendor). SQL databases are difficult to scale, meaning they are not natively suited to a cloud environment, although cloud database services based on SQL are attempting to address this challenge. NoSQL databases, such as Apache Cassandra, CouchDB, and MongoDB, are another type of database which can run on the cloud. NoSQL databases are built to service heavy read/write loads and are able to scale up and down easily, and therefore they are more natively suited to running on the cloud. However, most contemporary applications are built around an SQL data model, so working with NoSQL databases often requires a complete rewrite of application code.

6. Why is it needed?
It is important to have the database as an organic part of the cloud for one key reason: to avoid the dedicated and complex maintenance required to babysit an odd child in the cloud infrastructure. Any non-cloud service will become some kind of exception which requires special maintenance, skill sets, procedures, etc. This is definitely true long-term, but even in their day-to-day operations today, DBAs and system administrators can attest to how much time, energy, and administrative cost go into monitoring their database and ensuring it plays well and is properly integrated with the other components of the cloud. In addition, non-native cloud services will not enjoy the benefits achieved by natural tenants of the cloud. These benefits include, for example, automation, resource optimization, dynamic networking, and more. Databases should be regarded as an integral part of the cloud so that the IT infrastructure can really be a commodity which can be bought, traded, re-allocated, and moved around as needed. On the other side of the equation, to live up to the theory, the cloud database technology itself must deliver a convincing paradigm and proof of its ability to keep the data safe, secure, and always available, at least on the same level as enterprise databases today.

7. What are the examples of this database application?

- Oracle Database
- IBM DB2
- Ingres
- PostgreSQL
- MySQL

Data warehouse

1. What is this database application?

Data warehouses archive data from operational databases and often from external sources such as market research firms. Operational data often undergoes transformation on its way into the warehouse: it is summarized, anonymized, reclassified, and so on. The warehouse becomes the central source of data for use by managers and other end users who may not have access to operational data. For example, sales data might be aggregated to weekly totals and converted from internal product codes to UPCs so that it can be compared with ACNielsen data. Some basic and essential components of data warehousing include retrieving, analyzing, and mining data, and transforming, loading, and managing data so as to make it available for further use. Operations in a data warehouse are typically concerned with bulk data manipulation, and as such it is unusual and inefficient to target individual rows for update, insert, or delete. Bulk native loaders for input data and bulk SQL passes for aggregation are the norm.

2. What is the architectural design of this database?

Data Warehouse Configurations

A data warehouse configuration, also known as the logical architecture, includes the following components:

- one Enterprise Data Store (EDS): a central repository which supplies atomic (detail-level) integrated information to the whole organization
- (optional) one Operational Data Store: a "snapshot" of a moment in time's enterprise-wide data
- (optional) one or more individual Data Marts: summarized subsets of the enterprise's data specific to a functional area or department, geographical region, or time period
- one or more Metadata Stores or Repositories: catalogs of reference information about the primary data. Metadata is divided into two categories: information for technical use, and information for business end users.

Bottom-up design

Ralph Kimball, a well-known author on data warehousing, is a proponent of an approach to data warehouse design which he describes as bottom-up. In the bottom-up approach, data marts are first created to provide reporting and analytical capabilities for specific business processes. It is important to note that in the Kimball methodology, the bottom-up process is the result of an initial business-oriented top-down analysis of the relevant business processes to be modelled.

Data marts contain, primarily, dimensions and facts. Facts contain atomic data and, if necessary, summarized data. A single data mart often models a specific business area such as "Sales" or "Production." These data marts can eventually be integrated to create a comprehensive data warehouse. The integration of data marts is managed through the implementation of what Kimball calls "a data warehouse bus architecture". The data warehouse bus architecture is primarily an implementation of "the bus": a collection of conformed dimensions and conformed facts, which are dimensions that are shared (in a specific way) between facts in two or more data marts.

The integration of the data marts in the data warehouse is centered on the conformed dimensions (residing in "the bus") that define the possible integration "points" between data marts. The actual integration of two or more data marts is then done by a process known as "drill across". A drill-across works by grouping (summarizing) the data along the keys of the (shared) conformed dimensions of each fact participating in the drill across, followed by a join on the keys of these grouped (summarized) facts. Maintaining tight management over the data warehouse bus architecture is fundamental to maintaining the integrity of the data warehouse. The most important management task is making sure dimensions among data marts are consistent. In Kimball's words, this means that the dimensions "conform".
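The drill-across process just described can be sketched in SQL (run here through Python's sqlite3 for convenience). The two fact tables and the `date_key` conformed dimension are illustrative: each fact is summarized along the shared dimension first, and only then are the summarized results joined on its keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Two data marts whose facts share a conformed 'date_key' dimension.
CREATE TABLE fact_sales      (date_key TEXT, revenue REAL);
CREATE TABLE fact_production (date_key TEXT, units INTEGER);
INSERT INTO fact_sales VALUES ('2011-01', 100.0), ('2011-01', 50.0), ('2011-02', 70.0);
INSERT INTO fact_production VALUES ('2011-01', 8), ('2011-02', 3), ('2011-02', 4);
""")

# Drill-across: summarize each fact along the shared conformed dimension
# first, then join the summarized results on the dimension keys.
rows = conn.execute("""
    SELECT s.date_key, s.revenue, p.units
    FROM (SELECT date_key, SUM(revenue) AS revenue
          FROM fact_sales GROUP BY date_key) s
    JOIN (SELECT date_key, SUM(units) AS units
          FROM fact_production GROUP BY date_key) p
      ON s.date_key = p.date_key
    ORDER BY s.date_key
""").fetchall()
```

Summarizing before joining is what makes the operation safe: joining the raw fact rows directly would multiply rows and double-count measures wherever a dimension key appears more than once on each side.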
Some consider it an advantage of the Kimball method that the data warehouse ends up being "segmented" into a number of logically self-contained (up to and including the bus) and consistent data marts, rather than a big and often complex centralized model. Business value can be returned as quickly as the first data marts can be created, and the method lends itself well to an exploratory and iterative approach to building data warehouses. For example, the data warehousing effort might start in the

"Sales" department, by building a Sales data mart. Upon completion of the Sales data mart, the business might then decide to expand the warehousing activities into, say, the Production department, resulting in a Production data mart. The requirement for the Sales data mart and the Production data mart to be integrated is that they share the same bus; that is, that the data warehousing team has made the effort to identify and implement the conformed dimensions in the bus, and that the individual data marts link to that information from the bus. Note that this does not require 100% awareness from the onset of the data warehousing effort; no master plan is required upfront. The Sales data mart is good as it is (assuming that the bus is complete), and the Production data mart can be constructed virtually independently of the Sales data mart (but not independently of the bus). If integration via the bus is achieved, the data warehouse, through its two data marts, will be able to deliver not only the specific information that the individual data marts are designed to deliver, in this example either "Sales" or "Production" information, but also integrated Sales-Production information, which is often of critical business value. Such integration can be achieved in a flexible and iterative fashion.

Top-down design

Bill Inmon, one of the first authors on the subject of data warehousing, has defined a data warehouse as a centralized repository for the entire enterprise. Inmon is one of the leading proponents of the top-down approach to data warehouse design, in which the data warehouse is designed using a normalized enterprise data model. "Atomic" data, that is, data at the lowest level of detail, are stored in the data warehouse. Dimensional data marts containing data needed for specific business processes or specific departments are created from the data warehouse.
In the Inmon vision the data warehouse is at the center of the "Corporate Information Factory" (CIF), which provides a logical framework for delivering business intelligence (BI) and business management capabilities. Inmon states that the data warehouse is:

Subject-oriented: The data in the data warehouse is organized so that all the data elements relating to the same real-world event or object are linked together.

Non-volatile: Data in the data warehouse are never over-written or deleted; once committed, the data are static, read-only, and retained for future reporting.

Integrated: The data warehouse contains data from most or all of an organization's operational systems, and these data are made consistent.

Time-variant: For an operational system, the stored data contains the current value; the data warehouse, in contrast, retains historical data so that changes over time can be analyzed.

The top-down design methodology generates highly consistent dimensional views of data across data marts, since all data marts are loaded from the centralized repository. Top-down design has also proven to be robust against business changes. Generating new dimensional data marts against the data stored in the data warehouse is a relatively simple task. The main disadvantage of the top-down methodology

is that it represents a very large project with a very broad scope. The up-front cost for implementing a data warehouse using the top-down methodology is significant, and the duration of time from the start of the project to the point that end users experience initial benefits can be substantial. In addition, the top-down methodology can be inflexible and unresponsive to changing departmental needs during the implementation phases.

Hybrid design

Data warehouse (DW) solutions often resemble a hub-and-spoke architecture. Legacy systems feeding the DW/BI solution often include customer relationship management (CRM) and enterprise resource planning (ERP) solutions, generating large amounts of data. To consolidate these various data models, and facilitate the extract-transform-load (ETL) process, DW solutions often make use of an operational data store (ODS). The information from the ODS is then parsed into the actual DW. To reduce data redundancy, larger systems often store the data in a normalized way. Data marts for specific reports can then be built on top of the DW solution. It is important to note that the DW database in a hybrid solution is kept in third normal form to eliminate data redundancy. A normal relational database, however, is not efficient for business intelligence reports where dimensional modeling is prevalent. Small data marts can shop for data from the consolidated warehouse and use the filtered, specific data for the fact tables and dimensions required. The DW effectively provides a single source of information from which the data marts can read, creating a highly flexible solution from a BI point of view. The hybrid architecture allows a DW to be replaced with a master data management solution where operational, not static, information could reside. The Data Vault modeling components follow a hub-and-spoke architecture. This modeling style is a hybrid design, consisting of best-of-breed practices from both third normal form and star schema.
The Data Vault model is not a true third normal form, and breaks some of the rules that 3NF dictates be followed. It is, however, a top-down architecture with a bottom-up design. The Data Vault model is geared to be strictly a data warehouse. It is not geared to be end-user accessible; when built, it still requires the use of a data mart or star-schema-based release area for business purposes.

3. How does it run?

Implementation

Once the planning and design stages are complete, the project to implement the current data warehouse iteration can proceed quickly. Necessary hardware, software, and middleware components are purchased and installed, the development and test environment is established, and the configuration management processes are implemented. Programs are developed to extract, cleanse, transform, and load the source data and to periodically refresh the existing data in the warehouse, and the programs are individually unit tested against a test database with sample source data. Metrics are captured for the load process. The metadata repository is loaded with transformational and business-user metadata. Canned production reports are developed, sample ad hoc queries are run against the test database, and the validity of the output is measured. User access to the data in the warehouse is established. Once the programs have been developed and unit tested and the components are in place, system functionality and user acceptance testing is conducted for the complete integrated data warehouse system. System support processes of database security, system backup and recovery,

system disaster recovery, and data archiving are implemented and tested as the system is prepared for deployment. The final step is to conduct the Production Readiness Review prior to transitioning the data warehouse system into production. During this review, the system is evaluated for acceptance by the customer organization.

Transition to Production

The transition-to-production stage moves the data warehouse development project into the production environment. The production database is created, and the extraction/cleanse/transformation routines are run on the operational system source data. The development team works with the operations staff to perform the initial load of this data to the warehouse and execute the first refresh cycle. The operations staff is trained, and the data warehouse programs and processes are moved into the production libraries and catalogs. Rollout presentations and tool demonstrations are given to the entire customer community, and end-user training is scheduled and conducted. The help desk is established and put into operation. A service level agreement is developed and approved by the customer organization. Finally, the new system is positioned for ongoing maintenance through the establishment of a Change Management Board and the implementation of change control procedures for future development cycles.

4. Who are the people involved in this database application? What are their functions?

Data warehouses and data warehouse applications are designed primarily to support executives, senior managers, and business analysts in making complex business decisions. Data warehouse applications provide the business community with access to accurate, consolidated information from various internal and external sources. The primary objective of data warehousing is to bring together information from disparate sources and put the information into a format that is conducive to making business decisions.
This objective necessitates a set of activities that are far more complex than just collecting data and reporting against it. Data warehousing requires both business and technical expertise and involves the following activities:

- Accurately identifying the business information that must be contained in the warehouse
- Identifying and prioritizing subject areas to be included in the data warehouse
- Managing the scope of each subject area which will be implemented into the warehouse on an iterative basis
- Developing a scalable architecture to serve as the warehouse's technical and application foundation, and identifying and selecting the hardware/software/middleware components to implement it
- Extracting, cleansing, aggregating, transforming, and validating the data to ensure accuracy and consistency
- Defining the correct level of summarization to support business decision making
- Establishing a refresh program that is consistent with business needs, timing, and cycles
- Providing user-friendly, powerful tools at the desktop to access the data in the warehouse
- Educating the business community about the realm of possibilities that are available to them through data warehousing
- Establishing a data warehouse help desk and training users to effectively utilize the desktop tools
- Establishing processes for maintaining, enhancing, and ensuring the ongoing success and applicability of the warehouse
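Several of the activities above (extracting, cleansing, transforming, and loading data) can be sketched end to end in a few lines. The source rows, the cleansing rules, and the warehouse schema below are all hypothetical, chosen only to show the shape of an ETL step: dirty operational rows go in, a consistent summarized fact table comes out.

```python
import sqlite3

# Hypothetical rows extracted from an operational system; some need
# cleansing (an unknown product code, a missing amount).
source_rows = [
    ("a1", "10.5"), ("A1", "2.0"), ("??", "9.9"), ("B2", None),
]

def cleanse(rows):
    """Drop rows with unknown codes or missing amounts; normalize case."""
    for code, amount in rows:
        if amount is None or not code.isalnum():
            continue
        yield code.upper(), float(amount)

# Load the cleansed rows into the warehouse and summarize them in bulk,
# as a warehouse load step would, rather than row by row.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact (product TEXT, amount REAL)")
warehouse.executemany("INSERT INTO fact VALUES (?, ?)", cleanse(source_rows))
loaded = warehouse.execute(
    "SELECT product, SUM(amount) FROM fact GROUP BY product").fetchall()
```

Note how the two "a1" variants are reconciled to a single consistent code before aggregation; providing consistent codes and flagging or dropping bad data is exactly the data-quality benefit the warehouse literature describes.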

5. Where can it be implemented?

Some of the applications data warehousing can be used for are:

- Decision support
- Trend analysis
- Financial forecasting
- Churn prediction for telecom subscribers, credit card users, etc.
- Insurance fraud analysis
- Call record analysis
- Logistics and inventory management
- Agriculture

6. Why is it needed?

A data warehouse maintains a copy of information from the source transaction systems. This architectural complexity provides the opportunity to:

- Maintain data history, even if the source transaction systems do not.
- Integrate data from multiple source systems, enabling a central view across the enterprise. This benefit is always valuable, but particularly so when the organization has grown by merger.
- Improve data quality, by providing consistent codes and descriptions, and by flagging or even fixing bad data.
- Present the organization's information consistently.
- Provide a single common data model for all data of interest regardless of the data's source.
- Restructure the data so that it makes sense to the business users.
- Restructure the data so that it delivers excellent query performance, even for complex analytic queries, without impacting the operational systems.
- Add value to operational business applications, notably customer relationship management (CRM) systems.

7. What are the examples of this database application?

- AdventureWorksDW2008R2
- ETL

Distributed Database

1. What is this database application?

The definition of a distributed database is broad, and the term may be used with different meanings. In general it typically refers to a modular DBMS architecture that allows distinct DBMS instances to cooperate as a single DBMS over processes, computers, and sites, while managing a single database that is itself distributed over multiple computers and different sites.

2. What is the architectural design of this database?

A distributed database system allows applications to access data from local and remote databases. In a homogeneous distributed database system, each database is an Oracle database. In a heterogeneous distributed database system, at least one of the databases is a non-Oracle database. Distributed databases use client/server architecture to process information requests. A database user accesses the distributed database through:

- Local applications: applications which do not require data from other sites.
- Global applications: applications which do require data from other sites.

A distributed database does not share main memory or disks.

3. How does it run?

The First Steps script takes care of setting up database tables for Telecom Web Services Server (TWSS) when you are using a consolidated or shared database. However, additional steps may be required when you are using a distributed database configuration.

Before you begin

This procedure is a sub-procedure within the overall migration process of your test (non-production) or production system. Make sure that you have already installed TWSS version 7.2 components and that you have performed the procedures described in one of the following post-installation configuration topics:

- Creating and configuring the DB2 database server instance
- Creating and configuring the Oracle database server instance

Note that the First Steps script assumes that all nodes in the cluster are at a TWSS version 7.2 level. About this task If a distributed or partitioned topology is used for databases, you will encounter one of the following scenarios:

One database per cluster. For example, you may have the Access Gateway running in one cluster and the Service Platform components running in another cluster, with each cluster having its own database. If this is the case, when you run the First Steps script on the Access Gateway cluster, all of the database parameters should refer to the Access Gateway database (typically named AGDB). Then, when you run the script on the Service Platform cluster, the database parameters should refer to the Service Platform database (typically SPMDB). No additional special considerations apply.

More than one database per cluster. For example, you may have a separate database for the WAP Push service on the same cluster as the database for the Service Platform components. If this is the case, use the procedure in this topic to create a temporary database and then configure new databases to work with TWSS version 7.2.

To migrate a distributed database configuration when there is more than one database per cluster, perform the following steps.

Procedure

a. On one of the nodes in the cluster, create a temporary database by running the First Steps script with the Initial Configuration Mode option pointing to a new temporary database. For detailed information, see the topic Running the First Steps configuration script. The First Steps script creates the minimum necessary data sources to support a distributed topology and points to the new single (temporary) database instance.
b. Create additional data sources, as necessary, and update the necessary JNDI bindings to the respective data sources.
c. Drop the temporary database.

d. Verify the new migrated environment by running any successful service logic test case. Verify databases and logs. Note: Migration of runtime data is generally not supported, except in the cases of some Direct Connect-based web service implementations. (For details, refer to the topic Planning to migrate from the previous version of Telecom Web Services Server.) In these cases, be sure that you have copied all of the existing data to the new database before deleting the old database and database tables.
e. Verify that the newly populated data coexists with the previous data in cases where migration of runtime data is supported for a given web service or feature.

Note: When the First Steps script sets up your configuration for running TWSS version 7.2, it does not modify your existing configuration. Therefore, if you made changes to the default configuration for a previous version of TWSS, your changes are preserved during the migration process. 4. Who are the people involved in this database application? What are their functions? In a distributed database environment, coordinate with the database administrator to determine the best location for the data. Some issues to consider are:

- Number of transactions posted from each location
- Amount of data (portion of table) used by each node
- Performance characteristics and reliability of the network
- Speed of various nodes, capacities of disks
- Importance of a node or link when it is unavailable
- Need for referential integrity among tables
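The distinction drawn earlier between local and global applications can be made concrete with a small sketch. This is purely illustrative: the site names, tables, and rows are invented, and a real distributed DBMS routes remote requests over the network rather than through an in-process dictionary.

```python
# Each site holds its own fragment of the overall schema.
SITES = {
    "hq":     {"employees": [{"id": 1, "name": "Ana"}]},
    "branch": {"orders":    [{"id": 10, "emp_id": 1, "total": 250.0}]},
}

def run_query(current_site, table):
    """A local application touches only current_site; a global
    application transparently reaches the site that owns the table."""
    if table in SITES[current_site]:       # local application path
        return SITES[current_site][table]
    for site, tables in SITES.items():     # global application path
        if table in tables:
            return tables[table]           # fetched from a remote site
    raise LookupError(f"table {table!r} not found at any site")

# From 'hq', employees is local but orders requires a remote site.
print(run_query("hq", "employees"))
print(run_query("hq", "orders"))
```

The calling application cannot tell which branch was taken, which is exactly the transparency a distributed database aims for.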

5. Where can it be implemented? The explosion of individual databases running on PC platforms can provide new opportunities to heads of departments, but may also pose problems for the organization as a whole. Information that could benefit the entire organization often becomes out of reach for users unable to access it or unaware of its existence. Additionally, because of the cheaper hardware and software used, and the generally lower skills of the personnel administering these systems, reliability can be significantly lower than with centralized systems. Data inconsistency is another problem that occurs in such an environment, as the same data is stored in many databases with no system for managing the multiple copies. 6. Why is it needed? Business leaders today understand the importance of information as a business resource. With centralized database systems, an organization's information is maintained and controlled by a few highly skilled individuals at one location. Two major factors have led many business users to reject the centralized database model: the natural tendency for humans not to share, and the introduction of personal computer (PC)-based DBMSs powerful enough to handle many concurrent users. Armed with such tools, departments and workgroups can easily build their own databases, wresting control of the information resource from the administrators of the organization's central databases and satisfying their natural tendency not to share. The centralized and decentralized models described above both generate major problems for large organizations. Some type of architecture that provides the advantages of both without the drawbacks would be ideal. This architecture should allow decentralized use of data, while providing for database administration that can be performed by personnel with the interests of the whole firm in mind. 7. What are the examples of this database application?
Examples are databases of local workgroups and departments at regional offices, branch offices, manufacturing plants and other work sites. These databases can include both segments shared by multiple sites, and segments specific to one site and used only locally at that site.

Document-oriented database 1. What is this database application? A document-oriented database is a computer program designed for storing, retrieving, and managing document-oriented information, also known as semi-structured data. Document-oriented databases are one of the main categories of so-called NoSQL databases, and the popularity of the term "document-oriented database" (or "document store") has grown with the use of the term NoSQL itself. 2. What is the architectural design of this database? Document-based Database is not a relational database management system. Instead of storing data in rows and columns, the database manages a collection of JSON documents. The documents in a collection need not share a schema, but retain query abilities via views. Views are defined with aggregate functions and filters, and are computed in parallel, much like MapReduce. Views are generally stored in the database and their indexes updated continuously, although queries may introduce temporary views. Document-based Database supports a view system using external socket servers and a JSON-based protocol. As a consequence, view servers have been developed in a variety of languages. Document-based Database exposes a RESTful HTTP API, and a large number of pre-written clients are available. Additionally, a plugin architecture allows for using different languages for the view server, such as JavaScript (the default), PHP, Ruby, Python and Erlang. Support for other languages can be easily added. The design and philosophy of Document-based Database borrow heavily from Web architecture and its concepts of resources, methods and representations. It is in use in many software projects and web sites, including Ubuntu, where it is used to synchronize address and bookmark data. Since version 0.11, Document-based Database supports the CommonJS Module specification.

Features

Document Storage: Document-based Database stores documents in their entirety. You can think of a document as one or more field/value pairs expressed as JSON. Field values can be simple things like strings, numbers, or dates, but you can also use ordered lists and associative maps.
Every document in a Document-based Database has a unique id, and there is no required document schema.

ACID Semantics:

Like many relational database engines, Document-based Database provides ACID semantics. It does this by implementing a form of Multi-Version Concurrency Control (MVCC), not unlike InnoDB or Oracle. That means Document-based Database can handle a high volume of concurrent readers and writers without conflict.

Map/Reduce Views and Indexes: To provide some structure to the data stored in Document-based Database, you can develop views that are similar to their relational database counterparts. In Document-based Database, each view is constructed by a JavaScript function (server-side JavaScript using CommonJS and SpiderMonkey) that acts as the Map half of a map/reduce operation. The function takes a document and transforms it into a single value, which it returns. The logic in your JavaScript functions can be arbitrarily complex. Since computing a view over a large database can be an expensive operation, Document-based Database can index views and keep those indexes updated as documents are added, removed, or updated. This provides a very powerful indexing mechanism that grants finer-grained control than most databases offer.

Distributed Architecture with Replication: Document-based Database was designed with bi-directional replication (or synchronization) and off-line operation in mind. That means multiple replicas can have their own copies of the same data, modify them, and then sync those changes at a later time. The biggest gotcha typically associated with this level of flexibility is conflicts.

REST API: Document-based Database treats all stored items (there are others besides documents) as resources. All items have a unique URI that is exposed via HTTP. REST uses the HTTP methods POST, GET, PUT and DELETE for the four basic CRUD (Create, Read, Update, Delete) operations on all resources. HTTP is widely understood, interoperable, scalable and proven technology.
A lot of tools, both software and hardware, are available to do all sorts of things with HTTP, such as caching, proxying and load balancing.

Eventual Consistency: According to the CAP theorem, it is impossible for a distributed system to simultaneously provide consistency, availability and partition tolerance guarantees. A distributed system can satisfy any two of these guarantees at the same time, but not all three. Document-based Database guarantees eventual consistency in order to provide both availability and partition tolerance. 3. How does it run? The central concept of a document-oriented database is the notion of a Document. While each document-oriented database implementation differs on the details of this definition, in general they all assume documents encapsulate and encode data (or information) in some standard format(s) or encoding(s). Encodings in use include XML, YAML, JSON and BSON, as well as binary forms like PDF and Microsoft Office documents (MS Word, Excel, and so on).

Documents inside a document-oriented database are similar, in some ways, to records or rows in relational databases, but they are less rigid. They are not required to adhere to a standard schema, nor will they all have the same sections, slots, parts, keys, or the like. For example, here is a document: FirstName="Bob", Address="5 Oak St.", Hobby="sailing". Another document could be: FirstName="Jonathan", Address="15 Wanamassa Point Road", Children=[{Name:"Michael", Age:10}, {Name:"Jennifer", Age:8}, {Name:"Samantha", Age:5}, {Name:"Elena", Age:2}]. The two documents share some information and differ in the rest. Unlike a relational database, where each record would have the same set of fields and unused fields might be kept empty, there are no empty 'fields' in either document (record) in this case. This system allows new information to be added without requiring you to state explicitly that other pieces of information are left out.

Keys: Documents are addressed in the database via a unique key that represents that document. Often, this key is a simple string. In some cases, this string is a URI or path. Regardless, you can use this key to retrieve the document from the database. Typically, the database retains an index on the key so that document retrieval is fast.

Retrieval: One of the other defining characteristics of a document-oriented database is that, beyond the simple key-document (or key-value) lookup that you can use to retrieve a document, the database will offer an API or query language that allows you to retrieve documents based on their contents. For example, you may want a query that returns all the documents with a certain field set to a certain value. The set of query APIs or query language features available, as well as the expected performance of the queries, varies significantly from one implementation to the next.
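The two example documents above, together with the key lookup and content-based retrieval just described, can be mimicked with plain Python dictionaries. The keys "doc1" and "doc2" are invented for the sketch; a real document store would assign or accept its own unique keys.

```python
# A toy document store: key -> schema-less document.
store = {
    "doc1": {"FirstName": "Bob", "Address": "5 Oak St.", "Hobby": "sailing"},
    "doc2": {"FirstName": "Jonathan",
             "Address": "15 Wanamassa Point Road",
             "Children": [{"Name": "Michael", "Age": 10},
                          {"Name": "Jennifer", "Age": 8},
                          {"Name": "Samantha", "Age": 5},
                          {"Name": "Elena", "Age": 2}]},
}

# 1) Key lookup: fetch a document directly by its unique key.
doc = store["doc1"]

# 2) Content-based retrieval: find documents where a field has a given
#    value, skipping documents that lack the field entirely.
def find(field, value):
    return [key for key, d in store.items() if d.get(field) == value]

print(find("FirstName", "Bob"))
```

Note that the query simply ignores doc2's missing "Hobby" field instead of treating it as an empty column, which is the behavior the text describes.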
Organization: Implementations offer a variety of ways of organizing documents, including notions of:

- Collections
- Tags
- Non-visible metadata
- Directory hierarchies
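The map/reduce view mechanism described in this section can be imitated in a few lines of Python. The documents and the view logic (counting comments per author) are invented for illustration; in Document-based Database the map function would be JavaScript running server-side, and the index would be kept up to date incrementally.

```python
docs = [
    {"_id": "p1", "type": "post",    "author": "bob"},
    {"_id": "c1", "type": "comment", "author": "bob"},
    {"_id": "c2", "type": "comment", "author": "bob"},
    {"_id": "c3", "type": "comment", "author": "eve"},
]

def map_fn(doc):
    """Map half: emit (key, value) pairs for documents of interest."""
    if doc["type"] == "comment":
        yield (doc["author"], 1)

def reduce_fn(values):
    """Reduce half: collapse all values emitted under one key."""
    return sum(values)

index = {}  # the view's index: key -> list of emitted values
for d in docs:
    for key, value in map_fn(d):
        index.setdefault(key, []).append(value)

view = {key: reduce_fn(vals) for key, vals in index.items()}
print(view)  # comment counts per author
```

Because the index is grouped by emitted key, adding or removing one document only touches the entries that document emitted, which is why such views can stay cheap to maintain.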

4. Who are the people involved in this database application? What are their functions? People go for databases like CouchDB not because there is a need for schema-less data, but for the other cool features these databases give us, like replication (master to master), conflict resolution, etc., which can be leveraged to give applications an offline capability. I think this in itself is a smell: features like these should not drive one to use NoSQL databases when the application domain needs relational data.

Anyway, the choice is made (yet to be challenged), and we had to figure out how to model our documents with a parent/child relationship and persist them in the database. CouchDB itself does not care how we store documents; it is up to the API to manage these relationships. Ektorp provides a way to achieve this. Let's take an example quoted in the documentation of Ektorp itself: a BlogPost and a bunch of Comments. What if we want to store a BlogPost and a Comment as different documents? Can we add comments to the blog post and, just by saving the blog post, save all the comments as individual documents? The answer is yes; it is possible with the @DocumentReferences annotation. 5. Where can it be implemented? Some document-oriented database products (Name / Publisher / License / Language / Notes / RESTful API):

- Lotus Notes / IBM / Proprietary / (unknown) / (unknown) / (unknown)
- askSam / askSam Systems / Proprietary / (unknown) / (unknown) / (unknown)
- Apstrata / Apstrata / Proprietary / (unknown) / (unknown) / (unknown)
- Datawasp / Significant Data Systems / Proprietary / (unknown) / (unknown) / (unknown)
- Clusterpoint / Clusterpoint Ltd. / Free community license, Commercial / C++ / Scalable, high-performance, schema-free, document-oriented database management system platform with server-based data storage, fast full-text search engine functionality, information ranking for search relevance, and clustering / Yes
- CRX / Day Software / Proprietary / (unknown) / (unknown) / (unknown)
- MUMPS Database / (unknown) / Proprietary and GNU Affero GPL / MUMPS / Commonly used in health applications / (unknown)
- UniVerse / Rocket Software / Proprietary / (unknown) / (unknown) / Yes (Beta)
- UniData / Rocket Software / Proprietary / (unknown) / (unknown) / Yes (Beta)
- Jackrabbit / Apache Software Foundation / Apache License / Java / (unknown) / (unknown)
- CouchDB / Couchbase, Apache Software Foundation / Apache License / Erlang / JSON over REST/HTTP with Multi-Version Concurrency Control and ACID properties; uses map and reduce for views and queries / Yes (there is only a RESTful API)
- FleetDB / FleetDB / MIT License / Clojure / A JSON-based schema-free database optimized for agile development / (unknown)
- MongoDB / 10gen, Inc. / GNU AGPL v3.0 / C, C++, Erlang, Haskell, Java, JavaScript, .NET (C#, F#, PowerShell, etc.), Perl, PHP, Python, Ruby, Scala / Fast, document-oriented database optimized for highly transient data / Optional, using external tools
- GemFire Enterprise / VMware / Commercial / Java, .NET, C++ / Memory-oriented, fast, key-value database with indexing and querying support; JSON over HTTP / Yes
- OrientDB / Orient Technologies / Apache License / Java / (unknown) / Yes
- RavenDB / RavenDB / Commercial or GNU AGPL v3.0 / .NET / A .NET LINQ-enabled document database, focused on providing a high-performance, transactional, schema-less, flexible and scalable NoSQL data store for the .NET and Windows platforms / Yes
- Redis / (unknown) / BSD License / ANSI C / Key-value store supporting lists and sets, with a fast, simple and binary-safe protocol / (unknown)
- StrokeDB / (unknown) / MIT License / (unknown) / Alpha software / (unknown)
- Terrastore / (unknown) / Apache License / Java / JSON/HTTP / (unknown)
- ThruDB / (unknown) / BSD License / C++, Java / Built on top of the Apache Thrift framework; provides indexing and document storage services for building and scaling websites; an alternate implementation is being developed in Java; alpha software / (unknown)
- Persevere / Persevere / BSD License / (unknown) / A JSON database and JavaScript application server; provides a RESTful JSON interface for create, read, update, and delete access to data; also supports JSONQuery/JSONPath querying / Yes
- DBSlayer / DBSlayer / Apache License / (unknown) / A database abstraction layer (over MySQL) used by The New York Times; JSON over HTTP / (unknown)
- Eloquera DB / Eloquera / Proprietary / .NET / High performance; based on dynamic objects; supports LINQ and SQL queries / (unknown)

6. Why is it needed? It is used to conveniently store, manage, edit and retrieve documents. 7. What are the examples of these database applications?

- Apache CouchDB
- MongoDB
- GitHub
- SourceForge
- IBM Lotus Notes

Embedded database 1. What is this database application? An embedded database system is a DBMS which is tightly integrated with an application software that requires access to stored data, in such a way that the DBMS is hidden from the application's end-user and requires little or no ongoing maintenance. It is actually a broad technology category that includes

DBMSs with differing properties and target markets. The term "embedded database" can be confusing because only a small subset of embedded database products is used in real-time embedded systems such as telecommunications switches and consumer electronics devices. 2. What is the architectural design of this database? An embedded database system supports many application programming interfaces in several programming languages. The C programming language has the most APIs, including the low-level kernel MR Routines, Embedded SQL, MSCALL and ODBC. There are also APIs for C++ and Java. The layered architecture design provides levels of system optimization for application development. Applications developed using these APIs may be run in standalone and/or server modes. 3. How does it run? An embedded database is a full-function, relational database that has been embedded into applications by organizations small to large, with deployment environments including medical systems, network routers, nuclear power plant monitors, satellite management systems, and other embedded system applications that require reliability and power. It is an ACID-compliant SQL database engine with C, C++, Java, JDBC, ODBC, SQL, ADO.NET and kernel-level APIs. Applications developed using these APIs may be run in standalone and/or server modes. Embedded databases run on Linux, Unix, Microsoft Windows and real-time operating systems. 4. Who are the people involved in this database application? What are their functions? Technology has managed, for quite a while now, to change the ways in which people choose to live their lives. Computers in particular, and software, have developed from machines able to perform only basic tasks into systems that handle complex and varied information in a minimal amount of time.
Databases have moreover contributed to the emergence of a different way to keep one's information not only thoroughly organized but also easily accessible. Databases, although built on programming languages, are in most cases user-friendly, and one can learn to work with them quickly and without too much effort. This is why database software has constantly increased in popularity ever since it was first developed in the late 1970s. One of the first widely used database programs was dBase, which helped people manage their information better. Nowadays database languages have evolved, and different types of databases are available, each potentially used by different people and in different settings. The embedded database is one of them. The embedded database is a type of database system, or DBMS, which is very closely integrated into application software that requires access to stored data. This means that the database system is actually hidden behind the application that the end user works with, which makes it much easier for individuals who do not have professional training in programming to work with this type of database. An

embedded database requires little or no maintenance. The term, however, covers a very broad technological category which may involve different application programming interfaces (such as SQL), database architectures, storage modes and database models, as well as different target markets. This type of database, although complex, makes it easier for the end user to work with and handle data. 5. Where can it be implemented? Major embedded database products include, in alphabetical order: Advantage Database Server from Sybase Inc., Berkeley DB from Oracle Corporation, CSQL from csqlcache.com, EffiProz from EffiProz Systems, ElevateDB from Elevate Software, Inc., Empress Embedded Database from Empress Software, Extensible Storage Engine from Microsoft, eXtremeDB from McObject, Firebird Embedded, HSQLDB from HSQLDB.ORG, Informix Dynamic Server (IDS) from IBM, InnoDB from Oracle Corporation, ITTIA DB from ITTIA, RDM Embedded and RDM Server from Raima Inc., solidDB from IBM, SQLite, SQL Server Compact from Microsoft Corporation, Valentina DB from Paradigma Software, and VistaDB from VistaDB Software, Inc. 6. Why is it needed? Typically, when data is returned to the user, it must be copied from the data manager's buffer cache (or data page) into the application's memory. However, in an embedded environment, the robustness of the total software package is of paramount importance, not the isolation between the application and the data manager. As a result, it is possible for the data manager to avoid copies by giving applications direct references to data items in a shared memory cache. This is a significant performance optimization that is possible when the application and data manager are tightly integrated. 7. What are the examples of these database applications?

- MySQL
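The in-process, zero-administration model described in this section is easy to demonstrate with SQLite, one of the products listed above, through Python's bundled sqlite3 module. The table and rows below are invented for the example.

```python
import sqlite3

# The whole DBMS runs inside the application process: no server to
# install or administer, just a library call away.
conn = sqlite3.connect(":memory:")  # a file path gives a persistent DB
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [("temp", 21.5), ("temp", 22.0), ("pressure", 101.3)])

# Full SQL is available even though there is no database server.
(avg_temp,) = conn.execute(
    "SELECT AVG(value) FROM readings WHERE sensor = 'temp'").fetchone()
print(avg_temp)  # prints 21.75
conn.close()
```

The end user of an application built this way never sees the DBMS at all, which is exactly the "hidden from the application's end-user" property the definition stresses.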

End-user database 1. What is this database application? These databases consist of data developed by individual end-users. Examples are collections of documents, spreadsheets, presentations, multimedia, and other files. Several products exist to support such databases. Some of them are much simpler than full-fledged DBMSs, offering more elementary DBMS functionality (e.g., not supporting multiple concurrent end-users on the same database), basic programming interfaces, and a relatively small "footprint" (not much code to run, compared with "regular" general-purpose databases). However, generally available general-purpose DBMSs can often be used for this purpose as well, if they provide basic user interfaces for straightforward database applications (limited query and data display; no real programming needed), while still providing the database qualities and protections that these DBMSs can offer. 2. What is the architectural design of this database?

The number, nature, and needs of the tenants you expect to serve all affect your data architecture decision in different ways. Some of the following questions may bias you toward a more isolated approach, while others may bias you toward a more shared approach.

- How many prospective tenants do you expect to target? You may be nowhere near being able to estimate prospective use with authority, but think in terms of orders of magnitude: are you building an application for hundreds of tenants? Thousands? Tens of thousands? More? The larger you expect your tenant base to be, the more likely you will want to consider a more shared approach.
- How much storage space do you expect the average tenant's data to occupy? If you expect some or all tenants to store very large amounts of data, the separate-database approach is probably best. (Indeed, data storage requirements may force you to adopt a separate-database model anyway. If so, it will be much easier to design the application that way from the beginning than to move to a separate-database approach later on.)
- How many concurrent end users do you expect the average tenant to support? The larger the number, the more appropriate a more isolated approach will be to meet end-user requirements.
- Do you expect to offer any per-tenant value-added services, such as per-tenant backup and restore capability? Such services are easier to offer through a more isolated approach.
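The questions above can be caricatured as a decision function. The thresholds below are invented placeholders rather than recommendations; the point is only the shape of the reasoning, in which isolation-driving factors (large per-tenant data, per-tenant services) take precedence over scale-driving ones.

```python
def tenancy_model(tenants, gb_per_tenant, per_tenant_backup):
    """Toy rule of thumb for picking a multi-tenant data architecture."""
    # Isolation-driving factors win first (placeholder threshold: 100 GB).
    if gb_per_tenant > 100 or per_tenant_backup:
        return "separate-database"
    # Otherwise, sheer tenant count pushes toward sharing.
    if tenants > 10_000:
        return "shared-schema"
    return "shared-database, separate-schema"

print(tenancy_model(tenants=50, gb_per_tenant=500, per_tenant_backup=False))
```

A real decision would of course also weigh concurrency, isolation requirements, and cost, but the precedence of the factors is the useful takeaway.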

3. How does it run? If you continually gauge database performance from a strictly database-internal point of view, you are missing the boat. Begin watching your database end users and performance will begin to take on a whole new meaning. 4. Who are the people involved in this database application? What are their functions?

From a Discoverer end user's point of view, the ability to schedule workbooks is useful for:

- reports that take a long time to run
- reports that have to run at regular intervals

For example, a Discoverer end user might want to run a report that they know will take a long time to complete. The user can schedule the report to run overnight and have the results ready to view the next morning. From a Discoverer manager's point of view, workbook scheduling is useful to prevent long-running queries from adversely affecting system performance. You can force users to schedule workbooks (either all workbooks, or only those workbooks that will exceed a predicted time that you specify), and you can further specify the time periods during which scheduled workbooks are permitted to run. 5. Where can it be implemented? In consultation with all potential users of the database, a database designer's first step is to draw up a data requirements document. The requirements document contains a concise and non-technical summary of what data items will be stored in the database, and how the various data items relate to one another. Taking the data requirements document, further analysis is done to give meaning to the data items, e.g. to define the more detailed attributes of the data and to define constraints if needed. The result of this analysis is a preliminary specifications document (Batini et al., 1986). Taking the specifications document, the database designer models how the information is viewed by the database system, how it is processed, and how it is conveyed to the end user. In the implementation design phase, the conceptual design is translated into a more low-level, DBMS-specific design. 6. Why is it needed? Lack of familiarity with database design methods could prevent many end users from effectively implementing their database management system packages. An inexpensive solution would be for end users to learn the required database design skills from software tutors tailored to their needs. This research describes two tutors developed to teach these skills to end users. The tutors were based on a modified Entity-Relationship database design method.
They improved an end user's natural learning process by incorporating design principles and facilitators. Empirical comparison of the tutors tested the teaching effectiveness of the facilitators. The results lead to recommendations for closing the gap between the skills required and the skills learned by end users in database design. Development of tutors that teach specific database design skills, irrespective of the software package used in implementation, has important implications for practitioners and researchers. 7. What are the examples of these database applications?

- Oracle

Federated database and multi-database 1. What is this database application?

A federated database is an integrated database that comprises several distinct databases, each with its own DBMS. It is handled as a single database by a federated database management system (FDBMS), which transparently integrates multiple autonomous DBMSs, possibly of different types (which makes it a heterogeneous database), and provides them with an integrated conceptual view. The constituent databases are interconnected via a computer network, and may be geographically decentralized. Sometimes the term multi-database is used as a synonym for federated database, though it may refer to a less integrated group of databases (e.g., without an FDBMS and a managed integrated schema) that cooperate in a single application. In this case, middleware for distribution is typically used, which usually includes an atomic commit protocol (ACP), e.g. the two-phase commit protocol, to allow distributed (global) transactions across the participating databases (vs. local transactions confined to a single DBMS). 2. What is the architectural design of this database?

All systems need to evolve over time. In a federated system, new sources may be needed to meet the changing needs of the users' business. IBM makes it easy to add new sources. The federated database engine accesses sources via a software component known as a wrapper. Accessing a new type of data source is done by acquiring or creating a wrapper for that source. The wrapper architecture enables the creation of new wrappers. Once a wrapper exists, simple data definition (DDL) statements allow sources to be dynamically added to the federation without stopping ongoing queries or transactions. Any data source can be wrapped. IBM supports the ANSI SQL/MED standard (MED stands for Management of External Data). This standard documents the protocols used by a federated server to communicate with external data sources. Any wrapper written to the SQL/MED interface can be used with IBM's federated database. Thus wrappers can be written by third parties as well as by IBM, and used in conjunction with IBM's federated database. 3. How does it run? Composite data virtualization lets you easily build and run federated views. With Composite you can create a reusable federated view to model, access, combine, federate, and deliver data from multiple relational and non-relational sources. The Composite Studio, with its easy-to-learn, point-and-click development environment and automated code generation tools, greatly simplifies federated view building. With Composite-provided APIs, you can include data from multiple relational databases, a

wide variety of files including Excel and other formats, application data such as from SAP, and even XML sources, without worrying about difficult connections, transforms, or other barriers. The Composite Information Server stores these views, making them available at runtime to multiple consuming applications, such as BI tools or portals, via popular standards including JDBC, ODBC, and ADO.NET. When run, Composite optimizes the query across all the sources required, leveraging source system resources and myriad other optimization techniques, to achieve performance levels unmatched by its competitors. 4. Who are the people involved in this database application? What are their functions? 1. DBMS developers - These are the people who design and build the DBMS product, and the only ones who touch its code. They are typically the employees of a DBMS vendor (e.g., Oracle, IBM, Microsoft, Sybase), or, in the case of open source DBMSs (e.g., MySQL), volunteers or people supported by interested companies and organizations. They are typically skilled systems programmers. DBMS development is a complicated task, and some of the popular DBMSs have been under development and enhancement (also to follow progress in technology) for decades. 2. Application developers and database administrators - These are the people who design and build a database-based application that uses the DBMS. The latter design the needed database and maintain it. The former write the application programs which the application comprises. Both are well familiar with the DBMS product and use its user interfaces (as well as usually other tools) in their work. Sometimes the application itself is packaged and sold as a separate product, which may include the DBMS inside (see Embedded database; subject to proper DBMS licensing), or be sold separately as an add-on to the DBMS. 3. Application's end-users (e.g., accountants, insurance people, medical doctors, etc.)
- These people know the application and its end-user interfaces, but need not know or understand the underlying DBMS. Thus, though they are the intended and main beneficiaries of a DBMS, they are only indirectly involved with it. 5. Where can it be implemented? Fundamental to the difference between an MDBS and an FDBS is the concept of autonomy. It is important to understand the aspects of autonomy for component databases and how they can be addressed when a component DBS participates in an FDBS. Four kinds of autonomy are addressed:

- Design autonomy refers to the ability to choose a design irrespective of the data, query language or conceptualization, or the functionality of the system implementation. Heterogeneities in an FDBS are primarily due to design autonomy.
- Communication autonomy refers to the DBMS's general ability to decide whether to communicate with other DBMSs.
- Execution autonomy allows a component DBMS to control the operations requested by local and external operations.
- Association autonomy gives a component DBS the power to disassociate itself from a federation, which means the FDBS can operate independently of any single DBS.

The ANSI/X3/SPARC Study Group outlined a three-level data description architecture, the components of which are the conceptual schema, internal schema and external schema of databases. This three-level

architecture is however inadequate to describing the architectures of an FDBS. It was therefore extended to support the three dimensions of the FDBS namely Distribution, Autonomy and Heterogeneity. The five level schema architecture is explained below. 6. Why it is needed? The five level schema architecture includes the following: y y Local Schema is the conceptual concept expressed in primary data model of component DBMS. Component Schema is derived by translating local schema into a model called the canonical data model or common data model. They are useful when semantics missed in local schema are incorporated in the component. They help in integration of data for tightly coupled FDBS. Export Schema represents a subset of a component schema that is available to the FDBS. It may include access control information regarding its use by specific federation user. The export schema help in managing flow of control of data. Federated Schema is an integration of multiple export schema. It includes information on data distribution that is generated when integrating export schemas. External Schema defines a schema for a user/applications or a class of users/applications.
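As a rough illustration of how the five levels derive from one another, the chain can be sketched in Python. All schema contents, attribute names, and helper functions below are invented for illustration; real FDBS implementations are far more involved.

```python
# Illustrative sketch of the five-level FDBS schema architecture.
# Each "schema" is modeled as a plain set of attribute names; the
# functions show how each level is derived from the one below it.

def to_component(local_schema):
    """Translate a local schema into the canonical (common) data model.
    Here: just normalize attribute names to lowercase."""
    return {attr.lower() for attr in local_schema}

def to_export(component_schema, allowed):
    """Export schema: the subset of a component schema made available
    to the federation (access control)."""
    return component_schema & allowed

def to_federated(*export_schemas):
    """Federated schema: integration of multiple export schemas."""
    merged = set()
    for schema in export_schemas:
        merged |= schema
    return merged

def to_external(federated_schema, needed):
    """External schema: a view tailored to one user class."""
    return federated_schema & needed

# Two component DBSs with their own local schemas (design autonomy).
local_a = {"EmpID", "Salary", "Dept"}
local_b = {"WORKER_NO", "WAGE"}

comp_a = to_component(local_a)
comp_b = to_component(local_b)

exp_a = to_export(comp_a, allowed={"empid", "salary"})   # Dept withheld
exp_b = to_export(comp_b, allowed={"worker_no", "wage"})

fed = to_federated(exp_a, exp_b)
payroll_view = to_external(fed, needed={"salary", "wage"})
print(sorted(payroll_view))  # ['salary', 'wage']
```

Note how access control happens at the export level: the Dept attribute never reaches the federated schema, so no external schema can see it.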


While accurately representing the state of the art in data integration, the five-level schema architecture above does suffer from a major drawback, namely an IT-imposed look and feel. Modern data users demand control over how data is presented; their needs are somewhat in conflict with such bottom-up approaches to data integration. 7. What are the examples of these database applications?
- IBM
- OBIEE

Graph database 1. What is this database application? A graph database is a kind of NoSQL database that uses graph structures with nodes, edges, and properties to represent and store information. General graph databases that can store any graph are distinct from specialized graph databases such as triplestores and network databases. 2. What is the architectural design of this database? Graph database is a conference that is organized by the community, for the community. The result is a high-quality conference experience where a tremendous amount of attention and investment has gone into having the best content on the most important topics, presented by the leaders in our community. Graph database is designed with the technical depth and enterprise focus of interest to technical team leads, architects, and project managers. 3. How does it run?

With the success of Neo4j as a graph database in the NoSQL revolution, it's interesting to see another graph database, HyperGraphDB, in the mix. The quick blurb on HyperGraphDB says it is a general-purpose, extensible, portable, distributed, embeddable, open-source data storage mechanism. It is a graph database designed specifically for artificial intelligence and semantic web projects, but it can also be used as an embedded object-oriented database for projects of all sizes. From the NoSQL Archive, the summary on HyperGraphDB is: API: Java (and Java languages), Written in: Java, Query Method: Java or P2P, Replication: P2P, Concurrency: STM, Misc: Open-Source, Especially for AI and Semantic Web. 4. Who are the people involved in this database application? What are their functions?

The same three groups described earlier apply here: DBMS developers, who design and build the DBMS product itself; application developers and database administrators, who write the application programs and design and maintain the needed database; and the application's end-users, who know only the application and its end-user interfaces and are thus only indirectly involved with the DBMS. 5. Where it can be implemented? The following is a list of several well-known graph database projects:
- AllegroGraph - a scalable, high-performance RDF and graph database.
- Bigdata - a highly scalable RDF/graph database capable of 10B+ edges on a single node or clustered deployment for very high throughput.
- CloudGraph - a disk- and memory-based, fully transactional .NET graph database that uses graphs and key/value pairs to store data.
- Cytoscape - an open-source platform, an outgrowth of bioinformatics.
- DEX - a high-performance graph database from Sparsity Technologies, a technology transition company from DAMA-UPC.
- Filament - a graph persistence framework and associated toolkits based on a navigational query style.
- GraphBase - a customizable, distributed, small-footprint, high-performance graph store with a rich tool set, from FactNexus.
- Graphd - the proprietary backend of Freebase.
- Horton - a graph database from the Microsoft Research Extreme Computing Group (XCG) based on the cloud programming infrastructure Orleans.
- HyperGraphDB - an open-source (LGPL) graph database supporting generalized hypergraphs where edges can point to other edges.
- InfiniteGraph - a highly scalable, distributed and cloud-enabled commercial product with flexible licensing for startups.
- InfoGrid - an open-source / commercial (AGPLv3, free for small entities) graph database with a web front end and configurable storage engines (MySQL, PostgreSQL, files, Hadoop).
- Neo4j - an open-source / commercial (GPLv3 community edition, AGPLv3 advanced and enterprise edition) graph database.
- OrientDB - a high-performance open-source document-graph database.
- OQGRAPH - a graph computation engine (GPLv2 licensed) for MySQL, MariaDB and Drizzle.
- sones GraphDB - an open-source / commercial (AGPLv3) graph database and universal access layer (funded by Deutsche Telekom AG).
- VertexDB - a high-performance graph database server that supports automatic garbage collection.
- Virtuoso Universal Server - a clustered, high-performance and scalable RDF graph database server.
- R2DF - a framework for ranked path queries over weighted RDF graphs.

6. Why it is needed? Compared with relational databases, graph databases are often faster for associative data sets, and map more directly to the structure of object-oriented applications. They can scale more naturally to large data sets, as they do not typically require expensive join operations. Because they depend less on a rigid schema, they are more suitable for managing ad hoc and changing data with evolving schemas.
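The node/edge/property model that graph databases expose, and that maps so directly onto application objects, can be sketched with plain Python data structures. Everything here (class name, methods, the sample data) is purely illustrative, not any product's actual API.

```python
# Toy property graph: nodes and edges both carry a dict of properties.
class PropertyGraph:
    def __init__(self):
        self.nodes = {}   # node_id -> properties
        self.edges = []   # (src, label, dst, properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, src, label, dst, **props):
        self.edges.append((src, label, dst, props))

    def neighbors(self, node_id, label=None):
        """Follow outgoing edges, optionally filtered by edge label."""
        return [dst for (src, lbl, dst, _) in self.edges
                if src == node_id and (label is None or lbl == label)]

g = PropertyGraph()
g.add_node("alice", age=34)
g.add_node("bob", age=29)
g.add_edge("alice", "knows", "bob", since=2008)
print(g.neighbors("alice", "knows"))  # ['bob']
```

Traversing a relationship here is a direct pointer chase rather than a join, which is the intuition behind the performance claim above.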

Conversely, relational databases are typically faster at performing the same operation on large numbers of data elements. Graph databases are a powerful tool for graph-like queries, for example computing the shortest path between two nodes in the graph. Other graph-like queries can be performed over a graph database in a natural way (for example, computing a graph's diameter or detecting communities). 7. What are the examples of these database applications?
- OQGRAPH
- VertexDB
- R2DF
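The shortest-path query mentioned above can be sketched with a plain breadth-first search over an adjacency list; the graph below is an illustrative toy, not any product's query API.

```python
from collections import deque

# Breadth-first search for the shortest (fewest-hops) path between two
# nodes of an unweighted graph - the kind of traversal graph databases
# are optimized for.
def shortest_path(graph, start, goal):
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no path exists

graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
}
print(shortest_path(graph, "A", "E"))  # ['A', 'B', 'D', 'E']
```

BFS explores the graph level by level, so the first path that reaches the goal is guaranteed to have the fewest hops.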

Hypermedia databases 1. What is this database application? The World Wide Web can be thought of as a database, albeit one spread across millions of independent computing systems. Web browsers "process" this data one page at a time, while Web crawlers and other software provide the equivalent of database indexes to support search and other activities. An extendable hypermedia system has a hypermedia database for storing hypermedia data; a data processor for modifying the data stored in the hypermedia database and for adding new types of associated data; a view processor for modifying a view corresponding to the data processed by the data processor and for adding new types of associated views; a display for presenting data; and a window system for displaying the data corresponding to the view processed by the view processor, and for informing the view processor of data entered on the display. In such a hypermedia system, a programmer can arbitrarily add both data types and view types.

2. What is the architectural design of this database? Hypermedia may be developed in a number of ways. Any programming tool can be used to write programs that link data from internal variables and nodes to external data files. Multimedia development software such as Adobe Flash, Adobe Director, Macromedia Authorware, and MatchWare Mediator may be used to create stand-alone hypermedia applications, with an emphasis on entertainment content. Some database software, such as Visual FoxPro and FileMaker Developer, may be used to develop stand-alone hypermedia applications, with an emphasis on educational and business content management. 3. How does it run? Hypermedia applications may be developed on embedded devices for the mobile and digital signage industries using the Scalable Vector Graphics (SVG) specification from the W3C (World Wide Web Consortium). Software applications such as Ikivo Animator and Inkscape simplify the development of hypermedia content based on SVG. Embedded devices such as the iPhone natively support SVG specifications and may be used to create mobile and distributed hypermedia applications. 4. Who are the people involved in this database application? What are their functions? The same three groups described earlier apply here: DBMS developers, who design and build the DBMS product itself; application developers and database administrators, who write the application programs and design and maintain the needed database; and the application's end-users, who know only the application and its end-user interfaces and are thus only indirectly involved with the DBMS. 5. Where it can be implemented? Hyperlinks may also be added to data files using most business software via the limited scripting and hyperlinking features built in. Documentation software such as the Microsoft Office suite and LibreOffice allows for hypertext links to other content within the same file, to other external files, and via URL links to files on external file servers. For more emphasis on graphics and page layout, hyperlinks may be added using most modern desktop publishing tools. This includes presentation programs such as Microsoft PowerPoint and LibreOffice Impress, add-ons to print layout programs such as Quark Immedia, and tools to include hyperlinks in PDF documents, such as Adobe InDesign for creating and Adobe Acrobat for editing. Hyper Publish is a tool specifically designed and optimized for hypermedia

and hypertext management. Any HTML editor may be used to build HTML files, accessible by any web browser. CD/DVD authoring tools such as DVD Studio Pro may be used to hyperlink the content of DVDs for DVD players, or to add web links for when the disc is played on a personal computer connected to the internet. 6. Why it is needed? The development and subsequent rapid advance of electronic computers in the second half of the twentieth century led to the development of database models that are far more efficient for dealing with large volumes of information than flat databases. The most notable is the relational model, which was proposed by E. F. Codd in 1970. Codd, a researcher at IBM, criticized existing data models for their inability to distinguish between abstract descriptions of data structures and descriptions of the physical access mechanisms. A relational database is a way of organizing data such that it appears to the user to be stored in a series of interrelated tables. Interest in this model was initially confined to academia, perhaps because the theoretical basis is not easy to understand, and thus the first commercial products, Oracle and DB2, did not appear until around 1980. Subsequently, relational databases became the dominant type for high-performance applications because of their efficiency, ease of use, and ability to perform a variety of useful tasks that had not been originally envisioned. 7. What are the examples of these database applications?
- Internet
- Inkscape

In-memory database 1. What is this database application? An in-memory database (IMDB; also main memory database or MMDB) is a database that primarily resides in main memory, but is typically backed up by non-volatile computer data storage. Main memory databases are faster than disk databases. Accessing data in memory reduces I/O reading activity when, for example, querying the data. In applications where response time is critical, such as telecommunications network equipment, main memory databases are often used. An in-memory database is a database that runs entirely in main memory, without touching a disk. Often such databases run embedded: created when a process starts, running within that process, and destroyed when the process finishes.

2. What is the architectural design of this database? There are two routes people seem to take to an in-memory database for testing. The first is to use a SQL in-memory database library. In Java-land the popular one seems to be HSQLDB. Elsewhere, SQLite and Firebird come up. The nice thing about these tools is that they allow you to use regular SQL to query them. One issue is that they may not support quite the same dialects or have all the features of the target database. You can do something similar by running a file-based database on a RAM disk, which allows you to keep the test and production deployments closer to each other. Another route is to abstract all the database access behind a Repository. Then you can swap out the database with regular in-memory data structures. Often just a bunch of hash tables for the entry points to the object graph is enough. One of the strengths of the repository approach is that it gives you a consistent way to access (and stub out) non-SQL data sources too. This means that your object-relational mapping system is also hidden inside the repository. 3. How does it run? While most people think of databases as large disk-centered creatures, there's a small but busy world of in-memory databases out there. There are applications which need fast access to some sort of managed data which doesn't need to be persisted, either because it doesn't change or because it can be reconstructed (imagine a routing table in a router, or an EventPoster). Yet even developers of traditional database systems can find an in-memory database useful, particularly for testing. When you're developing an enterprise application, tests that hit the database can be a huge time drain when running your test suites. Switching to an in-memory database can have an order-of-magnitude effect which can dramatically reduce build times. Since most ThoughtWorkers get the shakes if they haven't had a green bar recently, this makes a big difference to us.
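The first route above can be sketched with SQLite, which supports an in-memory mode directly via the special ":memory:" database name; the table and data are illustrative.

```python
import sqlite3

# A SQL database that lives entirely in the process's memory:
# perfect for fast test suites, gone when the connection closes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)",
                 [("Ada",), ("Grace",)])
conn.commit()

rows = conn.execute("SELECT name FROM users ORDER BY name").fetchall()
print(rows)  # [('Ada',), ('Grace',)]
conn.close()  # the whole database vanishes with the connection
```

Tests get regular SQL semantics without any disk I/O or external server, which is exactly the build-time win described below.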
Storing and manipulating data in main memory. An IMDB usually features a strict memory-based architecture and direct data manipulation. 4. Who are the people involved in this database application? What are their functions? The same three groups described earlier apply here: DBMS developers, who design and build the DBMS product itself; application developers and database administrators, who write the application programs and design and maintain the needed database; and the application's end-users, who know only the application and its end-user interfaces and are thus only indirectly involved with the DBMS. 5. Where it can be implemented? Indeed, a few people actively dislike using SQL in-memory databases, under the belief that they encourage spreading either SQL or object-relational mapper code around the domain model. Running SQL in-memory may remove much of the pain of slow access, but it acts as a deodorant to cover the smell of a missing repository. 6. Why it is needed? Testing is the main driver thus far, but I think there's more to come from in-memory databases. Memory sizes are now large enough that many application databases can be loaded into memory. If you use an approach that keeps an event log of all changes to your application state, you can treat the in-memory database as a cache of the result of applying the log, rebuilding it and snapshotting it as you need.
Such styles can be very scalable and have high performance in cases where you have lots of readers and few writers. I've run into a few cases where people have used in-memory databases for very high-performance applications. A difference here is that these experiences tend to be with niche commercial databases, while for testing people seem to prefer open source. Prevayler got a lot of attention for taking this kind of approach. People I know who tried it found its tight coupling to the in-memory objects and lack of migration tools caused serious problems. But I think the approach of persistent change logs as systems of record is fertile ground to explore in the future. 7. What are the examples of these database applications?
- Java DB 10.5.1.1
- ASE
- Berkeley DB
- Adaptive Server Enterprise (ASE) 15.5
- Apache Derby
- Altibase
- CSQL
- BlackRay

- Eloquera
- eXtremeDB

Knowledge base database 1. What is this database application? A knowledge base is a special kind of database for knowledge management, providing the means for the computerized collection, organization, and retrieval of knowledge. It is also a collection of data representing problems with their solutions and related experiences. 2. What is the architectural design of this database? The knowledge base database supports 64-bit file I/O to allow the use of files larger than 4 gigabytes (GB). In addition, physical and logical raw files are supported as data, log, and control files to support Real Application Clusters (RAC) on Windows, and for those cases where performance needs to be maximized. With release 11g (11.1), instead of using the operating system kernel NFS client, the database can be configured to access NFS v3 servers directly using an internal Direct NFS client. Through this integration, it is able to optimize the I/O path between itself and the NFS server, providing significantly superior performance. In addition, the Direct NFS client simplifies and optimizes the NFS client configuration for database workloads. The Direct NFS client currently supports up to four parallel network paths to provide scalability and high availability, and delivers optimized performance by automatically load-balancing requests across all specified paths. If one network path fails, the Direct NFS client will reissue commands over any remaining paths, ensuring fault tolerance and high availability. 3. How does it run? A knowledge-based system (KBS) is a system that uses artificial intelligence techniques in problem-solving processes to support human decision-making, learning, and action. It assumes basic computer science skills and a math background that includes set theory, relations, elementary probability, and introductory concepts of artificial intelligence.
Each of the 12 chapters is designed to be modular, providing instructors with the flexibility to tailor the book to their own course needs. Exercises are incorporated throughout the text to highlight certain aspects of the material presented and to stimulate thought and discussion. A comprehensive text and resource, Knowledge-Based Systems provides access

to the most current information in KBS and new artificial intelligence techniques, as well as neural networks, fuzzy logic, genetic algorithms, and soft systems. Knowledge bases are essentially closed or open information repositories and can be categorised under two main headings:
- Machine-readable knowledge bases store knowledge in a computer-readable form, usually for the purpose of having automated deductive reasoning applied to them. They contain a set of data, often in the form of rules that describe the knowledge in a logically consistent manner. An ontology can define the structure of stored data - what types of entities are recorded and what their relationships are. Logical operators, such as And (conjunction), Or (disjunction), material implication, and negation may be used to build it up from simpler pieces of information. Consequently, classical deduction can be used to reason about the knowledge in the knowledge base. Some machine-readable knowledge bases are used with artificial intelligence, for example as part of an expert system that focuses on a domain like prescription drugs or customs law. Such knowledge bases are also used by the semantic web.
- Human-readable knowledge bases are designed to allow people to retrieve and use the knowledge they contain. They are commonly used to complement a help desk or for sharing information among employees within an organization. They might store troubleshooting information, articles, white papers, user manuals, knowledge tags, or answers to frequently asked questions. Typically, a search engine is used to locate information in the system, or users may browse through a classification scheme.
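The rules-plus-deduction idea behind a machine-readable knowledge base can be sketched as a tiny forward-chaining loop. The facts and rules below (a toy prescription-drug example) are invented for illustration; real expert systems use far richer rule languages.

```python
# Minimal forward chaining: repeatedly apply rules of the form
# (antecedents, consequent) until no new facts can be derived.
def forward_chain(facts, rules):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            if consequent not in facts and set(antecedents) <= facts:
                facts.add(consequent)
                changed = True
    return facts

rules = [
    (["penicillin_allergy", "prescribed_penicillin"], "contraindicated"),
    (["contraindicated"], "alert_pharmacist"),
]
derived = forward_chain(["penicillin_allergy", "prescribed_penicillin"], rules)
print("alert_pharmacist" in derived)  # True
```

The loop keeps firing rules until a fixed point is reached, which is the "classical deduction" over stored rules described above in miniature.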

A text-based system that can include groups of documents with hyperlinks between them is known as a hypertext system. Hypertext systems support the decision process by relieving the user of the significant effort it takes to relate and remember things. Knowledge bases can exist on both computers and mobile phones in a hypertext format. Knowledge base analysis and design (also known as KBAD) is an approach that allows people to conduct analysis and design in a way that results in a knowledge base, which can later be used to make informed decisions. This approach was first implemented by Dr. Steven H. Dam.

4. Who are the people involved in this database application? What are their functions? The same three groups described earlier apply here: DBMS developers, who design and build the DBMS product itself; application developers and database administrators, who write the application programs and design and maintain the needed database; and the application's end-users, who know only the application and its end-user interfaces and are thus only indirectly involved with the DBMS. 5. Where it can be implemented? Electronic commerce (e-commerce) has been sweeping the globe. For a long time, information technology was mainly passive; users went online and downloaded data. Today, information technology offers interactive and communication functions, which makes e-commerce more than a fad. Now, it has become part of the way people live their lives. Dynamic Web sites contain Web pages that display constantly changing content, a technique that is an important foundation of e-commerce. There are two ways to achieve dynamic content generation: programmatic content generation and template-based content generation. Java servlets fall into the first category, while JavaServer Pages (JSP) belongs to the second. JSP is a simple but powerful technology used to generate dynamic HTML on the server side. JSP pages are a direct extension of Java servlets and provide a way to separate content generation from content presentation. In this project, Tomcat was adopted as the JSP engine, and this thesis designs an online health consulting and shopping center that can provide users with personal dietary assessment, nutritional news, professional knowledge of nutrition, special diets for patients, the posting of questions, and shopping for nutritional supply products. In addition, the project connects to a Microsoft Access database using a type 1 JDBC-ODBC bridge plus an ODBC driver. In this way, staff can easily manage different kinds of information in the database of this health center.
In short, JSP is more convenient to write using conventional HTML writing tools and easier to modify, because only the dynamic parts need to be changed when updating a web page. Furthermore, JSP with Java is more flexible because they are platform independent. 6. Why it is needed? Ideal for advanced undergraduate and graduate students, as well as business professionals, this text is designed to help users develop an appreciation of KBS and their architecture and understand a broad variety of knowledge-based techniques for decision support and planning. In general, a knowledge base is a centralized repository for information: a public library, a database of related information about a particular subject, and whatis.com could all be considered examples of knowledge bases. In relation to information technology (IT), a knowledge base is a machine-readable resource for the dissemination of information, generally online or with the capacity to be put online. An integral component of knowledge management systems, a knowledge base is used to optimize information collection, organization, and retrieval for an organization, or for the general public. A well-organized knowledge base can save an enterprise money by decreasing the amount of employee time spent trying to find information about - among myriad possibilities - tax laws or company policies and procedures. As a customer relationship management (CRM) tool, a knowledge base can give customers easy access to information that would otherwise require contact with an organization's staff; as a rule, this capacity should make the interaction simpler for both the customer and the organization. A number of software applications are available that allow users to create their own knowledge bases, either separately (these are usually called knowledge management software) or as part of another application, such as a CRM package.

In general, a knowledge base is not a static collection of information, but a dynamic resource that may itself have the capacity to learn, as part of an artificial intelligence (AI) expert system, for example. According to the World Wide Web Consortium (W3C), in the future the Internet may become a vast and complex global knowledge base known as the Semantic Web. 7. What are the examples of these database applications?
- PaperCut KB

Operational database 1. What is this database application? In data warehousing, the operational database is the one accessed by an operational system to carry out the regular operations of an organization. Operational databases usually use an OLTP database, which is optimized for faster transaction processing: inserting, deleting, and updating data. On the other side, data warehouses use an OLAP (online analytical processing) database, which is optimized for faster queries. An operational database is usually put on a separate machine from the data warehouse to increase performance. These databases store detailed data about the operations of an organization. They are typically organized by subject matter and process relatively high volumes of updates using transactions. Essentially every major organization on earth uses such databases. Examples include customer databases that record contact, credit, and demographic information about a business's customers; personnel databases that hold information such as salary, benefits, and skills data about employees; enterprise resource planning systems that record details about product components and parts inventory; and financial databases that keep track of the organization's money, accounting, and financial dealings. There are several reasons for this; one of the most obvious is that table scans need to reference more pages of data to produce results. Indexes can also grow in size to support larger data volumes, and with this increase, access via the index can degrade because more levels need to be traversed. Some IT professionals address this problem with solutions that offload older data to archive data stores. Operational databases are just part of overall enterprise data management, and some of the data that needs to be archived goes directly to the data warehouse. 2. What is the architectural design of this database?
An operational database contains enterprise data which are up to date and modifiable. In an enterprise data management system, an operational database could be said to be the opposite counterpart of a decision support database, which contains non-modifiable data extracted for the purpose of statistical analysis. An example use of a decision support database is that it provides data so that the average salary of many different kinds of workers can be determined, while the operational database contains the same data, which would be used to calculate the amount of the workers' pay checks depending on the number of days they have reported in any given period of time. An operational database, as the name implies, is a database that is currently and progressively in use, capturing real-time data and supplying data for real-time computations and other analysis processes. For example, an operational database is the one used for taking orders and fulfilling them in a store, whether it is a traditional store or an online store. Other areas in business that use an operational database are catalog fulfillment systems and any other point-of-sale system used in retail stores. An operational database is used for keeping track of payments and inventory. It takes information and amounts from credit cards, and accountants use the operational database because it must balance to the last penny. 3. How does it run? An operational database is also used to support IRS tax filings and regulations, which is why it is sometimes managed by IT for the finance and operations groups in a business organization. Companies can seldom run successfully without using an operational database, as this database is based on accounts and transactions. Because of the very dynamic nature of an operational database, there are certain issues that need to be addressed appropriately. An operational database can grow very fast in size and bulk, so database administrators and IT analysts must purchase high-powered computer hardware and top-notch database management systems.
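The order-taking, payment, and inventory workload described above is classic OLTP: short transactions of inserts and updates that must balance exactly, succeeding completely or not at all. A minimal sketch using Python's standard-library sqlite3 module; the inventory table, item names, and quantities are invented for illustration, not taken from the text:

```python
import sqlite3

# In-memory database standing in for an operational (OLTP) store.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory (item TEXT PRIMARY KEY, on_hand INTEGER);
    CREATE TABLE orders    (id INTEGER PRIMARY KEY, item TEXT, qty INTEGER);
    INSERT INTO inventory VALUES ('widget', 10);
""")

def place_order(item, qty):
    """Record the order and decrement stock as one atomic transaction."""
    with conn:  # commits on success, rolls back on any exception
        on_hand = conn.execute(
            "SELECT on_hand FROM inventory WHERE item = ?", (item,)
        ).fetchone()[0]
        if on_hand < qty:
            raise ValueError("insufficient stock")
        conn.execute("INSERT INTO orders (item, qty) VALUES (?, ?)", (item, qty))
        conn.execute(
            "UPDATE inventory SET on_hand = on_hand - ? WHERE item = ?",
            (qty, item),
        )

place_order("widget", 3)        # succeeds: order recorded, stock now 7
try:
    place_order("widget", 100)  # fails: the whole transaction is rolled back
except ValueError:
    pass
```

The point of the sketch is the atomicity: either both the order row and the stock decrement happen, or neither does, which is why the books "balance to the last penny."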
Most business organizations have regulations and requirements that dictate storing data for longer periods of time for operations. This can create an even more complex setup in relation to database performance and usability. With ever-increasing operational data volume, operational databases come under additional stress when processing transactions, leading to a general slowdown. As a general trend, the more data there are in the operational database, the less efficient the transactions running against the database tend to be. 4. Who are the people involved in this database application? What are their functions? 1. DBMS developers - These are the people that design and build the DBMS product, and the only ones who touch its code. They are typically the employees of a DBMS vendor (e.g., Oracle, IBM, Microsoft, Sybase), or, in the case of open source DBMSs (e.g., MySQL), volunteers or people supported by interested companies and organizations. They are typically skilled systems programmers. DBMS development is a complicated task, and some of the popular DBMSs have been under development and enhancement (also to follow progress in technology) for decades. 2. Application developers and database administrators - These are the people that design and build a database-based application that uses the DBMS. The latter group members design the needed database and maintain it. The first group members write the needed application programs which the application comprises. Both are well familiar with the DBMS product and use its user interfaces (as well as usually other tools) for their work. Sometimes the application itself is packaged and sold as a separate product, which may include the DBMS inside (see Embedded database; subject to proper DBMS licensing), or sold separately as an add-on to the DBMS. 3. Application's end-users (e.g., accountants, insurance people, medical doctors, etc.) - These people know the application and its end-user interfaces, but need not know nor understand the underlying DBMS. Thus, though they are the intended and main beneficiaries of a DBMS, they are only indirectly involved with it. 5. Where it can be implemented? Operational databases are very important to a business. These databases allow a business to enter, gather, and retrieve specific company information. Operational databases are also known by another name: production databases. Because they are known by different names, users can misunderstand what the database is supposed to be used for within a business. For instance, "transaction database" could suggest that the information stored focuses on financial information even though it may not. Operational databases can store different types of information such as training status, personal employee information, and previous proposal information. Storing information in a centralized area can reduce retrieval time for users. Operational databases are important when information is needed quickly. An important feature of storing information in an operational database is the ability to share information across the company. Another feature of an operational database is how much information can be stored that pertains to a business. The type of operational database being used determines how much information it can hold. For instance, Oracle can store larger amounts of information than Access. Operational databases need continuous management. Since day-to-day information is important to a business, the management of that information becomes just as important.
Having someone continually monitor the information being input into the database will make the information retrieved even more valuable because it will be accurate. Users depend on the accuracy of this information. Operational databases have the ability to flag specific information that may need to be retrieved on a continuous basis. Operational databases also have other features that focus on the business environment. For instance, an operational database has the ability to be modified. The overall idea of using an operational database is to expedite the retrieval of large amounts of information with peak efficiency and, furthermore, to serve simultaneous read/write requests through pre-defined queries. 6. Why it is needed? Operational databases can store different types of information such as training status, personal employee information, and previous proposal information. Storing information in a centralized area can reduce retrieval time for users. Operational databases have the ability to flag specific information that may need to be retrieved on a continuous basis, and they can be modified to fit the business environment. The overall idea of using an operational database is to expedite the retrieval of large amounts of information with peak efficiency while serving simultaneous read/write requests through pre-defined queries. The operational database is the database of record, consisting of system-specific reference data and event data belonging to a transaction-update system. It may also contain system control data such as indicators, flags, and counters. The operational database is the source of data for the data warehouse. It contains detailed data used to run the day-to-day operations of the business. The data continually change as updates are made, and reflect the current value of the last transaction. 7. What are the examples of these database applications?
- MS SharePoint Server

Parallel database 1. What is this database application? A parallel database, run by a parallel DBMS, seeks to improve performance through parallelization of tasks such as loading data, building indexes, and evaluating queries. Parallel databases improve processing and input/output speeds by using multiple central processing units (CPUs), including multicore processors, and multiple disks in parallel. In parallel processing, many operations are performed simultaneously, as opposed to serial, sequential processing, where operations are performed with no time overlap. Although data may be stored in a distributed fashion, the distribution is governed solely by performance considerations. Centralized and client-server database systems are not powerful enough to handle such applications. The major parallel DBMS architectures (which are induced by the underlying hardware architecture) are:
- Shared memory architecture, where multiple processors share the main memory space, as well as other data storage.
- Shared disk architecture, where each processing unit (typically consisting of multiple processors) has its own main memory, but all units share the other storage.
- Shared nothing architecture, where each processing unit has its own main memory and other storage.

2. What is the architectural design of this database?

A parallel processing system has the following characteristics:
- Each processor in a system can perform tasks concurrently.
- Tasks may need to be synchronized.
- Nodes usually share resources, such as data, disks, and other devices.

3. How does it run? Using one of these architectures:
- Shared memory architecture, where multiple processors share the main memory space, as well as mass storage (e.g. hard disk drives).
- Shared disk architecture, where each node has its own main memory, but all nodes share mass storage, usually a storage area network. In practice, each node usually also has multiple processors.
- Shared nothing architecture, where each node has its own mass storage as well as main memory.

Parallel processing divides a large task into many smaller tasks, and executes the smaller tasks concurrently on several nodes. As a result, the larger task completes more quickly.
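The divide-and-run-concurrently idea can be sketched with Python's standard library. Here threads stand in for the "nodes"; a real parallel DBMS would spread the smaller tasks across separate CPUs or machines, and the summing task, chunking scheme, and worker count below are invented for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(bounds):
    """One 'node' computes its share of the larger task."""
    lo, hi = bounds
    return sum(range(lo, hi))

def parallel_sum(n, workers=4):
    """Split summing 0..n-1 into equal chunks and run them concurrently."""
    step = n // workers
    chunks = [(i * step, (i + 1) * step if i < workers - 1 else n)
              for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The smaller tasks run concurrently; their results are combined.
        return sum(pool.map(partial_sum, chunks))

# The divided task produces the same answer as the serial one.
assert parallel_sum(1_000_000) == sum(range(1_000_000))
```

This mirrors the bank-teller example below: the work only speeds up because the chunks are independent; a step every chunk must pass through serially (the bank manager) would remain a bottleneck.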

Note: A node is a separate processor, often on a separate machine. Multiple processors, however, can reside on a single machine. Some tasks can be effectively divided, and thus are good candidates for parallel processing. Other tasks, however, do not lend themselves to this approach. For example, in a bank with only one teller, all customers must form a single queue to be served. With two tellers, the task can be effectively split so that customers form two queues and are served twice as fast, or they can form a single queue to provide fairness. This is an instance in which parallel processing is an effective solution. By contrast, if the bank manager must approve all loan requests, parallel processing will not necessarily speed up the flow of loans. No matter how many tellers are available to process loans, all the requests must form a single queue for bank manager approval. No amount of parallel processing can overcome this built-in bottleneck to the system. 4. Who are the people involved in this database application? What are their functions? 1. DBMS developers - These are the people that design and build the DBMS product, and the only ones who touch its code. They are typically the employees of a DBMS vendor (e.g., Oracle, IBM, Microsoft, Sybase), or, in the case of open source DBMSs (e.g., MySQL), volunteers or people supported by interested companies and organizations. They are typically skilled systems programmers. DBMS development is a complicated task, and some of the popular DBMSs have been under development and enhancement (also to follow progress in technology) for decades. 2. Application developers and database administrators - These are the people that design and build a database-based application that uses the DBMS. The latter group members design the needed database and maintain it. The first group members write the needed application programs which the application comprises.
Both are well familiar with the DBMS product and use its user interfaces (as well as usually other tools) for their work. Sometimes the application itself is packaged and sold as a separate product, which may include the DBMS inside (see Embedded database; subject to proper DBMS licensing), or sold separately as an add-on to the DBMS. 3. Application's end-users (e.g., accountants, insurance people, medical doctors, etc.) - These people know the application and its end-user interfaces, but need not know nor understand the underlying DBMS. Thus, though they are the intended and main beneficiaries of a DBMS, they are only indirectly involved with it. 5. Where it can be implemented? Parallel database software is often specialized, usually to serve as a query processor. Since they are designed to serve a single function, however, specialized servers do not provide a common foundation for integrated operations. These include online decision support, batch reporting, data warehousing, OLTP, distributed operations, and high-availability systems. Specialized servers have been used most successfully in the area of very large databases: in DSS applications, for example. Versatile parallel database software should offer excellent price/performance on open systems hardware, and be designed to serve a wide variety of enterprise computing needs. Features such as online backup, data replication, portability, interoperability, and support for a wide variety of client tools can enable a parallel server to support application integration, distributed operations, and mixed application workloads. 6. Why it is needed? A variety of hardware architectures allow multiple computers to share access to data, software, or peripheral devices. A parallel database is designed to take advantage of such architectures by running multiple instances which "share" a single physical database. In appropriate applications, a parallel server can allow access to a single database by users on multiple machines, with increased performance. A parallel server processes transactions in parallel by servicing a stream of transactions using multiple CPUs on different nodes, where each CPU processes an entire transaction. Using parallel data manipulation language, you can have one transaction being performed by multiple nodes. This is an efficient approach because many applications consist of online insert and update transactions which tend to have short data access requirements. In addition to balancing the workload among CPUs, the parallel database provides for concurrent access to data and protects data integrity. Parallel database software must effectively deploy the system's processing power to handle diverse applications: online transaction processing (OLTP) applications, decision support system (DSS) applications, as well as a mixed OLTP and DSS workload. OLTP applications are characterized by short transactions which have low CPU and I/O usage. DSS applications are characterized by long transactions, with high CPU and I/O usage. 7. What are the examples of these database applications?
- Non-Uniform Memory Architecture (NUMA), which involves non-uniform memory access.
- Cluster (shared nothing + shared disk: SAN/NAS), which is formed by a group of connected computers.

Real-time database 1. What is this database application? A real-time database is a processing system designed to handle workloads whose state is constantly changing (Buchmann). This differs from traditional databases containing persistent data, mostly unaffected by time. For example, a stock market changes very rapidly and is dynamic. The graphs of the different markets appear to be very unstable and yet a database has to keep track of current values for all of the markets of the New York Stock Exchange (Kanitkar). Real-time processing means that a transaction is processed fast enough for the result to come back and be acted on right away (Capron). Real-time databases are useful for accounting, banking, law, medical records, multi-media, process control, reservation systems, and scientific data analysis (Snodgrass). As computers increase in power and can store more data, they are integrating themselves into our society and are employed in many applications. 2. What is the architectural design of this database?

Although the real-time database system may seem like a simple system, problems arise during overload when two or more database transactions require access to the same portion of the database. A transaction is usually the result of an execution of a program that accesses or changes the contents of a database (Singhal). A transaction is different from a stream because a stream allows only read-only operations, whereas transactions can do both read and write operations. This means that in a stream, multiple users can read from the same piece of data, but they cannot both modify it (Abbot). A database must let only one transaction operate at a time to preserve data consistency. For example, if two students demand to take the remaining spot for a section of a class and they hit submit at the same time, only one student should be able to register for it (Abbot). Real-time databases can process these requests utilizing scheduling algorithms for concurrency control, prioritizing both students' requests in some way. Throughout this article, we assume that the system has a single processor, a disk-based database, and a main memory pool (Haritsa). In real-time databases, deadlines are formed, and different kinds of systems respond differently to data that does not meet its deadline. In a real-time system, each transaction uses a timestamp to schedule the transactions (Abbot). A priority mapper unit assigns a level of importance to each transaction upon its arrival in the database system that is dependent on how the system views time and other priorities. The timestamp method relies on the arrival time in the system. Researchers indicate that for most studies, transactions are sporadic with unpredictable arrival times. For example, the system gives an earlier deadline a higher priority and a later deadline a lower priority (Haritsa). Below is a comparison of different scheduling algorithms, where PT is the priority, DT the deadline, and VT the value of a transaction:
- Earliest Deadline (PT = DT): The value of a transaction is not important. An example is a group of people calling to order a product.
- Highest Value (PT = 1/VT): The deadline is not important; some transactions should get to the CPU based on criticalness, not fairness. This is an example of least slack, where the transaction that can wait the least amount of time goes first. If the telephone switchboards were overloaded, people who call 911 should get priority (Snodgrass).
- Value-inflated deadline (PT = DT/VT): Gives equal weight to the deadline and the value in scheduling. An example is registering for classes, where the student selects a block of classes that he wishes to take and presses submit. In this scenario, higher priorities often take precedence. A school registration system probably uses this technique when the server receives two registration transactions: if one student had 22 credits and the other had 100 credits, the person with 100 credits would take priority (Value based scheduling).
3. How does it run? Database management systems provide tools for such organization, so in recent years there has been interest in "merging" database and real-time technology. The resulting integrated system, which provides database operations with real-time constraints, is generally called a real-time database system (RTDBS) [1]. Like a conventional database system, a RTDBS functions as a repository of data, provides efficient storage, and performs retrieval and manipulation of information. However, as part of a real-time system whose "tasks" are associated with time constraints, a RTDBS has the added burden of ensuring some degree of confidence in meeting the system's timing requirements. Example applications that handle large amounts of data and have stringent timing requirements include telephone switching (e.g. translating an 800 number into an actual number), radar tracking, and others. Arbitrage trading, for example, involves trading commodities in different markets at different prices. Since price discrepancies are usually short-lived, automated searching and processing of large amounts of trading information are very desirable.
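The three priority formulas compared above (PT = DT, PT = 1/VT, PT = DT/VT, where DT is the deadline and VT the value) can be tried out with a small priority-queue sketch. The transaction names, deadlines, and values below are invented for illustration; lower PT is served first:

```python
import heapq

def schedule(transactions, policy):
    """Return transaction names ordered by priority PT (lowest first).

    Each transaction is (name, deadline DT, value VT); `policy` maps
    (DT, VT) to the priority PT, as in the formulas above.
    """
    heap = [(policy(dt, vt), name) for name, dt, vt in transactions]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

txns = [("course_reg", 10, 5), ("emergency_call", 30, 100), ("report", 20, 1)]

earliest_deadline = schedule(txns, lambda dt, vt: dt)       # PT = DT
highest_value     = schedule(txns, lambda dt, vt: 1 / vt)   # PT = 1/VT
value_inflated    = schedule(txns, lambda dt, vt: dt / vt)  # PT = DT/VT

# Earliest-deadline ignores value; highest-value ignores the deadline;
# the value-inflated deadline weighs both.
assert earliest_deadline[0] == "course_reg"
assert highest_value[0] == "emergency_call"
```

Note how the high-value "emergency_call" jumps the queue under the value-based policies even though its deadline is the latest.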
In order to capitalize on the opportunities, buy-sell decisions have to be made promptly, often with a time constraint so that the financial overhead in performing the trade actions is well compensated by the benefit resulting from the trade. As another example, a radar surveillance system detects aircraft "images" or "radar signatures". 4. Who are the people involved in this database application? What are their functions? For example, a stock market changes very rapidly and is dynamic. The graphs of the different markets appear to be very unstable and yet a database has to keep track of current values for all of the markets of the New York Stock Exchange (Kanitkar). 5. Where it can be implemented?

Real-time databases are traditional databases that use an extension to give the additional power to yield reliable responses. They use timing constraints that represent a certain range of values for which the data are valid. This range is called temporal validity. A conventional database cannot work under these circumstances because the inconsistencies between the real-world objects and the data that represent them are too severe for simple modifications. An effective system needs to be able to handle time-sensitive queries, return only temporally valid data, and support priority scheduling. To enter the data in the records, often a sensor or an input device monitors the state of the physical system and updates the database with new information to reflect the physical system more accurately (Abbot). When designing a real-time database system, one should consider how to represent valid time and how facts are associated with the real-time system. One should also consider how to represent attribute values in the database so that transaction processing and data consistency are not violated. When designing a system, it is important to consider what the system should do when deadlines are not met. For example, an air-traffic control system constantly monitors hundreds of aircraft, makes decisions about incoming flight paths, and determines the order in which aircraft should land based on data such as fuel, altitude, and speed. If any of this information is late, the result could be devastating. To address issues of obsolete data, the timestamp can support transactions by providing clear time references. 6. Why it is needed? Real-time databases are traditional databases that use an extension to give the additional power to yield reliable responses. They use timing constraints that represent a certain range of values for which the data are valid. This range is called temporal validity.
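Returning only temporally valid data, as described above, amounts to filtering out readings whose validity window has expired. A minimal sketch; the sensor name and the five-second window are invented assumptions, not taken from the text:

```python
import time

VALIDITY_SECONDS = 5.0  # assumed freshness window for sensor readings

readings = {}  # sensor name -> (value, timestamp)

def update(sensor, value, now=None):
    """A sensor writes a fresh reading with its arrival timestamp."""
    readings[sensor] = (value, time.time() if now is None else now)

def read_valid(sensor, now=None):
    """Return the value only while it is temporally valid, else None."""
    now = time.time() if now is None else now
    value, stamp = readings[sensor]
    return value if now - stamp <= VALIDITY_SECONDS else None

# Explicit `now` values keep the example deterministic.
update("altitude", 31000, now=100.0)
assert read_valid("altitude", now=103.0) == 31000  # inside the validity window
assert read_valid("altitude", now=106.0) is None   # expired: too stale to act on
```

A real RTDBS would combine this check with the deadline-based scheduling discussed earlier, rather than simply returning None.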
A conventional database cannot work under these circumstances because the inconsistencies between the real-world objects and the data that represent them are too severe for simple modifications. An additional way of dealing with conflict resolution in a real-time database system, besides deadlines, is the wait policy method. This process helps ensure the latest information in time-critical systems. The policy avoids conflict by asking all non-requesting blocks to wait until the most essential block of data is processed (Abbot). While studies in labs have found that data-deadline-based policies do not improve performance significantly, the forced wait policy can improve performance by 50 percent (Porkka). The forced wait policy may involve waiting for higher-priority transactions to process in order to prevent deadlock. Another example of when data can be delayed is when a block of data is about to expire. The forced wait policy delays processing until the data is updated using new input data. The latter method helps increase the accuracy of the system and can cut down on the number of necessary processes that are aborted (Kang). Generally, relying on wait policies is not optimal (Kang). 7. What are the examples of these database applications?
- ADDM
- Real-Time Database Systems 2001

Spatial database 1. What is this database application?

A spatial database is a database that is optimized to store and query data that is related to objects in space, including points, lines and polygons. While typical databases can understand various numeric and character types of data, additional functionality needs to be added for databases to process spatial data types. These are typically called geometry or feature. The Open Geospatial Consortium created the Simple Features specification and sets standards for adding spatial functionality to database systems. 2. What is the architectural design of this database? Database systems use indexes to quickly look up values and the way that most databases index data is not optimal for spatial queries. Instead, spatial databases use a spatial index to speed up database operations.

In addition to typical SQL queries such as SELECT statements, spatial databases can perform a wide variety of spatial operations. The following query types and many more are supported by the Open Geospatial Consortium:
- Spatial Measurements: Finds the distance between points, polygon area, etc.
- Spatial Functions: Modify existing features to create new ones, for example by providing a buffer around them, intersecting features, etc.
- Spatial Predicates: Allows true/false queries such as 'is there a residence located within a mile of the area we are planning to build the landfill?' (see DE-9IM)
- Constructor Functions: Creates new features with an SQL query specifying the vertices (points of nodes) which can make up lines. If the first and last vertex of a line are identical the feature can also be of the type polygon (a closed line).
- Observer Functions: Queries which return specific information about a feature such as the location of the center of a circle
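Two of the operations listed above, a spatial measurement (distance) and a spatial predicate (containment), can be sketched in plain Python. A spatial database evaluates these natively and accelerates them with a spatial index, so this is only an illustration with invented coordinates:

```python
import math

def distance(p, q):
    """Spatial measurement: Euclidean distance between two points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def contains(polygon, point):
    """Spatial predicate: is `point` inside `polygon`? (ray casting)"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses the horizontal ray
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
assert distance((0, 0), (3, 4)) == 5.0
assert contains(square, (2, 2))      # e.g. a residence inside the area
assert not contains(square, (5, 5))  # outside the area
```

The landfill question from the list would combine both: test `contains` against a buffer polygon built one mile out from the site.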

Not all spatial databases support these query types. 3. How does it run?

A data warehouse system gets data from different GDLs and integrates them in a single database. Users, instead of consulting several different GDLs, could do a one-stop initial search in this data warehouse and find the preliminary information they need. From there, using a computerized procedure to define their needs, the system could perform a preliminary but very useful filtering of the data sets. If more precise information is needed about the successful data sets, users could go to the legacy GDL's data source to find what they are looking for. Using such an architecture greatly reduces the problems previously mentioned (and completely eliminates them if the metadata stored or derived in the warehouse are sufficient). Using our scene architecture, small queries as well as large queries are processed efficiently. Small queries are processed by single page accesses as described before. If a range query specifies a larger query region, all scenes intersecting the query region, i.e. subtrees of the R*-tree, are transferred into main memory. For each scene, just one search operation on secondary storage is necessary. 4. Who are the people involved in this database application? What are their functions? 1. DBMS developers - These are the people that design and build the DBMS product, and the only ones who touch its code. They are typically the employees of a DBMS vendor (e.g., Oracle, IBM, Microsoft, Sybase), or, in the case of open source DBMSs (e.g., MySQL), volunteers or people supported by interested companies and organizations. They are typically skilled systems programmers. DBMS development is a complicated task, and some of the popular DBMSs have been under development and enhancement (also to follow progress in technology) for decades. 2. Application developers and database administrators - These are the people that design and build a database-based application that uses the DBMS. The latter group members design the needed database and maintain it.
The first group members write the needed application programs which the application comprises. Both are well familiar with the DBMS product and use its user interfaces (as well as usually other tools) for their work. Sometimes the application itself is packaged and sold as a separate product, which may include the DBMS inside (see Embedded database; subject to proper DBMS licensing), or sold separately as an add-on to the DBMS. 3. Application's end-users (e.g., accountants, insurance people, medical doctors, etc.) - These people know the application and its end-user interfaces, but need not know nor understand the underlying DBMS. Thus, though they are the intended and main beneficiaries of a DBMS, they are only indirectly involved with it. 5. Where it can be implemented? We proposed a storage and access architecture for geographic database systems. This architecture integrates a number of concepts and techniques for efficient query processing.
- All OpenGIS Specifications compliant products
- Open source spatial databases and APIs, some of which are OpenGIS compliant
- Boeing's Spatial Query Server (Official Site) spatially enables Sybase ASE.
- Smallworld VMDS, the native GE Smallworld GIS database
- SpatiaLite extends Sqlite with spatial datatypes, functions, and utilities.
- IBM DB2 Spatial Extender can be used to enable any edition of DB2, including the free DB2 Express-C, with support for spatial types
- Oracle Spatial
- Microsoft SQL Server has support for spatial types since version 2008
- PostgreSQL DBMS (database management system) uses the spatial extension PostGIS to implement the standardized datatype geometry and corresponding functions.
- MySQL DBMS implements the datatype geometry plus some spatial functions that haven't been implemented according to the OpenGIS specifications. Functions that test spatial relationships are limited to working with minimum bounding rectangles rather than the actual geometries. MySQL versions earlier than 5.0.16 only supported spatial data in MyISAM tables. As of MySQL 5.0.16, InnoDB, NDB, BDB, and ARCHIVE also support spatial features.
- Neo4j - Graph database that can build 1D and 2D indexes as Btree, Quadtree and Hilbert curve directly in the graph
- AllegroGraph - a Graph database that provides a novel mechanism for efficient storage and retrieval of two-dimensional geospatial coordinates for Resource Description Framework data. It includes extension syntax for SPARQL queries
- MongoDB supports geospatial indexes in 2D

6. Why it is needed? A spatial database creates flexible and scalable solutions that integrate easily into any organization's IT environment by adopting security policies, following OGC, ISO, and INSPIRE standards, adapting to corporate standards, and adjusting to multiple types of user-profile demands. Organizations, enterprises, and governments manage and display spatial data in an intuitive, easy-to-understand format that facilitates collaboration and data interoperability. This happens when all the pieces of a spatial data infrastructure are in place. 7. What are the examples of these database applications?
- Census data
- NASA satellite imagery - terabytes of data per day
- Weather and climate data
- Rivers, farms, ecological impact
- Medical imaging
- Road maps
Common spatial index structures include:
- Grid (spatial index)
- Z-order (curve)
- Quadtree
- Octree
- UB-tree
- R-tree: R+ tree, R* tree, Hilbert R-tree
- X-tree
- kd-tree
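Of the index structures listed above, the Z-order curve is the simplest to sketch: interleaving the bits of the x and y coordinates yields a single integer key that keeps spatially close points close together in an ordinary ordered index such as a B-tree. A minimal illustration; the 16-bit coordinate width is an arbitrary assumption:

```python
def morton_key(x, y, bits=16):
    """Interleave the bits of x and y into one Z-order (Morton) key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # x takes the even bit positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # y takes the odd bit positions
    return key

# Points can now be sorted on a single integer key, so a plain B-tree
# (or here, a sorted list) gives a rough spatial ordering.
points = [(5, 9), (2, 3), (6, 6), (1, 1)]
z_sorted = sorted(points, key=lambda p: morton_key(*p))

assert morton_key(0, 0) == 0
assert morton_key(1, 0) == 1   # x bit 0 lands in key bit 0
assert morton_key(0, 1) == 2   # y bit 0 lands in key bit 1
assert morton_key(3, 3) == 15  # all four low bits set
```

Structures like the UB-tree in the same list are essentially B-trees over exactly this kind of key.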

Temporal database

1. What is this database application? A temporal database is a database with built-in time aspects. A time period attached to the data expresses when it was valid or stored in the database. By attaching a time period to the data, it becomes possible to store different database states. 2. What is the architectural design of this database? A bi-temporal relation contains both valid and transaction time. This is good because it provides both temporal rollback and historical information. Temporal rollback (e.g.: "In 1992, where did the database believe John lived?") is provided by the transaction time. Historical information (e.g.: "Where did John live in 1992?") can be derived from the valid time. The answers to these example questions may not be identical - the database may have been altered since 1992, causing the queries to produce different results. 3. How does it run? More specifically, the temporal aspects usually include valid time and transaction time. These attributes go together to form bitemporal data.

Valid time denotes the time period during which a fact is true with respect to the real world. Transaction time is the time period during which a fact is stored in the database. Bitemporal data combines both Valid and Transaction Time.
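The two time dimensions can be sketched with an ordinary relational table. The sketch below is illustrative only: the schema, the city names, and the use of 9999 to mean "until superseded" are all assumptions made for the example, echoing the two John-in-1992 questions above.

```python
import sqlite3

# Bitemporal Person table: valid time (when the fact was true in the
# real world) plus transaction time (when the database believed it).
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE person (
        name TEXT, city TEXT,
        valid_from INT, valid_to INT,  -- fact true in the real world
        tx_from INT, tx_to INT         -- fact present in the database
    )""")
# Recorded in 1991: John lives in Smallville 1990-1994.
con.execute("INSERT INTO person VALUES ('John','Smallville',1990,1994,1991,1995)")
# Corrected in 1995: John actually lived in Bigtown from 1990 on.
con.execute("INSERT INTO person VALUES ('John','Bigtown',1990,9999,1995,9999)")

def where_lived(year: int) -> str:
    """Historical (valid-time) query using current knowledge:
    'Where did John live in `year`?'"""
    return con.execute(
        "SELECT city FROM person WHERE name='John' "
        "AND valid_from <= ? AND ? < valid_to AND tx_to = 9999",
        (year, year)).fetchone()[0]

def believed_in(year: int) -> str:
    """Temporal rollback (transaction-time) query:
    'In `year`, where did the database believe John lived?'"""
    return con.execute(
        "SELECT city FROM person WHERE name='John' "
        "AND tx_from <= ? AND ? < tx_to "
        "AND valid_from <= ? AND ? < valid_to",
        (year, year, year, year)).fetchone()[0]
```

Because the database was corrected in 1995, the two queries give different answers for 1992, exactly as the example questions above anticipate.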

Note that these two time periods do not have to be the same for a single fact. Imagine that we come up with a temporal database storing data about the 18th century. The valid time of these facts is somewhere between 1701 and 1800, whereas the transaction time starts when we insert the facts into the database, for example, January 21, 1998. It is possible to have timelines other than Valid Time and Transaction Time, such as Decision Time, in the database. In that case the database would be called a multitemporal database as opposed to a bitemporal database. However, this approach introduces additional complexities such as dealing with the validity of (foreign) keys. Valid time captures the period for which a fact is true in the real world, while transaction time is the time at which the transaction recording the fact was made. This enables queries that show the state of the database at a given time. Two more fields are added to the Person table: Transaction-From and Transaction-To. Transaction-From is the time a transaction was made, and Transaction-To is the time that the transaction was superseded (or infinity if it has not yet been superseded). In order to achieve perfect archival quality it is of key importance to store the data under the schema version under which it first appeared. However, even the simplest temporal query over the history of an attribute value would then have to be manually rewritten under each of the schema versions. This process would be particularly taxing for users. A common solution is to provide automatic query rewriting.

4. Who are the people involved in this database application? What are their functions? 1. DBMS developers - These are the people who design and build the DBMS product, and the only ones who touch its code. They are typically the employees of a DBMS vendor (e.g., Oracle, IBM, Microsoft, Sybase), or, in the case of open-source DBMSs (e.g., MySQL), volunteers or people supported by interested companies and organizations. They are typically skilled systems programmers. DBMS development is a complicated task, and some of the popular DBMSs have been under development and enhancement (also to follow progress in technology) for decades. 2. Application developers and database administrators - These are the people who design and build a database-based application that uses the DBMS. The latter group members design the needed database and maintain it. The former write the application programs that the application comprises. Both are well familiar with the DBMS product and use its user interfaces (as well as, usually, other tools) for their work. Sometimes the application itself is packaged and sold as a separate product, which may include the DBMS inside (see Embedded database; subject to proper DBMS licensing), or sold separately as an add-on to the DBMS. 3. Application's end-users (e.g., accountants, insurance people, medical doctors, etc.) - These people know the application and its end-user interfaces, but need not know nor understand the underlying DBMS. Thus, though they are the intended and main beneficiaries of a DBMS, they are only indirectly involved with it. 5. Where it can be implemented?
The following implementations provide a bitemporal database in a relational database management system (RDBMS):

-Oracle Workspace Manager - a feature of Oracle Database that enables application developers and DBAs to manage current, proposed and historical versions of data in the same database. The latest version complies with TSQL2.
-TimeDB is a free temporal relational DBMS by TimeConsult. It runs as a frontend to Oracle that accepts TSQL2 statements and generates SQL-92 statements.
-PostgreSQL has an open-source contributed package that can be installed in the database to manage temporal data.
-Teradata version 13.10 has temporal features built into the database.

6. Why it is needed?

-It is used in dealing with the variation of data over time.
-Identification of an appropriate data type for time.
-Prevention of fragmentation of an object description.
-Provision of a query algebra to deal with temporal data.
-Compatibility with old databases without temporal data.

7. What are the examples of these database applications?

-A temporal data model
-A temporal version of Structured Query Language
-Short biography

Unstructured-data database

1. What is this database application? An unstructured-data database is intended to store, in a manageable and protected way, diverse objects that do not fit naturally and conveniently in common databases. It may include email messages, documents, journals, multimedia objects, etc. The name may be misleading since some objects can be highly structured. However, the entire possible object collection does not fit into a predefined structured framework. Most established DBMSs now support unstructured data in various ways, and new dedicated DBMSs are emerging. 2. What is the architectural design of this database? Unstructured data (or unstructured information) refers to information that either does not have a pre-defined data model or does not fit well into relational tables. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well. This results in irregularities and ambiguities that make it difficult to process using traditional computer programs, as compared to data stored in fielded form in databases or annotated (semantically tagged) in documents. The term is imprecise for several reasons:

-structure, while not formally defined, can still be implied;
-data with some form of structure may still be characterized as unstructured if its structure is not helpful for the desired processing task; and
-unstructured information might have some structure (semi-structured) or even be highly structured, but in ways that are unanticipated or unannounced.

3. How does it run?

Unstructured information represents the vast majority of the data collected and accessible to enterprises. By creating order from an abundance of sources in many formats, unstructured information management (UIM) makes this data accessible and searchable via management systems and applications. Unstructured information management (UIM) applications are software systems that analyze unstructured information (text, audio, video, images, etc.) to discover, organize, and deliver relevant knowledge to the user. In analyzing unstructured information, UIM applications make use of a variety of analysis technologies, including statistical and rule-based Natural Language Processing (NLP), Information Retrieval (IR), machine learning, and ontologies, making the information understandable to users. 4. Who are the people involved in this database application? What are their functions? 1. DBMS developers - These are the people who design and build the DBMS product, and the only ones who touch its code. They are typically the employees of a DBMS vendor (e.g., Oracle, IBM, Microsoft, Sybase), or, in the case of open-source DBMSs (e.g., MySQL), volunteers or people supported by interested companies and organizations. They are typically skilled systems programmers. DBMS development is a complicated task, and some of the popular DBMSs have been under development and enhancement (also to follow progress in technology) for decades. 2. Application developers and database administrators - These are the people who design and build a database-based application that uses the DBMS. The latter group members design the needed database and maintain it. The former write the application programs that the application comprises. Both are well familiar with the DBMS product and use its user interfaces (as well as, usually, other tools) for their work.
Sometimes the application itself is packaged and sold as a separate product, which may include the DBMS inside (see Embedded database; subject to proper DBMS licensing), or sold separately as an add-on to the DBMS. 3. Application's end-users (e.g., accountants, insurance people, medical doctors, etc.) - These people know the application and its end-user interfaces, but need not know nor understand the underlying DBMS. Thus, though they are the intended and main beneficiaries of a DBMS, they are only indirectly involved with it. 5. Where it can be implemented? Data mining, text analytics, and noisy text analytics are different methods used to find patterns in, or otherwise interpret, this information. Common techniques for structuring text usually involve manual tagging with metadata or part-of-speech tagging for further text-mining-based structuring. UIMA (Unstructured Information Management Architecture) provides a common framework for processing this information to extract meaning and create structured data about the information. 6. Why it is needed? Creating machine-processable structure exploits the linguistic, auditory, and visual structure that is inherent in all forms of human communication. This inherent structure can be inferred from text, for instance, by examining word morphology, sentence syntax, and other small- and large-scale patterns. Unstructured information can then be enriched and tagged to address ambiguities, and relevancy-based techniques can then be used to facilitate search and discovery. Such databases are also valued for their seemingly infinite scalability, extraordinary fault tolerance, high availability, design-friendly lack of schema, and integration of both RESTful and cloud computing technologies.
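As a toy illustration of the structuring step described above, a few regular expressions can pull simple metadata out of free text. Real UIM pipelines such as UIMA use full NLP analysis chains; the function name and patterns here are assumptions made for the sketch.

```python
import re

def tag_text(text: str) -> dict:
    """Extract simple structured metadata from unstructured text.

    Returns ISO-style dates (YYYY-MM-DD) and e-mail addresses found
    in the input -- a crude stand-in for the tagging/enrichment step
    a UIM application performs before indexing.
    """
    return {
        "dates": re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text),
        "emails": re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", text),
    }
```

The structured fields this produces could then be stored alongside the raw document, making the otherwise opaque text searchable by date or sender.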

7. What are the examples of these database applications?

-Books
-Journals
-Documents
-Metadata
-Health records
-Audio
-Video files
-Unstructured text (such as the body of an e-mail message, Web page, or word processor document)

University of Perpetual Help System Laguna Sto. Niño, City of Biñan, Laguna

Project in Database Management

Submitted By: Garde II, Henry D. Jocson, Karl Neil Anthony C. Espeleta, Russelle Ralph Ubando, Verlourd S.

Submitted To: Mrs. Eliza Mapanoo

Submitted On: December 12, 2011
