
Data vs. Information

Data: Raw facts. The word "raw" indicates that the facts have not been processed to reveal their meaning.

Information: Information is produced by processing raw data and is then used for decision making. Good decision making is the key to a successful business, so the information produced by data processing should be accurate, relevant and timely.

Knowledge: The body of information and facts about a specific subject.

Data management: A discipline that focuses on the proper generation, storage and retrieval of data.

Database

A database is a shared, integrated computer structure that stores a collection of:

End-user data
Metadata

Metadata is data about the data stored in the database, e.g. the name of a field, its size, its data type, and whether the field is compulsory.
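As a rough illustration of metadata, the sketch below uses Python's built-in sqlite3 module to create a small table and then asks the database to describe it. The table name customer and its fields are hypothetical and chosen only for this example; SQLite is just one convenient DBMS to demonstrate with.

```python
import sqlite3

# Hypothetical table, used only to illustrate metadata.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    " cust_id INTEGER PRIMARY KEY NOT NULL,"
    " cust_name VARCHAR(30) NOT NULL,"
    " cust_city VARCHAR(20))"
)

# PRAGMA table_info returns one row of metadata per field:
# (position, name, declared type, not-null flag, default value, key flag)
metadata = conn.execute("PRAGMA table_info(customer)").fetchall()

for _, name, data_type, not_null, _, _ in metadata:
    print(name, data_type, "compulsory" if not_null else "optional")
```

The rows in metadata describe the fields rather than any customer, which is what distinguishes metadata from end-user data.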

Database Management System

A DBMS is a collection of programs that manages the database structure and controls access to the data stored in the database. The DBMS serves as an intermediary between the user and the database: it receives data requests from the user or application program, retrieves the data from the database, and sends it back to the user.

Advantages of the DBMS

Improved data sharing

Several users can have different views of the same data, so the data is shared by all applications.

Better data integration

Data is stored at a centralized location. Data of the entire organization is stored in one repository.

Minimized data inconsistency

Multiple versions or copies of the same data are not stored at different locations, so redundancy is reduced and with it the chance of inconsistency.

Improved data access

Data stored in the database can be accessed easily using queries. A query is a specific request for data storage, retrieval or modification issued by the user to the DBMS.
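The three kinds of query mentioned above (storage, modification, retrieval) can be sketched with Python's sqlite3 module. The product table and its values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table for the example.
conn.execute(
    "CREATE TABLE product (prod_id INTEGER PRIMARY KEY, prod_name TEXT, price REAL)"
)

# Storage: a request to add new data.
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [(1, "Pen", 10.0), (2, "Notebook", 45.0)],
)

# Modification: a request to change existing data.
conn.execute("UPDATE product SET price = 12.0 WHERE prod_id = 1")

# Retrieval: a request to read data that matches a condition.
rows = conn.execute(
    "SELECT prod_name, price FROM product WHERE price > 11 ORDER BY prod_id"
).fetchall()
print(rows)
```

Each statement is handed to the DBMS as a request; the program never touches the stored file directly.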

Improved decision making

Better managed data and improved data access make it possible to generate better quality information, on which better decisions are made.

Increased end user productivity

Quick and good decisions increase the productivity of the employees and, in turn, the success of the organization.

Types of databases

Based on the number of users

Single-user database: supports only one user at a time.
Desktop database: a single-user database that runs on a personal computer.
Multi-user database: supports multiple users at a time.
Workgroup database: a multi-user database that supports fewer than 50 users or a specific department in an organization.
Enterprise database: a multi-user database that is used by the entire organization and supports many users across departments.

Based on location

Centralized database: a database that supports data located at a single location.
Distributed database: a database that supports data located at multiple locations.

Based on type of use

Operational database: a database designed to support a company's day-to-day operations. It is also known as a transactional or production database.
Data warehouse: stores data used to generate information for tactical and strategic decision making.

Files and File Systems

Earlier, organizations kept track of the necessary data using a manual file system. The manual file system consisted of files kept in a filing cabinet. The content of each file was logically related, e.g. one file would store data related to employees, another would store data related to customers, another products, and so on.

As the volume of data and the reporting requirements of the organization increased, the manual file system became difficult to use. It was time-consuming and complicated. This created the need for a computer-based system that would track data and produce reports.

Data processing specialist: created the computer file structures to store data, created the software to manage the file structures, and designed the application programs that produced reports based on the file data.

File Terminology

Data: Raw facts.
Character: The smallest piece of data that is recognized by the computer. It requires one byte of storage.
Field: A character or group of characters that has a specific meaning.
Record: A logically related set of one or more fields that describes a person, place, thing or event.
File: A collection of related records, e.g. a file might contain data about customers or students.

The computerized file system saved a lot of time and effort for the users. It was quick and easy to search for required information. Reports could be generated easily, and hence decision making became better. So each department started building its own computerized file system and wrote the application programs to generate reports.
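The file terminology above (character, field, record, file) can be pictured with a small in-memory sketch. The CustomerRecord class, its field names and the sample values are hypothetical; the one-byte-per-character figure holds for single-byte encodings such as ASCII, which is what the example assumes.

```python
from dataclasses import dataclass


@dataclass
class CustomerRecord:
    """A record: logically related fields describing one customer."""
    cust_id: int
    cust_name: str
    cust_city: str


# A file: a collection of related records.
customer_file = [
    CustomerRecord(1, "Asha", "Pune"),
    CustomerRecord(2, "Ravi", "Mumbai"),
]

first_record = customer_file[0]
field = first_record.cust_name   # a field: characters with a specific meaning
character = field[0]             # a character: the smallest piece of data

# In ASCII, one character occupies one byte of storage.
bytes_per_character = len(character.encode("ascii"))
print(field, character, bytes_per_character)
```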

Limitations with File System Data Management

It requires extensive programming

Simple data retrieval tasks require a lot of programming effort in third-generation languages (3GLs), e.g. COBOL, BASIC, C++ and FORTRAN. A 3GL requires the programmer to code both what must be done and how it must be done.

There are no ad hoc query capabilities

Ad hoc queries are impractical with 3GLs, because every new question needs a new program. Producing such a report can take very long: a week, a month, or more.

System administration can be complex

System administration becomes difficult as the number of files in the system grows. Every file requires its own file management system to add, modify and delete records, to view the contents and to generate reports.

Difficult to change existing structures

Making changes in the file structure requires changes to be made in all the programs that use data from that file.

Security features are inadequate

Security features are limited. Safeguards designed to protect confidential data are difficult to program.
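To give a feel for the programming effort involved, here is a minimal sketch of a single retrieval task written against a plain data file. Python stands in for the 3GLs named above, and the file contents, field names and the function customers_in_city are hypothetical.

```python
import csv
import io

# Stand-in for a customer data file on disk (hypothetical contents).
customer_file = io.StringIO(
    "cust_id,cust_name,cust_city\n"
    "1,Asha,Pune\n"
    "2,Ravi,Mumbai\n"
    "3,Meena,Pune\n"
)


def customers_in_city(file, city):
    """Spell out HOW to find the data: read every record, test it, collect it."""
    matches = []
    for record in csv.DictReader(file):
        if record["cust_city"] == city:
            matches.append(record["cust_name"])
    return matches


pune_customers = customers_in_city(customer_file, "Pune")
print(pune_customers)
```

A different question, such as customers grouped by city, would need another hand-written routine, which is why ad hoc reporting is slow in a file system.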

Problems with File System

The limitations of the file system lead to the following problems:

Structural and Data Dependence

Structural dependence: A file system has structural dependence, i.e. access to a file depends on its structure. If the structure changes, the programs which access data from the file also have to be modified. Structural independence exists when it is possible to make changes in the file structure without affecting the application programs.

Data dependence: Even changes in the characteristics of the file data affect the data access programs, hence the file system is said to exhibit data dependence. E.g. if a field in the file changes from integer to decimal data type, the programs accessing the data will also require modification. Data independence exists when it is possible to make changes in the data storage characteristics without affecting the application program's ability to access data.
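Structural dependence can be sketched with a fixed-width record layout. The layout (a 4-character customer id followed by a 10-character name), the later 2-character region code and the function names parse_v1 and parse_v2 are all assumptions made for this example.

```python
def parse_v1(line):
    """Reader written for the original layout: id (4 chars) + name (10 chars)."""
    return {"cust_id": line[0:4].strip(), "cust_name": line[4:14].strip()}


def parse_v2(line):
    """Reader rewritten after a 2-char region code was added before the name."""
    return {
        "cust_id": line[0:4].strip(),
        "region": line[4:6].strip(),
        "cust_name": line[6:16].strip(),
    }


old_line = "0001" + "Asha".ljust(10)
new_line = "0001" + "MH" + "Asha".ljust(10)

print(parse_v1(old_line))  # the old program reads the old structure correctly
print(parse_v1(new_line))  # the same program misreads the changed structure
print(parse_v2(new_line))  # the program had to be modified to match the file
```

Because the byte positions are coded into the program, every program that reads the file has to change when the file structure changes.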

Field Definitions and Naming Conventions

The record and field designs in the file should take the reporting requirements into consideration, e.g. Fname, Lname, City, Area code. Field names should be descriptive and meaningful, and neither too long nor too short. A field name should convey the name of the file and give an idea of the kind of data that will be stored in that field, e.g. use CUST_RENEW_DATE instead of REN. There should be a field which uniquely identifies the records in the file, e.g. CUST_ID.

Data Redundancy

Data redundancy exists when the same data is stored at multiple locations across the organization. When the data is modified, the copies stored at the different locations may not all be updated consistently, which creates different versions of the same data.

Uncontrolled data redundancy creates the following problems:

Data Inconsistency

Data inconsistency exists when different and conflicting versions of the same data appear in different places. Reports generated from such data will also be incorrect. Data entry errors (spelling mistakes, wrong values, etc.) are more likely when the same data has to be entered at many locations.

Data Anomalies

An anomaly is an abnormality or mistake. When data redundancy exists there can be data anomalies, because any change in data at one place requires a corresponding change at all the other locations or in all the other files. A data anomaly develops when not all the required changes in the redundant data are made successfully.

Update anomalies: If an agent's phone number changes, it must be changed in the record of every customer whom the agent serves.
Insertion anomalies: If only the customer data file exists, then to add data about a new agent a dummy customer entry must be created.
Deletion anomalies: If you delete a customer record which is the only record that contains a particular agent's data, then that agent's data will be lost.
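The update and deletion anomalies can be reproduced with a few lines of Python. The customer file below, with agent data repeated in every customer record, is a made-up example; the names and phone numbers are placeholders.

```python
# One flat customer file: the agent's data is repeated in each customer record.
customers = [
    {"cust_id": 1, "cust_name": "Asha", "agent_name": "Leah", "agent_phone": "555-0101"},
    {"cust_id": 2, "cust_name": "Ravi", "agent_name": "Leah", "agent_phone": "555-0101"},
    {"cust_id": 3, "cust_name": "Meena", "agent_name": "John", "agent_phone": "555-0102"},
]

# Update anomaly: the new phone number is entered in only one of Leah's records.
customers[0]["agent_phone"] = "555-0199"
leah_phones = {c["agent_phone"] for c in customers if c["agent_name"] == "Leah"}
print(leah_phones)  # two conflicting versions of the same fact

# Deletion anomaly: customer 3 is the only record that mentions agent John.
customers = [c for c in customers if c["cust_id"] != 3]
remaining_agents = {c["agent_name"] for c in customers}
print(remaining_agents)  # John's data is gone along with the customer
```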

Database Systems

A database consists of logically related data stored in a single logical data repository.

Physical data format: the format in which the computer stores the data physically on the storage device.
Logical data format: the format in which the users view the data. The users may view the same data in a variety of different ways.
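One way to picture different logical formats over the same stored data is with SQL views, sketched here through Python's sqlite3 module. The employee table, the two view names and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The stored data (hypothetical table and values).
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY, emp_name TEXT, dept TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Asha", "Sales", 50000.0), (2, "Ravi", "Sales", 40000.0), (3, "Meena", "HR", 45000.0)],
)

# Two logical views over the same stored data.
conn.execute("CREATE VIEW staff_directory AS SELECT emp_name, dept FROM employee")
conn.execute(
    "CREATE VIEW dept_payroll AS "
    "SELECT dept, SUM(salary) AS total_salary FROM employee GROUP BY dept"
)

directory = conn.execute("SELECT * FROM staff_directory ORDER BY emp_name").fetchall()
payroll = conn.execute("SELECT * FROM dept_payroll ORDER BY dept").fetchall()
print(directory)
print(payroll)
```

Neither view says anything about how the rows are laid out on the storage device; that physical format stays hidden behind the DBMS.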

There can be multiple logical views of the same data.

Database System Environment

A database system consists of hardware, software, people, procedures and data.

Hardware

Hardware is the physical devices of the system, e.g. computers, printers, routers and storage devices.

Software

Operating system: manages all the hardware and software so that they work together and run the computer.
DBMS software: manages the database within the database system.
Application programs and utilities: help to access the data in the database, generate reports and perform other tasks.

People

This component includes all the users of the database system.

System administrators: oversee the overall functioning of the system.
Database administrators (DBAs): ensure that the database is functioning properly.
Database designers: the database architects; they design the database structure.
System analysts and programmers: design and implement the application programs. They create the data entry screens, reports and procedures through which users can access and manipulate the database's data.
End users: use the application programs.

Procedures

Procedures are the instructions and rules which govern the design and use of the database system. They specify the business rules and standards for input and output data.

Data

A collection of facts stored in the database.

DBMS Functions

A DBMS performs several functions to maintain the integrity and consistency of the data in the database. All of these functions are transparent to the users, i.e. the end users do not come to know how the DBMS carries them out. The following functions are performed by the DBMS:

Data dictionary management

The DBMS stores definitions of data elements and their relationships (metadata) in the data dictionary. Whenever a program wants to access data it sends the request to the DBMS, which looks up the required data component structures and relationships in the data dictionary and so relieves the programmer of the task of coding them. Whenever changes are made in the database structure, they are recorded in the data dictionary as well. Thus the DBMS removes data and structural dependence from the system.

Data storage management

The DBMS creates and manages the complex structures required for data storage, thus relieving you of the task of programming the complex physical data characteristics. The DBMS stores not only data but also data validation rules, procedural code, and image and picture formats. Storage management also covers performance tuning, i.e. making data storage and access more efficient and faster. The DBMS stores data in multiple physical files and can serve multiple users at a time.

Data transformation and presentation

The DBMS converts between the physical data format and the logical data format: from physical to logical during data retrieval, and from logical to physical during storage.

Security management

The DBMS creates a security system that enforces user security and data privacy. Security rules determine:

Who can access the data?
What can be accessed by whom?
What operations can be performed?

Multi-user access control

The DBMS allows multiple users to access the database concurrently. It uses complex algorithms to give multiple users access to the database without affecting data integrity and consistency.

Backup and recovery management

The DBMS provides utilities which allow the DBAs to perform backup and restore activities on the database. Recovery management deals with the recovery of data in case of failure of the database. Backup and recovery are needed to preserve the database's integrity.
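As a small-scale sketch of backup and recovery, the example below uses the backup method that Python's sqlite3 connections provide (available from Python 3.7). The account table, the simulated failure and the in-memory databases are assumptions for illustration; a production DBMS would have its own utilities for this.

```python
import sqlite3

# The live database (hypothetical table and values).
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE account (acc_id INTEGER PRIMARY KEY, balance REAL)")
live.execute("INSERT INTO account VALUES (1, 100.0)")
live.commit()

# Backup: copy the whole database into a separate one.
backup_copy = sqlite3.connect(":memory:")
live.backup(backup_copy)

# Simulated failure: the data in the live database is lost.
live.execute("DELETE FROM account")
live.commit()
lost = live.execute("SELECT COUNT(*) FROM account").fetchone()[0]

# Recovery: restore the backup into a fresh database.
recovered = sqlite3.connect(":memory:")
backup_copy.backup(recovered)
restored = recovered.execute("SELECT acc_id, balance FROM account").fetchall()
print(lost, restored)
```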

Data integrity management

The DBMS enforces data integrity rules, which minimizes data redundancy and maximizes data consistency.
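A minimal sketch of integrity rules being enforced, using Python's sqlite3 module. The agent and customer tables are hypothetical, and SQLite needs foreign key checking switched on explicitly, which the PRAGMA line does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

# Agent data is stored once; customers refer to it instead of repeating it.
conn.execute(
    "CREATE TABLE agent (agent_id INTEGER PRIMARY KEY, agent_phone TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE customer ("
    " cust_id INTEGER PRIMARY KEY,"
    " cust_name TEXT NOT NULL,"
    " agent_id INTEGER NOT NULL REFERENCES agent(agent_id))"
)
conn.execute("INSERT INTO agent VALUES (1, '555-0101')")
conn.execute("INSERT INTO customer VALUES (1, 'Asha', 1)")

# A customer that points to a non-existent agent breaks the integrity rule.
try:
    conn.execute("INSERT INTO customer VALUES (2, 'Ravi', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(rejected, customer_count)
```

The rule is declared once in the table definition and checked by the DBMS on every insert, so no application program has to repeat the check.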

The data relationships stored in the data dictionary are used to enforce data integrity.

Data access languages and application programming interfaces

The DBMS provides data access through a query language. A query language is a non-procedural language, one that lets the user specify what must be done without specifying how it is to be done, e.g. SQL. The DBMS also provides administrative tools for the DBA and allows programmers to access data via procedural languages.

Database communication interfaces

Users can communicate with the database in different ways: through the internet using browsers, by publishing reports on a website, through data entry forms, etc.
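Returning to the data access languages described above, the sketch below shows a non-procedural SQL request issued from a procedural host language (Python, through its sqlite3 interface). The customer table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and values.
conn.execute(
    "CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, cust_name TEXT, cust_city TEXT)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ravi", "Mumbai"), (3, "Meena", "Pune")],
)

# The SQL states WHAT is wanted; the DBMS decides HOW to locate the rows.
query = "SELECT cust_name FROM customer WHERE cust_city = ? ORDER BY cust_id"
pune_customers = [name for (name,) in conn.execute(query, ("Pune",))]
print(pune_customers)
```

The program contains no loop over records and no test on each one; a different question only needs a different SQL statement.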