
MANAGEMENT INFORMATION SYSTEMS

DATA FILES AND DATABASES

Compiled by Dr. Jennifer W, PhD, PMP® for Online Training


Data Files and Databases

At the end of this lecture, you should be able to:


• Explain different types of databases in information technology.
• Discuss various methods used to manage data files.
• Describe the components, advantages and disadvantages of databases.
FILE ORGANIZATION
(Terms and Concept)
• BIT: Smallest unit of data, a binary digit (0 or 1)
• BYTE: Group of bits (usually eight) that represents a single character
• FIELD: Group of characters that forms a word or number
• RECORD: Group of related fields
• FILE: Collection of similar records
• DATABASE: Repository of logically related data that can easily be accessed, managed and updated, e.g. a student database, YouTube
• ENTITY: Person, place, thing or event about which data must be kept
• ATTRIBUTE: Characteristic that describes a particular entity
• DATA: Collection of raw facts and figures
• INFORMATION: Data processed into a systematic and meaningful form
• KEY FIELD: Field in each record that uniquely identifies the record for easier retrieval, updating and sorting
• BATCH PROCESSING: Data is collected over a period (several days or weeks) and then processed together
• REAL-TIME PROCESSING: Data is processed right away, as each transaction occurs
• MASTER FILE: Complete file containing all records, current up to the last update
• TRANSACTION FILE: File containing recent changes to records, used to update the master file
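The relationship between a master file and a transaction file in batch processing can be sketched in a few lines of Python. This is only an illustration: the student records, the `action` values and the function name `apply_batch` are assumptions made for the example, and the sketch assumes every "update" refers to a key that already exists in the master file.

```python
# Hypothetical master file: one record per student, keyed by student_id (the key field).
master = {
    "S001": {"name": "Amina", "balance": 500},
    "S002": {"name": "Brian", "balance": 0},
}

# Hypothetical transaction file: changes collected since the last update.
transactions = [
    {"student_id": "S001", "action": "update", "balance": 250},
    {"student_id": "S003", "action": "add", "name": "Chen", "balance": 100},
    {"student_id": "S002", "action": "delete"},
]

def apply_batch(master, transactions):
    """Return a new master file with every transaction applied in order."""
    updated = {key: dict(record) for key, record in master.items()}
    for tx in transactions:
        key = tx["student_id"]
        if tx["action"] == "delete":
            updated.pop(key, None)
        elif tx["action"] == "add":
            updated[key] = {"name": tx["name"], "balance": tx["balance"]}
        elif tx["action"] == "update":
            updated[key]["balance"] = tx["balance"]
    return updated

new_master = apply_batch(master, transactions)
print(sorted(new_master))  # ['S001', 'S003']
```

The old master file is left untouched, which mirrors the common practice of keeping the previous generation of a master file until the new one has been checked.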
Data Hierarchy in a Computer System
More Terms
Key field
• Every record in a file should contain at least one field that uniquely identifies that
record so that the record can be retrieved, updated, or sorted. This identifier field is
called a key field.

Entity
• A person, place, thing, or event on which we maintain information.
Accessing Records from Computer Files
Computer stores files on secondary storage devices.
• Records can be arranged in several ways on storage media.
• How individual records can be accessed or retrieved depends on how they are
arranged on storage media.
There are mainly two ways to organize records: sequentially or randomly.
• In sequential file organization, data records must be retrieved in the same physical
sequence in which they are stored.
• In direct or random file organization, data records can be accessed in any sequence as
users desire, without regard to actual physical order on the storage media.
• Sequential file organization is the only file organization that can be used on magnetic
tape. Example: Payroll
• Direct or random file organization is utilized with magnetic disk.
*Most computer applications utilize this method*
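The difference between the two access methods can be sketched with fixed-length records held in an in-memory file. The record layout (a 4-byte employee number followed by a 10-byte name), the sample payroll rows and the function names are assumptions for illustration only.

```python
import io
import struct

# Assumed fixed-length record layout: 4-byte employee number + 10-byte name.
RECORD = struct.Struct("<I10s")

def make_file(rows):
    """Pack rows into an in-memory binary file of fixed-length records."""
    return io.BytesIO(b"".join(RECORD.pack(num, name.encode()) for num, name in rows))

def read_sequential(f, wanted):
    """Scan from the start, as a tape drive must, until the record is found."""
    f.seek(0)
    reads = 0
    while chunk := f.read(RECORD.size):
        reads += 1
        num, name = RECORD.unpack(chunk)
        if num == wanted:
            return name.rstrip(b"\x00").decode(), reads
    return None, reads

def read_direct(f, position):
    """Jump straight to the record at a known position, as a disk allows."""
    f.seek(position * RECORD.size)
    num, name = RECORD.unpack(f.read(RECORD.size))
    return name.rstrip(b"\x00").decode(), 1

payroll = make_file([(101, "Otieno"), (102, "Wanjiru"), (103, "Patel")])
print(read_sequential(payroll, 103))  # ('Patel', 3): three records were read
print(read_direct(payroll, 2))        # ('Patel', 1): one record was read
```

Sequential access reads every record that precedes the one wanted, while direct access computes the record's offset and reads it alone. Direct access needs some way to turn a key into a position; here the position is simply assumed to be known.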
TRADITIONAL FILE ENVIRONMENT
(FLAT FILE SYSTEM)

• A manual file system for maintaining records and files
• Each file is independent of the other files
• Integration can be done only by writing an individual program for each application
• Any change to the data requires modifying all the programs that use the data, because each program is hard-coded with file-specific details such as data type and data size
• The programs that use a given data item can be identified only on a trial-and-error basis
TRADITIONAL FILE ENVIRONMENT
Cont.

• Each functional area in an organization creates, processes and disseminates its own files.
• Inventory and payroll, for example, generate separate files and do not communicate with each other.
• Data is dispersed throughout the functional sub-systems.
Advantages
• Simple to operate
• Better local control
• Minimal investment
• No specialist staff required
Disadvantages of Traditional File System
• Data Redundancy: Each application has its own data file, so the same data may have to be recorded and stored many times, which leads to a lot of duplication.
• Data Inconsistency: Data items that appear in more than one file are not updated simultaneously in every file, so the copies can disagree.
• Lack of Data Integration: It is hard to run a query that accesses information from different files, because of data dependence.
• Program Dependence: A change in the data or its structure requires modification of the programs as well.
• Data Dependence: Programs in a file processing system are data dependent, so the file organization, its physical location and its retrieval from the storage media are dictated by the requirements of the particular application.
Disadvantages of Traditional File System
Cont.
• Limited Data Sharing: Each application has its own private files, and users have few options for sharing data unless complex programs are written to allow it.

• Poor Data Control: There is no centralized control at the data element level, so a traditional file system is decentralized in nature.

• Problem of Security: It is difficult to enforce security checks and access rights in a traditional file system, since application programs are added in an ad hoc manner.

• Data Manipulation Capability is Inadequate: Data manipulation is very limited in traditional file systems, since they do not provide strong relationships between data in different files.

• Needs Excessive Programming: A change to an existing file structure forces modifications in all of the programs that use the data in that file, which takes time. In the worst case, a change means developing the program from scratch.
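Data redundancy and data inconsistency are easy to reproduce in a small sketch. The two "files" below are plain Python lists standing in for the private files of two applications; the employee data and the function names are hypothetical.

```python
# Two independent flat files, each owned by a different application.
payroll_file = [{"emp_id": 7, "name": "J. Mwangi", "address": "12 Moi Ave"}]
hr_file = [{"emp_id": 7, "name": "J. Mwangi", "address": "12 Moi Ave"}]

def change_address(file_records, emp_id, new_address):
    """Update one application's private copy only."""
    for record in file_records:
        if record["emp_id"] == emp_id:
            record["address"] = new_address

# HR records the move; nothing forces payroll to follow.
change_address(hr_file, 7, "48 Kenyatta Rd")

def find_conflicts(file_a, file_b, field):
    """List key values whose copies of `field` disagree between two files."""
    b_by_key = {r["emp_id"]: r for r in file_b}
    return [r["emp_id"] for r in file_a
            if r["emp_id"] in b_by_key and r[field] != b_by_key[r["emp_id"]][field]]

print(find_conflicts(payroll_file, hr_file, "address"))  # [7]
```

The same address is stored twice (redundancy), and after one update the two copies disagree (inconsistency). Detecting the conflict already required a purpose-written program, which illustrates the "excessive programming" point as well.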
Example of a Flat file
Traditional File Processing
Database Management System (DBMS)
• Database – A shared collection of logically related data, and a description of the data, designed to
meet the information needs of an organization.
• Database Management System (DBMS) – A software system that enables users to define, create, maintain and control access to the database. It also permits centralization of data and data management.
• Application of DBMS: Insurance, Hospitals, Airlines, Universities, schools, Banking, Human
Resources, Manufacturing, and selling etc.
• Roles in the Database: Data and Database Administrators, Database Designers, Application
Developers, Users
• Queries: Users can request data from specified fields.
• Security: Addresses Security issues by giving users different views depending on the role.
• DBMS examples: SQL Server, Oracle, IBM DB2, MySQL, MariaDB, PostgreSQL, SQLite, Microsoft Access, FileMaker, LibreOffice Base, dBASE, FoxPro and Clipper
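The define, create, query and view ideas above can be tried with SQLite through Python's standard `sqlite3` module. The table, column and view names are made up for this sketch. SQLite has no user accounts, so the view only illustrates how a role-specific view hides fields; a multi-user DBMS would combine views with access privileges.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Define and create: the DBMS stores the description of the data with the data.
conn.execute("""
    CREATE TABLE student (
        student_no  TEXT PRIMARY KEY,   -- key field
        name        TEXT NOT NULL,
        programme   TEXT NOT NULL,
        fee_balance REAL NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [("S001", "Amina", "MIS", 500.0), ("S002", "Brian", "Finance", 0.0)],
)

# A view exposes only the fields a given role should see (no fee_balance here).
conn.execute(
    "CREATE VIEW lecturer_view AS SELECT student_no, name, programme FROM student"
)

# Query: request data from specified fields.
mis_students = conn.execute(
    "SELECT name FROM lecturer_view WHERE programme = ?", ("MIS",)
).fetchall()
view_columns = [col[0] for col in conn.execute("SELECT * FROM lecturer_view").description]

print(mis_students)   # [('Amina',)]
print(view_columns)   # ['student_no', 'name', 'programme']
```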
Different User views in one Database
Database Management System (DBMS)
The Contemporary Database Environment
Data Concepts and Characteristics
Data Hierarchy
– Fields/columns
• Hold single pieces of data
– Records/rows
• Groups of related fields
– Tables
• Collection of related records
– Database
• Contains a group of related tables
Database Architecture
Distributed Databases
– Replication
• A full copy of the entire database is stored at every site.
– Fragmentation
• Parts of the database are stored at the sites where they are most often accessed, which reduces access traffic.
Centralized Database (Client/Server Systems)
– Four basic client/server models
• Applications run at a server
• Applications run on local PCs
• Applications run on both the local PCs and the server
• Applications and key elements of the database are split between the PCs and the server
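The two distribution strategies named above, replication and fragmentation, can be contrasted in a short sketch. The customer rows, the site names and the rule "a site keeps the rows of its own branch" are assumptions chosen for the example.

```python
customers = [
    {"id": 1, "name": "Achieng", "branch": "Nairobi"},
    {"id": 2, "name": "Musa", "branch": "Mombasa"},
    {"id": 3, "name": "Njeri", "branch": "Nairobi"},
]
sites = ["Nairobi", "Mombasa"]

def replicate(rows, sites):
    """Replication: every site holds a full copy of the data."""
    return {site: [dict(r) for r in rows] for site in sites}

def fragment(rows, sites):
    """Fragmentation: each site holds only the rows it uses most (here, its own branch)."""
    return {site: [dict(r) for r in rows if r["branch"] == site] for site in sites}

replicated = replicate(customers, sites)
fragmented = fragment(customers, sites)

print({site: len(rows) for site, rows in replicated.items()})  # {'Nairobi': 3, 'Mombasa': 3}
print({site: len(rows) for site, rows in fragmented.items()})  # {'Nairobi': 2, 'Mombasa': 1}
```

Replication makes every read local but means each update must reach every copy. Fragmentation stores each row once, so a query that spans branches has to visit more than one site.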
Types of Database Management systems
• Operational Databases
• Distributed Databases
Operational Databases
Information (data) is stored at a centralized location, and users at
different locations can access it. The system contains procedures for
remote access that verify and validate end users (e.g. by student
number) and keep a record of usage.
Distributed Databases
• Data is located at different sites of the organization. The sites are connected to each other by communication links, which let them access the distributed data easily.
• There are two kinds of distributed database: homogeneous and heterogeneous.
• Homogeneous Database: All sites run the same hardware, the same software and the same application procedures.
• Heterogeneous Database: The operating system, underlying hardware and application procedures differ between sites.
Distributed Databases
Advantages of Database Systems
▪ Controlled redundancy: Duplication can be carefully controlled; the database system is aware of the redundancy and takes responsibility for propagating updates.
▪ Data consistency: Controlled redundancy addresses the problem of data inconsistency.
▪ Program-data independence: Database systems separate the stored data from the application programs, which allows changes at one level of the data without affecting the others.
▪ Sharing of data: Data is centrally controlled and can be shared by all authorized users.
▪ Enforcement of standards: Standardized data formats can be enforced to facilitate data transfer between systems.
▪ Improved data integrity: Centralized control helps ensure that data is both accurate and consistent.
▪ Improved security: The database administrator (DBA) ensures that proper access procedures are followed, including proper authentication schemes for access to the DBMS and additional checks before permitting access to sensitive data.
▪ Efficient data access: Sophisticated techniques are used to access stored data.
Advantages of Database Systems
Cont.
• Conflicting requirements can be balanced: The DBA resolves the conflicting requirements of various users and applications.
• Improved backup and recovery: A DBMS has a backup and recovery subsystem for use in case of hardware or software failure.
• Minimal program maintenance: Data and application programs are independent, so programs need less maintenance than in a traditional file system.
• High data quality: Many tools and processes are available to check and maintain data quality.
• Good data accessibility and responsiveness: Users can ask ad hoc queries to obtain the needed information immediately.
• Concurrency control: The DBMS manages simultaneous (concurrent) access to the database by many users.
• Economical to scale: Storing data in a central database reduces the overall cost of operating and managing it.
• Increased programmer productivity: Database systems provide many built-in functions that programmers can use to deliver what users require, reducing development time and cost.
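Two of the advantages listed, data integrity and recovery from a failed operation, can be shown with SQLite from Python's standard library. The `account` table, the rule that a balance may not go negative and the `transfer` function are assumptions made for this sketch, not features of any particular system described in the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        acc_no  TEXT PRIMARY KEY,
        balance REAL NOT NULL CHECK (balance >= 0)  -- integrity rule held by the DBMS
    )
""")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100.0), ("B", 50.0)])
conn.commit()

def transfer(conn, source, target, amount):
    """Move money between accounts; undo both updates if either one fails."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE acc_no = ?", (amount, target)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?", (amount, source)
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "A", "B", 30.0)         # allowed
rejected = transfer(conn, "A", "B", 500.0)  # would overdraw A, so it is rolled back
balances = dict(conn.execute("SELECT acc_no, balance FROM account"))
print(ok, rejected, balances)  # True False {'A': 70.0, 'B': 80.0}
```

The rule is declared once, in the database, rather than being repeated in every program that touches the data. When the second transfer breaks the rule, the credit already made to account B is undone, so the data stays consistent.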
Disadvantages of Database Systems
• Increased complexity: A database supports many applications, so a specialist is needed to manage it.
• More disk space required: A DBMS needs more space to store and run than a traditional file system.
• Additional cost of hardware: Depending on the environment, hardware and maintenance cost more.
• Cost of conversion: Moving from an old file system to a new database system is very expensive and may require new hardware, training or specialist hires.
• Need for additional and specialized manpower: The organization must hire and regularly train staff to design and implement databases and to provide database administration services.
• Need for backup and recovery: Backup and recovery procedures must be set up and followed.
• Organizational conflict: Consensus is needed on data definitions and ownership, as well as on responsibilities for accurate data maintenance.
• Higher installation and management cost: Large, full-featured database systems cost more to install and need more manpower and maintenance.
Databases on the Web

• A Web database is a database application designed to be managed and accessed through the Internet.
• Website operators can manage this collection of data and present analytical results based on the data in the Web database application.
• The database might be used for any of a wide range of functions: email, sports, catalogues, online shopping, etc.
Databases on the Web
Cont.
Points to consider:
• Which application will use the database? Every application has its own needs.
• No database can meet the needs of every application, so know the database types in order to make the right choice.
• The “Four Ss” of database characteristics are structure, size, speed and scalability.
– Structure: Data may be structured, semi-structured or unstructured. The more structured the data, the more readily it can be accessed and analyzed.
– Size: The quantity of data to be stored and how it is retrieved.
– Speed: Whether the database is optimized for read-heavy applications or designed for write-heavy ones.
– Scalability: Whether the database can grow with your business, and whether that calls for horizontal scaling (adding more servers) or vertical scaling (adding more resources to existing servers).
• Security: Is the database secure? Can web servers interfere with the database and its updates?
Components of Database system
• Users- People who interact with the database:
– Application Programmers/developers.
– End Users.
– Data Administrators.
• Software- Lies between the stored data and the users:
– DBMS.
– Application Software.
– User Interface.
• Hardware- Physical device on which database resides. e.g.: Computers, Disk Drives, Printers,
Cables etc.
• Data- numbers, characters, pictures.
• Procedures- Instructions and rules that govern how the DBMS is used
Data Warehousing
• A data warehouse is a technology that aggregates structured data from one or more sources so that it can be compared and analyzed, rather than used for transaction processing.
• It is designed to support the management decision-making process by providing a platform for data cleaning, data integration and data consolidation.
• The data in a warehouse is subject-oriented, integrated, time-variant and non-volatile.
• It consolidates data from many sources while ensuring data quality, consistency and accuracy.
• It improves system performance by separating analytical processing from transactional databases; reports are produced with query tools.
• Data warehousing is a process of transforming data into information and making it available to users for analysis.
• It stores current and historical data, unlike a transaction database, which typically holds only current information and is not well suited to analysis.
• Data Mart: A subset of the data in the warehouse, designed for use by a specific subject area, department, unit or set of users in an organization, e.g. Sales, Accounts or Marketing.
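A toy version of these ideas: two source systems that use different field names are consolidated into one layout, an analytical query runs over the combined history, and a data mart is taken as a department-level subset. All systems, field names and figures below are invented for the sketch.

```python
from collections import defaultdict

# Hypothetical extracts from two operational systems with different field names.
sales_system = [
    {"cust": "C1", "dept": "Sales", "amount": 120.0, "year": 2022},
    {"cust": "C2", "dept": "Sales", "amount": 80.0, "year": 2023},
]
accounts_system = [
    {"customer_id": "C1", "department": "Accounts", "total": 40.0, "yr": 2023},
]

def integrate(sales, accounts):
    """Consolidate both sources into one consistent, time-stamped layout."""
    warehouse = [
        {"customer": r["cust"], "department": r["dept"],
         "amount": r["amount"], "year": r["year"]}
        for r in sales
    ]
    warehouse += [
        {"customer": r["customer_id"], "department": r["department"],
         "amount": r["total"], "year": r["yr"]}
        for r in accounts
    ]
    return warehouse

def data_mart(warehouse, department):
    """A data mart is the subset of the warehouse for one department."""
    return [row for row in warehouse if row["department"] == department]

def totals_by_year(rows):
    """Analytical query: total amount per year across whatever rows are supplied."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["year"]] += row["amount"]
    return dict(totals)

warehouse = integrate(sales_system, accounts_system)
print(totals_by_year(warehouse))            # {2022: 120.0, 2023: 120.0}
print(len(data_mart(warehouse, "Sales")))   # 2
```

Keeping the `year` on every row is what makes the data time-variant: past years stay available for comparison instead of being overwritten by current values.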
Data Warehousing
Components of a Data Warehouse
Data Mining
What is Data Mining?
• A collection of techniques used to find previously undiscovered patterns by analyzing large volumes of data
• Used in conjunction with data warehousing
• A process of discovering new information to help in decision making
• Uses sophisticated data analysis tools to discover previously unknown, valid patterns and actionable information in vast amounts of data, which can inform crucial business decisions
• The tools allow end users to directly access and manipulate the data within the data warehousing environment
• Data mining is a part of the knowledge discovery process
Data Mining | KDD process
Data Mining – Knowledge Discovery in Databases(KDD)
Goals of Data Mining
• Prediction: Predict how certain data attributes will behave in the future.
• Identification: Identify the existence of an item, an event or an activity on the basis of analysis of different data patterns.
• Classification: Classify the data into different categories on the basis of certain parameters, e.g. grade human resource data based on performance.
• Optimization: Optimize the use of limited resources such as time, cost, space, manpower and machine power so as to maximize outputs such as profits and sales, or minimize expenditure.
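The classification goal can be illustrated with the human resource example mentioned above. The sketch below uses fixed cut-off scores, which are illustrative assumptions; data mining tools would normally learn such boundaries from historical data rather than have them written by hand.

```python
def grade(score):
    """Assign a performance band; the cut-off values are illustrative assumptions."""
    if score >= 80:
        return "high"
    if score >= 50:
        return "medium"
    return "low"

# Hypothetical appraisal scores out of 100.
staff_scores = {"Otieno": 91, "Wanjiru": 67, "Patel": 42}
bands = {name: grade(score) for name, score in staff_scores.items()}
print(bands)  # {'Otieno': 'high', 'Wanjiru': 'medium', 'Patel': 'low'}
```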
Elements of Data Mining
Data mining consists of five major elements:
• Extract, transform, and load transaction data onto the data warehouse
system.
• Store and manage the data in a multi-dimensional database system.
• Provide data access to business analysts and information technology
professionals.
• Analyze the data with application software.
• Present the data in a useful format, such as graphs or tables.
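The five elements can be followed end to end in a very small pipeline. In this sketch a CSV string stands in for the transaction data and an in-memory SQLite table stands in for the warehouse (a real warehouse would typically use a multi-dimensional store); the column names, sample rows and function names are all assumptions for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical transaction data; the third row has a missing quantity.
RAW = """date,product,qty,price
2024-01-05,tea,2,3.50
2024-01-05,sugar,1,2.00
2024-01-06,tea,,3.50
2024-01-06,milk,3,1.20
"""

def extract(text):
    """Extract: read the raw transaction rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with a missing quantity and convert text to numbers."""
    return [(r["date"], r["product"], int(r["qty"]), float(r["price"]))
            for r in rows if r["qty"]]

def load(rows):
    """Load and store: place the cleaned rows in a database table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (date TEXT, product TEXT, qty INTEGER, price REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    return conn

def analyze(conn):
    """Access and analyze: revenue per product."""
    return conn.execute(
        "SELECT product, SUM(qty * price) FROM sales GROUP BY product ORDER BY product"
    ).fetchall()

def present(results):
    """Present: format the results as a small table."""
    return "\n".join(f"{product:<8}{revenue:>8.2f}" for product, revenue in results)

results = analyze(load(transform(extract(RAW))))
print(present(results))
```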
Data Mining Applications
• It can be used for different estimation problems like effort, duration,
quality or maintenance cost.
• Used in Project Management, Marketing, Finance, Insurance,
Manufacturing, Health care etc
• Example: Marketing:
– Identify buying patterns of customers.
– Market basket analysis.
– Predict response of mailing campaigns.
– Find associations among customer demographic characteristics.
– Design of catalogs, store layouts and advertising campaigns.
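Market basket analysis, listed above, usually starts from two simple measures: support (how often items are bought together) and confidence (how often the second item is bought when the first one is). The baskets below are invented, and the sketch only counts pairs of items; dedicated data mining tools use more scalable algorithms for the same idea.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per customer visit.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "tea"},
]

def pair_support(baskets):
    """Support of a pair = share of baskets that contain both items."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return {pair: n / len(baskets) for pair, n in counts.items()}

def confidence(baskets, antecedent, consequent):
    """Confidence of antecedent -> consequent = P(consequent | antecedent)."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

support = pair_support(baskets)
print(support[("bread", "milk")])              # 0.5
print(confidence(baskets, "butter", "bread"))  # 1.0
```

In this sample every basket that contains butter also contains bread, so a store might place the two near each other or promote them together, which connects to the catalog and store layout uses listed above.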
Advantages of Data Mining
• Helps companies obtain knowledge-based information
• Organizations can make profitable adjustments in operations and production.
• Helps companies in the decision-making process.
• Provides marketers with accurate trends in their customers' purchasing behavior
• Helps researchers by speeding up the analysis of huge data sets, so more work is done in less time.
• Law enforcement can identify and apprehend criminal suspects by examining trends in location, crime type, etc.
• Can assist financial institutions in areas such as credit reporting and loan information
Disadvantages of Data Mining
• Privacy issues: Personal privacy has become a concern with the increased use of technology. A rogue company can sell its customers' information.
• Security issues: Different users access the database directly, and this can cause leakage of secured data.
• Inaccurate information: Data mining may not be 100% accurate, which raises the issue of inaccurate and inconsistent information.
• Software: Data mining software is difficult to operate and requires adequate training or experts.
• Tool selection: Different data mining tools exist, and they use different algorithms. Selecting the right tool can be a challenging task.
Summary

• Traditional file organization vs. database approach


• How database management systems affect database development
• Features and operation of a relational database
• Data warehousing and data mining
Exercises
• Define the terms: data, database, database management systems
• Describe the structure of data on a typical card
• Discuss advantages and disadvantages of traditional files and databases
and compare them
• Describe the database models
• Describe the components of a database design
• Describe the relational databases and operators
• Describe distributed databases and client/server systems
• Discuss databases on the Web
• Describe a data warehouse and how it is built
• Discuss data mining and On-line Analytical Processing
