CSE494/598 Principles of Information Engineering
Lesson Objectives:
1. Describe the parts of the Information Life Cycle.
[Figure: information life cycle; surviving stage labels: Transport, Discard]
1. Information Acquisition
Acquiring business-related information in digital form
-Traditionally: record-based data, mostly in table form
-Now: multimedia data
Conversion to digital form for on-line processing
Overall organization for seamless integration
Technique must be fast, one-pass, adaptive and invertible, and must not impose unreasonable requirements on resources.
4. Storage
Business data can be very large and heterogeneous with respect to all parameters
Appropriate storage techniques ensure proper management, location and distribution, and the flow of objects
Among the issues to be considered:
-Data placement
-What technology (medium) to use for storage
-Distribution: local, remote, out-sourced
-Speed of delivery
5. Re-engineering
Legacy systems make up most business data systems
Maintenance and modernization of these systems represent a large portion of IT efforts
Important decisions:
-Maintenance
-Replace and migrate
-Modernize for co-existence
Legacy code...
Theory: rebuild the legacy system from the ground up with
-a relational (or OO) database
-graphical user interfaces
-client/server architecture
Practice: expensive and risky because of size, complexity, and poor documentation
Case study 1
700 clients
120,000,000 credit cards (mid-90s figure)
Over 14 terabytes of data
2 billion transactions per month
19 billion disk/tape I/Os per month
Case study 2
22 million telephone customers
Zero downtime must be guaranteed
COBOL code: hundreds of millions of lines
Many terabytes of data owned by applications
-no sharing -> redundant storage
Regulatory change: rate of return to price cap
Case study 2 (continued)
Incremental migration into a client/server computing architecture
Began in the late 80s, still ongoing
Around 10,000 workstations, and growing
Biggest challenge: inability of the mainframe to participate in distributed C/S computing
-CICS unable to cooperate in a nested subtransaction
-Integrity?
Migration Strategies
Complete rewrite of legacy code
-Many problems
-Risky
-Prone to failure
Incremental migration
-Migrate the legacy system in place by small incremental steps
-Control risk by choosing increment size
Incremental Migration
Incrementally analyze the legacy IS
Incrementally decompose
Incrementally design the target interfaces
Incrementally design the target applications
Incrementally design the target database
Incrementally install the target environment
Create and install the necessary gateways
Incrementally migrate the legacy database
Incrementally migrate the legacy applications
Incrementally migrate the legacy interfaces
Incrementally cut over to the target IS
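The role of a gateway during incremental migration can be sketched as follows. This is a minimal illustration, not the method used in the case studies: the class name `MigrationGateway`, the component names, and the handler functions are all hypothetical. The gateway hides from callers whether a component is still served by the legacy IS or already by the target IS, so each cut-over is one small, reversible increment.

```python
class MigrationGateway:
    """Routes each request to the legacy or the target system.

    Hypothetical sketch: component names and handlers are illustrative only.
    """

    def __init__(self, legacy_handlers, target_handlers):
        self.legacy = dict(legacy_handlers)
        self.target = dict(target_handlers)
        self.migrated = set()  # components already cut over

    def cut_over(self, component):
        """Cut one component over to the target system (one increment)."""
        if component not in self.target:
            raise KeyError(f"no target implementation for {component!r}")
        self.migrated.add(component)

    def roll_back(self, component):
        """Undo a cut-over, sending the component back to the legacy system."""
        self.migrated.discard(component)

    def handle(self, component, request):
        """Forward the request to whichever system currently owns the component."""
        handlers = self.target if component in self.migrated else self.legacy
        return handlers[component](request)


legacy = {
    "billing": lambda req: f"legacy billing: {req}",
    "accounts": lambda req: f"legacy accounts: {req}",
}
target = {
    "billing": lambda req: f"target billing: {req}",
    "accounts": lambda req: f"target accounts: {req}",
}

gateway = MigrationGateway(legacy, target)
gateway.cut_over("billing")  # migrate one component; the rest stay on legacy
print(gateway.handle("billing", "invoice 42"))
print(gateway.handle("accounts", "lookup 7"))
```

Because only the set of migrated components changes between increments, the size of each step, and therefore the risk, is chosen by how many components are cut over at once.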
A Comparison
             One step                       Incremental
Suited for   Non-decomposable               Decomposable
Risk         Huge                           Controllable
Failure      Entire project                 Step at a time
Benefits     Immediate                      Incremental
Outlook      Unpredictable until deadline   Conservatively optimistic
6. Preservation
Similar to physical security measures for protecting buildings, cash, and other tangible assets, information must be protected while recorded, processed, stored, shared, transmitted, or retrieved
Must protect against loss, alteration, and disclosure
Must prevent unauthorized access to and unauthorized use of
-Computer systems
-Networks
-Information
7. Retrieval
Query languages have come a long way from old-style navigational queries to today's content-based query languages
Important: any constraint (e.g., a processable feature) may be used as the criterion for search
Require efficient retrieval techniques, similar to those for data retrieval, for all types of information
Meta-Search Engines
A meta-search engine has a number of modules. The user interface module accepts the user's query, which is forwarded, with any necessary reformatting, by the query dispatcher module to the various search engines. When the search engines return their sets of retrieved documents to the meta-search engine, the result merger module merges these sets into a single ranked list of documents.
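The dispatcher and merger modules can be sketched in a few lines. The slides do not say how the merged ranking is computed, so this sketch assumes reciprocal rank fusion, one common merging scheme; the engine names, the stub search functions, and the constant `k=60` are illustrative assumptions.

```python
from collections import defaultdict


def dispatch(query, engines):
    """Query dispatcher: reformat the query and send it to every engine."""
    normalized = query.strip().lower()  # stand-in for engine-specific reformatting
    return {name: search(normalized) for name, search in engines.items()}


def merge(results, k=60):
    """Result merger (assumed scheme: reciprocal rank fusion).

    Each document scores 1 / (k + rank) per engine that returned it;
    the scores are summed and the documents sorted by total score.
    """
    scores = defaultdict(float)
    for ranked_docs in results.values():
        for rank, doc in enumerate(ranked_docs, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


def metasearch(query, engines):
    """User-interface entry point: dispatch the query, then merge the results."""
    return merge(dispatch(query, engines))


# Hypothetical engines, stubbed with fixed rankings.
engines = {
    "engine_a": lambda q: ["doc1", "doc2", "doc3"],
    "engine_b": lambda q: ["doc2", "doc4", "doc1"],
}
print(metasearch("  Information Life Cycle ", engines))
```

Documents returned by several engines accumulate score from each, so they rise above documents that only one engine found.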
8. Presentation
Information must be presented to the user in a form that is usable
Cookies take care of part of the issue
Issues are diverse and range from formatting and visualization to language and even cultural barriers
In the case of multimedia information, both temporal and spatial issues must be dealt with
9. Transport
Moving of data/information from one location to another
Most common form: digital communication
Additional notes
For data: mining in order to find useful patterns and correlations
For text:
-Conceptual representation
-Ontological classification of concepts
Analysis of Images
Extract features
-Color
-Shape
-Texture
-Spatial relationships
Analysis of Video
Determine video segments by detecting scene cuts (scene cut detection process)
Select a representative frame for each segment
Extract spatial features:
-color, texture, shape, and relative object positions
Represent each segment with an object that can be efficiently indexed by its features
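The video analysis steps above can be sketched as follows. The slides do not name a scene cut detection method, so this sketch assumes a simple one: compare the intensity histograms of consecutive frames and declare a cut when they differ by more than a threshold. Frames are modeled as flat lists of grayscale values (0-255); the bin count, the threshold, and the choice of the middle frame as representative are illustrative assumptions.

```python
def histogram(frame, bins=4, max_value=256):
    """Normalized intensity histogram of a frame (a flat list of 0-255 values)."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    return [count / len(frame) for count in counts]


def detect_cuts(frames, threshold=0.5):
    """Return each index i at which frame i starts a new segment."""
    cuts = []
    if not frames:
        return cuts
    previous = histogram(frames[0])
    for i in range(1, len(frames)):
        current = histogram(frames[i])
        # Half the L1 distance between normalized histograms lies in [0, 1].
        difference = sum(abs(a - b) for a, b in zip(previous, current)) / 2
        if difference > threshold:
            cuts.append(i)
        previous = current
    return cuts


def segment_video(frames, threshold=0.5):
    """Split into segments and describe each by a representative frame's features."""
    if not frames:
        return []
    bounds = [0] + detect_cuts(frames, threshold) + [len(frames)]
    segments = []
    for start, end in zip(bounds, bounds[1:]):
        representative = (start + end - 1) // 2  # middle frame of the segment
        segments.append({
            "start": start,
            "end": end - 1,
            "representative": representative,
            "features": histogram(frames[representative]),
        })
    return segments


dark = [10, 20, 30, 40]
bright = [200, 210, 220, 230]
video = [dark, dark, dark, bright, bright]
for segment in segment_video(video):
    print(segment)
```

Each resulting dictionary plays the role of the indexable object: its `features` vector is what a feature-based index would store. A real system would add texture, shape, and object-position features alongside the color histogram.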
[Figure: multimedia analysis components; labels: Audio Analysis, Object Segmentation, Text Analysis, Keywords, Objects, Sketch, Spatial Relationships]
Analysis of Audio
For Speech:
-Textual information from speech (then sound retrieval becomes text retrieval)
-Speaker information (identification)
For Music:
-Rhythm
-Event
-Instrument
Analysis of data
The hardest task: Integration of data from multiple databases
Despite many years of work, we still have difficulty in this area
Current databases
Data Collection and Database Creation (1960s and earlier)
-Primitive file processing
Database Management Systems (1970s-early 1980s)
-Hierarchical and network database systems
-Relational database systems
-Data modeling tools: entity-relationship model, etc.
-Indexing and data organization techniques: B+-tree, hashing, etc.
-Query languages: SQL, etc.
-User interfaces, forms and reports
-Query processing and query optimization
-Transaction management: recovery, concurrency control, etc.
-On-line transaction processing (OLTP)
Advanced Database Systems (mid-1980s-present)
-Advanced data models: extended-relational, object-oriented, object-relational, deductive
-Application-oriented: spatial, temporal, multimedia, active, scientific, knowledge bases
Data Warehousing and Data Mining (late 1980s-present)
-Data warehouse and OLAP technology
-Data mining and knowledge discovery
Web-based Database Systems (1990s-present)
-XML based database systems
-Web mining
Data integration
Heterogeneity possible in any aspect
Data selection
Data transformation
Data mining and evaluation of patterns
Presentation of knowledge
[Figure: knowledge discovery process; labels: Flat files, Databases, Data Warehouse, Data Mining, Patterns]
A data warehouse provides data analysis capabilities, collectively known as On-Line Analytical Processing (OLAP)
A number of pieces are needed: tools, gateways, and conversion routines
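As one concrete instance of an OLAP operation, a roll-up aggregates a measure over a chosen subset of dimensions. The sketch below is a minimal in-memory illustration; the fact rows, the dimension names `region` and `quarter`, and the measure `sales` are made-up assumptions, not data from the course.

```python
from collections import defaultdict


def roll_up(facts, dimensions, measure="sales"):
    """Sum the measure over all rows sharing the same values of `dimensions`."""
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


# Hypothetical fact table: one row per (region, quarter).
facts = [
    {"region": "East", "quarter": "Q1", "sales": 100.0},
    {"region": "East", "quarter": "Q2", "sales": 150.0},
    {"region": "West", "quarter": "Q1", "sales": 80.0},
]

print(roll_up(facts, ["region", "quarter"]))  # finest level of detail
print(roll_up(facts, ["region"]))             # rolled up over quarters
print(roll_up(facts, []))                     # grand total
```

Dropping a dimension from the list moves one level up the aggregation hierarchy; adding it back is the corresponding drill-down.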
[Figure: data warehouse architecture; labels: Data source in Location 2, Clean, Transform, Integrate, Load, Data Warehouse, Client]
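The clean, transform, integrate, and load steps named in the warehouse diagram can be sketched as a small pipeline. Everything concrete here is an assumption made for illustration: the two sources, their field names, the target schema, and the use of `customer_id` as the integration key.

```python
def clean(rows):
    """Drop rows with missing values and strip whitespace from strings."""
    cleaned = []
    for row in rows:
        if all(v is not None and str(v).strip() != "" for v in row.values()):
            cleaned.append(
                {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
            )
    return cleaned


def transform(rows, mapping):
    """Rename source-specific fields to the warehouse schema."""
    return [{mapping[k]: v for k, v in row.items() if k in mapping} for row in rows]


def integrate(*sources):
    """Merge rows from all sources into one record per customer_id."""
    merged = {}
    for rows in sources:
        for row in rows:
            merged.setdefault(row["customer_id"], {}).update(row)
    return merged


def load(warehouse, records):
    """Load the integrated records into the warehouse (a dict in this sketch)."""
    warehouse.update(records)
    return warehouse


# Hypothetical data sources in two locations, with different schemas.
source_1 = [{"cust": " 17 ", "name": "Ada "}, {"cust": "", "name": "unknown"}]
source_2 = [{"id": "17", "city": "Tempe"}]

warehouse = load(
    {},
    integrate(
        transform(clean(source_1), {"cust": "customer_id", "name": "name"}),
        transform(clean(source_2), {"id": "customer_id", "city": "city"}),
    ),
)
print(warehouse)
```

The heterogeneity handled here is only in field naming and formatting; differences in data models, units, or semantics would need their own conversion routines.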