Anil Tatti
aniltatti [at] simca [dot] ac [dot] in (send mail; will reply within 24 hours)
Tuesday: 8.15 a.m. to 9.30 a.m.
Thursday / Friday: 3.15 p.m. to 4.30 p.m.
Walk in any time; you are welcome
If holiday: make it up in the next week / an available time slot
Exam: 70%
Assignment / Homework every week: 25%
Attendance: 5%
Copy: Fail

SIMCA 2009 Lecture 2


Information Systems
Why Do People Need Information?
• Individuals - Entertainment and enlightenment
• Businesses - Decision making, problem solving and control


Data, Information, and Systems
• Data vs. Information
  - Data
    - A “given,” or fact; a number, a statement, or a picture
    - Represents something in the real world
    - The raw materials in the production of information

  - Information
    - Data that have meaning within a context
    - Data in relationships
    - Data after manipulation

Data, Information, and Systems
• Data Manipulation
  - Example: customer survey
    - Reading through data collected from a customer survey with questions in various categories would be time-consuming and not very helpful.
    - When manipulated, the surveys may provide useful information.
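As a rough illustration of manipulating raw survey data into information, the sketch below averages scores by question category. The responses, the category names and the 1 to 5 scale are hypothetical and not taken from the lecture.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw survey data: (question category, score on a 1-5 scale).
responses = [
    ("delivery", 4), ("delivery", 2),
    ("price", 5), ("price", 3),
    ("support", 1), ("support", 2),
]

def average_by_category(rows):
    """Group raw scores by category and return the mean score of each."""
    grouped = defaultdict(list)
    for category, score in rows:
        grouped[category].append(score)
    return {category: mean(scores) for category, scores in grouped.items()}

# The summary is the "information": one figure per category instead of a pile of forms.
summary = average_by_category(responses)
```

A reader of the summary can see at a glance which category scores lowest, which is hard to see from the individual responses.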



Data, Information, and Systems
 Generating Information
 Computer-based ISs take data as raw material, process it, and produce information as output.

Figure 1.1 Input-process-output



Data, Information, and Systems
 Information in Context

Figure 1.2 Characteristics of useful information

Data, Information, and Systems
• What Is a System?
  - System: A set of components that work together to achieve a common goal
  - Subsystem: One part of a system where the products of more than one system are combined to reach an ultimate goal
  - Closed system: Stand-alone system that has no contact with other systems
  - Open system: System that interfaces with other systems


Data, Information, and Systems

Figure 1.3 Several subsystems make up this corporate accounting system.

Data, Information, and Systems
• Information and Managers
  - Systems thinking
    - Creates a framework for problem solving and decision making.
    - Keeps managers focused on overall goals and operations of business.


Data, Information, and Systems

Figure 1.5 Qualities of humans and computers that contribute to synergy



Data, Information, and Systems
 The Benefits of Human-Computer Synergy
 Synergy
 When combined resources produce output that exceeds the sum of the outputs of the same resources employed separately

 Allows human thought to be translated into efficient processing of large amounts of data



Data, Information, and Systems

Figure 1.6 Components of an information system

Data, Information, and Systems
• The Four Stages of Data Processing
  - Input: Data is collected and entered into the computer.
  - Data processing: Data is manipulated into information using mathematical, statistical, and other tools.
  - Output: Information is displayed or presented.
  - Storage: Data and information are maintained for later use.
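The four stages can be sketched as one small function. This is only an illustration under assumed names: `run_pipeline`, the sales figures and the in-memory `store` list are hypothetical stand-ins for a real information system and its database.

```python
def run_pipeline(raw_entries, store):
    """Walk one batch of entries through input, processing, output and storage."""
    # Input: collect the raw entries and convert them into numbers.
    data = [float(entry) for entry in raw_entries]

    # Data processing: turn the raw numbers into summary information.
    info = {
        "count": len(data),
        "total": sum(data),
        "average": sum(data) / len(data),
    }

    # Output: present the information in a readable form.
    report = (
        f"{info['count']} sales, total {info['total']:.2f}, "
        f"average {info['average']:.2f}"
    )

    # Storage: keep both data and information for later use.
    store.append({"data": data, "info": info})
    return report

store = []  # hypothetical stand-in for a database
report = run_pipeline(["10", "20", "30"], store)
```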

Why Study IS?
• Information Systems Careers
  - Systems analyst, specialist in enterprise resource planning (ERP), database administrator, telecommunications specialist, consulting, etc.

• Knowledge Workers
  - Managers and non-managers
  - Employers seek computer-literate professionals who know how to use information technology.

• Computer Literacy Replacing Traditional Literacy
  - Key to full participation in western society


Ethical and Societal Issues
The Not-So-Bright Side

 Consumer Privacy
 Organizations collect (and sometimes sell) huge amounts of data on individuals.

 Employee Privacy
 IT supports remote monitoring of employees, violating privacy and creating stress.



Ethical and Societal Issues
The Not-So-Bright Side

• Freedom of Speech
  - IT increases opportunities for pornography, hate speech, intellectual property crime, and other intrusions; prevention may abridge free speech.

• IT Professionalism
  - No mandatory or enforced code of ethics for IT professionals, unlike other professions.

• Social Inequality
  - Fewer than 20% of the world’s population have ever used a PC; fewer than 3% have Internet access.


MIS Components

Hardware
Software
Procedures: Backup data, Restart job, Virus scan


Management Information – Related Subsystems
• Information Technology (IT)
  - Is any computer-based tool that people use to work with information and support the information-processing needs of an organisation.
  - Includes hardware, software, communications, networks, production automation, etc.
  - Any ‘kit’ concerned with the capture, storage, transmission, and presentation of information


Decision Support Systems (DSS)
• Computer system designed to provide assistance in determining and evaluating alternative courses of action.
• A DSS
  - (1) acquires data from the mass of routine transactions of a firm,
  - (2) analyzes it with advanced statistical techniques to extract meaningful information, and
  - (3) narrows down the range of choices by applying rules based on decision theory.
• Its objective is to facilitate 'what if' analysis, not to replace a manager's judgment.

• Example: Decision Explorer from Banxia
• Example: Analytica from Lumina
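A 'what if' analysis can be sketched in a few lines. The example below is not how Decision Explorer or Analytica work; it is a minimal illustration that assumes a hypothetical linear demand response to price, and every number and function name in it is invented for the purpose.

```python
def projected_profit(price, unit_cost, base_demand, base_price, elasticity):
    """Profit at a candidate price under an assumed linear demand response."""
    # Assumption: demand changes in proportion to the relative price change.
    demand = base_demand * (1 + elasticity * (price - base_price) / base_price)
    demand = max(demand, 0)  # demand cannot be negative
    return (price - unit_cost) * demand

def what_if(prices, **assumptions):
    """Evaluate each candidate price and point out the most profitable one."""
    results = {p: projected_profit(p, **assumptions) for p in prices}
    best = max(results, key=results.get)
    return results, best

# Hypothetical scenario: what if we charged 9, 10, 11 or 12 per unit?
results, best = what_if(
    [9, 10, 11, 12],
    unit_cost=6, base_demand=1000, base_price=10, elasticity=-1.5,
)
```

The function only narrows the choices; the manager still judges whether the assumed elasticity is believable before acting on `best`.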



Strategic Management Information Systems (SMIS)
• Systems considered critical to the current or future business competitiveness of an organisation
• SMIS is a relative rather than an absolute term: one must first assess the given organisation before attaching the term SMIS to a technology
• Example: A web service offering a product online could be considered strategic, e.g. Dell computers or an airline online booking system.
• Example: Business process re-engineering modelling software


Geographic Information Systems (GIS)
• Business information overlaid on geographical maps
• Example: Google Earth shows business locations, visitor attractions, etc. in particular areas


Management Information – Related Subsystems
• Expert System (ES)
  - Also called a knowledge-based system; an Artificial Intelligence system that applies reasoning capabilities to reach a conclusion.
  - Expert systems are software systems which capture the knowledge and experience of “experts” in particular fields (Accounting, Medicine, Production Control, etc.).
  - Expert systems, through a series of carefully contrived questions to the user, can determine “what's wrong” and “what to do”.
  - Example: Forensic accounting
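The reasoning step of an expert system can be sketched as a tiny rule engine. The rules and fact names below are hypothetical, loosely themed on the forensic accounting example; a real expert system would hold far more rules and would ask the user questions to establish the facts.

```python
# Each rule: (set of conditions that must all hold, conclusion to add).
RULES = [
    ({"invoice_duplicated", "same_vendor"}, "possible_double_payment"),
    ({"possible_double_payment", "payment_released"}, "recommend_recovery_audit"),
]

def infer(facts, rules=RULES):
    """Forward chaining: keep applying rules until no new conclusion appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

# "What's wrong" and "what to do" both come out as derived facts.
conclusions = infer({"invoice_duplicated", "same_vendor", "payment_released"})
```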



Dashboard System (DS)(EIS)
• A dashboard is an Executive Information System user interface that (similar to an automobile’s dashboard) is designed to be easy to read. For example, a product might obtain information from the local operating system in a computer, from one or more applications that may be running, and from one or more remote sites on the Web and present it as though it all came from the same source.
• Digital dashboards may be laid out to track the flows inherent in the business processes that they monitor. Graphically, users may see the high-level processes and then drill down into low-level data.


Airline Dashboard System



Traditional / Classical Organisation


[Figure: traditional organisation hierarchy. Labels: VP Finance; VP Accounting; Layers of middle managers; Collect data; Analyze data; Condensed reports]


Pioneers of Traditional / Scientific Management
• 5 Key Functions of Management
  - To Plan
  - To Organise
  - To Command
  - To Co-ordinate
  - To Control
• Principles for Organisational Structure
  - Unity of Command
  - Small Spans of Control
  - Line or Chain of Command
  - Division of Work (specialism)
  - Delegate Authority & Retain Responsibility


Modern Criticisms of Classical Management

• Inhuman working conditions and poor industrial relations
• Over-specialisation and restrictive work practices
• Bureaucratic organisational structures: long chains of command
• Inward-looking organisational structures
• Closed systems: run out of steam when not conscious of environmental influences


The Matrix Management
• Project focussed
• Multi-disciplinary teams
• Team members have more than one boss
• Project team disbanded when project completes
• New project team for new project
• Gives team members an insight into the workings of other departments
• Leadership training ground
• Allows people with ideas to carry them forward
• May cause blurring of communication lines

Modern Organisation structure
[Figure: modern organisation structure. Labels: C.E.O.; Fin; Bank Partner; Customer Partner; Supplier Partner; Legal Partner; Contractor Partner; Distribution Partner]


New structure - Decentralised
Management Team: Dir Fin, Dir Mrkt, Dir Acct, Dir HRM, Dir MIS
Teams: Finance Team, Marketing Team, Accounting Team, HRM Team, Sales Team
Corporate Database & Network


Business Trends
• Changing business environment
  - Specialization
  - Management by Methodology and Franchises
  - Mergers
  - Decentralization and Small Business
  - Temporary Workers
  - Internationalization
  - Service-Oriented Business
  - Re-engineering
  - Recession

• Need for faster responses and flexibility
• MIS reflecting these requirements


Business Trends & Implications
• Specialisation
  - Increased demand for technical skills
  - Specialized MIS tools
  - Increased communication

• Methodology & Franchises
  - Reduction of middle management
  - Increased data sharing
  - Increased analysis by top management
  - Computer support for rules
  - Re-engineering

• Mergers
  - Larger companies
  - Need for control and information
  - Economies of scale

• Decentralization & Small Business
  - Communication needs
  - Lower cost of management tasks
  - Low maintenance technology


Business Trends & Implications
• Temporary Workers
  - Managing through rules
  - Finding and evaluating workers
  - Coordination and control
  - Personal advancement through technology
  - Security
• Internationalization
  - Communication
  - Product design
  - System development and programming
  - Sales and marketing
• Service Orientation
  - Management jobs are information jobs
  - Customer service requires better information
  - Speed

Business Trend | Implications for Technology
Specialization | Increased demand for technical skills; Specialized MIS tools; Increased communication
Methodology & Franchises | Reduction of middle management; Increased data sharing; Increased analysis by top management; Computer support for rules; Re-engineering
Mergers | Four or five big firms dominate most industries; Need for communication; Strategic ties to customers and suppliers
Decentralization & Small Business | Communication needs; Lower cost of management tasks; Low maintenance technology
Temporary Workers | Managing through rules; Finding and evaluating workers; Coordination and control; Personal advancement through technology; Security
Internationalization | Communication; Product design; System development and programming; Sales and marketing
Service Orientation | Management jobs are information jobs; Customer service requires better information; Speed


Management Information Systems (MIS)
• Management information system (MIS)
  - An MIS provides managers with information and support for effective decision making, and provides feedback on daily operations
  - Output, or reports, are usually generated through accumulation of transaction processing data
  - Each MIS is an integrated collection of subsystems, which are typically organized along functional lines within an organization


Sources of Management Information

[Figure: sources of management information. Labels: Employees; Corporate databases of internal data; Databases of external data; Corporate intranet; Business transactions; Transaction processing systems; Input and error list; Databases of valid transactions; Management information systems; Application databases; Operational databases; Decision support systems; Executive support systems; Expert systems. Report outputs: Drill-down reports; Exception reports; Demand reports; Key-indicator reports; Scheduled reports]

Outputs of a Management Information System
• Scheduled reports
  - Produced periodically, or on a schedule (daily, weekly, monthly)

• Key-indicator report
  - Summarizes the previous day’s critical activities
  - Typically available at the beginning of each day

• Demand report
  - Gives certain information at a manager’s request

• Exception report
  - Automatically produced when a situation is unusual or requires management action


Scheduled Report Example
Daily Sales Detail Report
Prepared: 08/10/xx

Order #   Customer ID   Sales Rep ID   Ship Date   Quantity   Item #   Amount
P12453    C89321        CAR            08/12/96    144        P1234    $3,214
P12453    C89321        CAR            08/12/96    288        P3214    $5,660
P12453    C03214        GWA            08/13/96    12         P4902    $1,224
P12455    C52313        SAK            08/12/96    24         P4012    $2,448
P12456    C34123        JMW            08/13/96    144        P3214    $720


Key Indicator Report Example
Daily Sales Key Indicator Report

                                  This Month   Last Month   Last Year
Total Orders Month to Date        $1,808       $1,694       $1,014
Forecasted Sales for the Month    $2,406       $2,224       $2,608


Demand Report Example
Daily Sales by Sales Rep Summary Report
Prepared: 08/10/xx

Sales Rep ID   Amount
CAR            $42,345
GWA            $38,950
SAK            $22,100
JWN            $12,350
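A demand report of this kind is essentially a grouped total produced when a manager asks for it. The sketch below uses the five rows of the Daily Sales Detail Report example as input, so its totals are smaller than the summary figures on this slide, which presumably cover more orders; the function name is hypothetical.

```python
from collections import defaultdict

# (sales rep ID, amount in dollars) from the Daily Sales Detail Report example.
detail_rows = [
    ("CAR", 3214), ("CAR", 5660), ("GWA", 1224), ("SAK", 2448), ("JMW", 720),
]

def sales_by_rep(rows):
    """Total the sales of each rep, largest first, as a manager might request."""
    totals = defaultdict(int)
    for rep, amount in rows:
        totals[rep] += amount
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

summary_rows = sales_by_rep(detail_rows)
```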



Exception Report Example
Daily Sales Exception Report: ORDERS OVER $10,000
Prepared: 08/10/xx

Order #   Customer ID   Sales Rep ID   Ship Date   Quantity   Item #   Amount
P12453    C89321        CAR            08/12/96    144        P1234    $13,214
P12453    C89321        CAR            08/12/96    288        P3214    $15,660
P12453    C03214        GWA            08/13/96    12         P4902    $11,224
…         …             …              …           …          …        …
…         …             …              …           …          …        …
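An exception report boils down to a filter against a threshold. The sketch below assumes a hypothetical list of order records and uses the $10,000 limit from the report title; the mix of orders is made up for illustration.

```python
def exception_report(orders, threshold=10_000):
    """Return only the orders whose amount exceeds the threshold."""
    return [order for order in orders if order["amount"] > threshold]

# Hypothetical day of orders: two exceed the limit, one does not.
orders = [
    {"order": "P12453", "rep": "CAR", "amount": 13214},
    {"order": "P12455", "rep": "SAK", "amount": 2448},
    {"order": "P12453", "rep": "GWA", "amount": 11224},
]
flagged = exception_report(orders)
```

Changing `threshold` changes what counts as unusual, which is a management decision rather than a technical one.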



Outputs of a Management Information System
Drill Down Reports: Provide detailed data about a situation.

Earnings by Quarter (Millions)

Quarter         Actual   Forecast   Variance
2nd Qtr 1999    $12.6    $11.8      6.8%
1st Qtr 1999    $10.8    $10.7      0.9%
4th Qtr 1998    $14.3    $14.5      -1.4%
3rd Qtr 1998    $12.8    $13.3      -3.0%

Etc. See Figure 9.2


Characteristics of a Management Information System
• Provides reports with fixed and standard formats
  - Hard-copy and soft-copy reports

• Uses internal data stored in the computer system
• End users can develop custom reports
• Requires formal requests from users


Management Information Systems for Competitive Advantage
• Provides support to managers as they work to achieve corporate goals
• Enables managers to compare results to established company goals and identify problem areas and opportunities for improvement


MIS and Web Technology
• Data may be made available from management information systems on a company’s intranet
• Employees can use browsers and their PCs to gain access to the data


Functional Aspects
 MIS is an integrated collection of functional information systems, each supporting particular functional areas.

Figure 9.3 An organization’s MIS. Diagram labels: Business transactions; Transaction processing systems; Databases of valid transactions; Databases of external data; Financial MIS; Accounting MIS; Marketing MIS; Human Resources MIS; Etc.; Drill down reports; Exception reports; Demand reports; Key-indicator reports; Scheduled reports; Internet; Extranet

Financial MIS
 Provides financial information to all financial managers within an organization.

Figure 9.3 Financial MIS. Diagram labels: Business transactions; Transaction processing systems; Databases of valid transactions for each TPS; Databases of internal data; Databases of external data; Operational databases; Financial MIS; Financial applications databases; Financial DSS; Financial ES; Internet or Extranet; Customers, Suppliers. Outputs: Financial statements; Uses and management of funds; Financial statistics for control

Inputs to the Financial Information System
• Strategic plan or corporate policies
  - Contains major financial objectives and often projects financial needs.

• Transaction processing system (TPS)
  - Important financial information collected from almost every TPS: payroll, inventory control, order processing, accounts payable, accounts receivable, general ledger.
• External sources
  - Annual reports and financial statements of competitors and general news items.


Financial MIS Subsystems and Outputs
• Financial subsystems
  - Profit/loss and cost systems
  - Auditing
  - Internal auditing
  - External auditing
  - Uses and management of funds


Manufacturing MIS

Figure 9.6 Manufacturing MIS. Diagram labels: Business transactions; Transaction processing systems; Databases of valid transactions for each TPS; Databases of internal data; Databases of external data; Operational databases; Manufacturing MIS; Manufacturing applications databases; Manufacturing DSS; Manufacturing ES; Internet or Extranet; Customers, Suppliers. Outputs: Quality control reports; Process control reports; JIT reports; MRP reports; Production schedule; CAD output

Inputs to the Manufacturing MIS
• Strategic plan or corporate policies
• The TPS:
  - Order processing
  - Inventory data
  - Receiving and inspecting data
  - Personnel data
  - Production process

• External sources


Manufacturing MIS Subsystems and Outputs
• Design and engineering
• Master production scheduling
• Inventory control
• Manufacturing resource planning
• Just-in-time inventory and manufacturing
• Process control
• Computer-integrated manufacturing (CIM)
• Quality control and testing


Marketing MIS
 Supports managerial activities in product development, distribution, pricing decisions, and promotional effectiveness

Figure 9.9 Marketing MIS. Diagram labels: Business transactions; Transaction processing systems; Databases of valid transactions for each TPS; Databases of internal data; Databases of external data; Operational databases; Marketing MIS; Marketing applications databases; Manufacturing DSS; Manufacturing ES. Outputs: Sales by customer; Sales by salesperson; Sales by product; Pricing report; Total service calls; Customer satisfaction

Inputs to Marketing MIS
• Strategic plan and corporate policies
• The TPS
• External sources:
  - The competition
  - The market


Marketing MIS Subsystems and Outputs
• Marketing research
• Product development
• Promotion and advertising
• Product pricing


Human Resource MIS
 Concerned with all of the activities related to employees and potential employees of the organization



Figure 9.12 Human Resource MIS. Diagram labels: Business transactions; Transaction processing systems; Databases of valid transactions for each TPS; Databases of internal data; Databases of external data; Operational databases; Human Resource MIS; Human resource applications databases; Manufacturing DSS; Manufacturing ES. Outputs: Benefit reports; Salary surveys; Scheduling reports; Training test scores; Job applicant profiles; Needs and planning reports

Inputs to the Human Resource MIS
• Strategic plan or corporate policies
• The TPS:
  - Payroll data
  - Order processing data
  - Personnel data

• External sources


Human Resource MIS Subsystems and Outputs
• Human resource planning
• Personnel selection and recruiting
• Training and skills inventory
• Scheduling and job placement
• Wage and salary administration


Other MIS
 Accounting MISs
 Provides aggregated information on accounts payable, accounts receivable, payroll, and other applications.

 Geographic information systems (GIS)
 Enables managers to pair pre-drawn maps or map outlines with tabular data to describe aspects of a particular geographic region.



MIS & Related Organisational Functions
Strategic Management:
Provides an organisation with overall direction and guidance (mission and vision)

Tactical Management:
Develops the goals and strategies outlined by Strategic Management

Operational Management:
Manages and directs the day-to-day operations and implementations of the goals and strategies

Non-management employees:
Producing goods and services: serving customers, order processing

[Figure: management pyramid with levels Strategic Mgmt, Tactical Management and Operational Mgmt. System labels: ES; DSS; ERP; Transaction; Process Control]


What is MIS?
• Is a system which gives us the
  - Right information
  - To the right person
  - At the right place
  - At the right time
  - In the right form
  - At the right cost


• Why is it necessary?
  - Increased business and management complexities

• Who is a good manager?
  - One who minimizes / eliminates the elements of risk & uncertainty.

• Response Simulator
  - Enables a decision maker to give either a reactive or proactive response
  - May be futuristic.


Characteristics- Sub Systems
 Marketing
 Sales Forecasting , Sales Planning, Customer & Sales Analysis.

 Manufacturing
 Production Planning, scheduling, cost control analysis.

 Logistics
 Planning & Control of purchasing, inventories, distribution

 Personnel
 Planning Personnel requirements , Analyzing performance, salary administration.



Finance & Accounting
Financial analysis, cost analysis, capital requirements, planning, income measurement.

Information Processing
Information system planning , Cost – Benefit analysis.

Top Management
Strategic Planning, resource allocation.



Activity Sub-Systems
 Transaction Processing
 Processing of Orders, shipments & receipts.

 Operational Control
 Scheduling of activities & performance reports.

 Management Control
 Formulation of budgets & resource allocation.

 Strategic Planning
 Formulation of objectives & strategic plans.



Users & Characteristics
Users | Type of System | Information Inputs | Processing | Information Outputs
Senior Managers | ESS/EIS | Aggregate data; external, internal | Graphics; simulations; interactive | Projections; response to queries
Professionals; Staff Managers | DSS | Low-volume data; analytic models | Interactive; simulations; analysis | Special reports; decision analysis; response to queries
Middle Managers | | Summary transaction data; high-volume data; simple models | Routine reports; simple models; low-level analysis | Summary & exception reports
Professionals; Technical Staff | | Design specifications; knowledge base | Modeling; simulations | Models; graphics
Clerical Workers | | Documents; schedules | Document management; scheduling; communication | Documents; schedules; mail
Operations Personnel; Supervisors | | Transactions; events | Sorting; listing; merging; updating | Detailed reports; lists; summaries


MIS Requirements
• Unified system
• Should support / facilitate decisions
• Should be compatible with the organisation’s structure & culture
• Should be cost effective / beneficial
• Should be responsive to changes around & within the organisation
• Should be speedy & accurate
• Should provide validated & valid information
• Should be a Management, not a Manipulated, Information System


• Technical Approach
  - Based on mathematical & normative models
  - Relies heavily on physical technology: CS, MS, OR

• Behavioral Approach
  - Behavioral impact / response of people: Political Science, Psychology, Sociology & Organisational Behavior.

• Socio-Technical Approach
  - Borrows from both the above approaches.


Why is it Important for Managers Today to Consider the Strategic Role of Information Systems?

Strategic Advantage and IT
• Important Managerial Questions
  - What is strategy?
  - What is strategic advantage?
  - Information Systems as a strategic resource
  - How do we use Information Systems to achieve some form of strategic advantage over competitors?

What is Strategy?
Strategy Definitions
• Strategy
  - A plan
  - Early 1990s definition:
    - “A well coordinated set of objectives, policies, and plans aimed at securing a long-term competitive advantage. A vision for the organization that is implemented.”
  - Webster’s Dictionary
    - “a careful plan or method”
    - “the art of devising or employing plans toward a goal”
    - “the art and science of military command exercised to meet the enemy in combat under advantageous circumstances”


What is Strategy?
Strategy Definitions
• Strategy
  - Henry Mintzberg:
    - Explicitly planned: “Intended Strategy”
      - Realized: planned and succeed
      - Unrealized: planned but fail
    - Implicit, not explicitly planned yet executed: “Emergent Strategy”

[Figure labels: Planned Strategy; Executed Strategy; Failed Strategy; Emergent Strategy]


Strategic Advantage and IT
Evolution of Strategy Concepts
• Competitive Strategy
• Competitive Advantage
  - Sustainable Competitive Advantage
    - defensible market position (CQFDS), unique core competence
    - long-term barriers to competition, non-competitive profits (>0)
  - Temporary (Non-Sustainable) Competitive Advantage
• Strategic Advantage
  - Sustainable Strategic Advantage
    - long-term, dominant strategy, strategic systems, strategic structural changes
  - Temporary Strategic Advantage
  - Leverageable Strategic Advantage (Carr)
    - dominant strategy is only a stepping-stone to future dominant strategies

[Figure label: Strategy Speeding Up]

Strategic Advantage and IT
Evolution of Strategy Concepts
• Venkatraman (BU) and Subramaniam (BC Prof.)
  - Three eras of approaches for achieving strategic advantage
    - Portfolio of Business (1970s)
      - performance a result of businesses you pick to be in
      - motivated by economies of scale
    - Portfolio of Capabilities (mid 1980s)
      - performance a result of internal processes and routines, which provide distinctive capabilities
      - motivated by economies of scale and scope
    - Portfolio of Relationships (mid 1990s)
      - performance a result of building a wide array of relationships with external companies that possess hard-to-imitate capabilities
      - motivated by economies of scale, scope, and expertise


Information Systems as a Strategic Resource
• Inwardly Strategic
  - focused on internal processes
    - lower costs
    - increase employee productivity
    - improve teamwork
    - enhance communication

• Outwardly Strategic
  - aimed at direct competition
  - beat competitors
    - new services
    - new “knowledge” that leads to new services

Information Systems as a Strategic Resource
• Hayes and Wheelwright (1985): operations effectiveness, applies equally well to ISD effectiveness
  - Stage 1: Internally Neutral
    - not seen as a source of process improvement technology
    - Minimize negative impact of functional area on organization
    - Top management “in control”; tells dept. what to do
  - Stage 2: Externally Neutral
    - not seen as a source of external competitive advantage
  - Stage 3: Internally Supportive
    - source of internally focused competitive advantages
  - Stage 4: Externally Supportive
    - viewed as competitive force in the business
    - function drives issues of top-management strategy making

Information Systems as a Strategic Resource
[Figure: Competitive Marketplace. Labels: Company A; Company B; Internally Strategic; Externally Strategic; Inter-Firm Strategic Focus (“Alliance”)]

Elements of Strategic Management
• Innovation
• Response-Management
• Long-Range Planning
• Competitive Intelligence

Model #1:
Porter’s Competitive Forces Model
• Threat of new competitors
• Bargaining power of suppliers
• Bargaining power of customers
• Threat of substitute products or services
• Rivalry among existing firms

Model #1:
Porter’s Competitive Forces Model - “Generic Response Strategies”
• Cost leadership
• Differentiation
• Focus
• Other dimensions …
  - Strategic positioning
  - Customer service
  - Operational Effectiveness

Market Size   Strategic Advantage: Cost   Strategic Advantage: Diff.
Broad         Cost                        Diff.
Niche         Focus Cost                  Focus Diff.

Cost, Quality, Flexibility, Delivery

Model #1: Use of Porter’s Model
• List players
• Analyze business drivers
• Devise a strategy
• Investigate supportive information technologies

Models for Understanding the Value Creation Process

Model #2:
Porter’s Value Chain Analysis Model
Porter’s “Value Chain”
[Figure: value chain. Labels: Firm Infrastructure; Human Resources Management; Technology Development; Procurement; Inbound Logistics; Outbound Logistics; Marketing & Sales; Service; Profit Margin]


Model #2:
Porter’s Value Chain Analysis Model - Primary Activities
• Inbound logistics
• Operations
• Outbound logistics
• Marketing / sales
• Service

Model #2:
Porter’s Value Chain Analysis Model - Support Activities
• Firm infrastructure
• Human resource management
• Technology development
• Procurement

Model #3:
Porter and Millar Five-Step Process
• Assess information intensity (note: quite subjective)
  - High … implies strategic opportunities exist
    - customers need a lot of information to understand and/or use a product
    - suppliers dependent on information

• Determine the role of IT in the industry structure
• Identify and rank the ways in which IT can create competitive advantage
• Investigate how IT might spawn new businesses
• Develop a plan for taking advantage of IT


Value Web






• Strategic Information System Applications
  - Cost leadership
  - Differentiation
  - Growth
  - Alliances
  - Innovation
  - Improve internal efficiency
  - Customer-oriented approaches

Functional Use Of MIS
• To lower cost in all parts of the value chain
• Facilitate product delivery
• Adding value to quality
• Transform physical processing component into information component
• Speed / Ability: Competitive Advantage
• Quality Enhancement
• Simplification: Product, Process, Cycle Time
• Organisation: Benchmark, Customer Service, Precision, etc.


Strategic Use
• Outperform rivals
• Product differentiation
• Focussed differentiation
• Right linkages to customers & suppliers
• Low-cost product
• Precise development of strategies, planning, forecasting & monitoring
• Problem solving / decision making


Strategic Uses Contd….
• Coordinate activities globally
• Think globally, act locally
• Competitive advantage
• More flexible & responsive
• Flexibility


MIS - Organisation & Change



Why Firms Seek Competitive Advantage (Porter’s Five-Force Model):
• Rivalry among existing competitors
• Threat of new entrants
• Threat of substitute products and services
• Bargaining power of buyers
• Bargaining power of suppliers


Competitive Forces Model



Information Systems for Competitive Advantage
• Businesses continually seek to establish competitive advantage in the marketplace.
• There are eight principles:
  - The first three principles concern products.
  - The second three principles concern the creation of barriers.
  - The last two principles concern establishing alliances and reducing costs.


Organizational Change
 Organizational change deals with how organizations plan for, implement and handle change. Overcoming resistance to change can be the hardest part of bringing information systems into a business. Too many computer systems and new technologies have failed because managers and employees were not prepared for change.  A change model identifies the phases of change and the best way to implement it:
 Unfreezing is the process of removing old habits and creating a climate receptive to change  Moving is the process of learning new work methods, behaviors and systems  Refreezing involves reinforcing changes to make the new process second nature, accepted and part of the job



Internet Business Models



Internet Business Models



 IT  Networks  Database Management Systems  Data Mining  Mid term – Next Sunday – 10 to 12 pm. -50 marks – Counted as internal – Portion till Friday 10/04/2009



Information Technology
 -is the acquisition, processing, storage and dissemination of vocal, pictorial, textual & numeric information by a micro-electronics based combination of computing & telecommunications.  -used to describe technologies which enable the users to record ,store, process, transmit & receive information.



IT Capabilities
 Transactional – transform unstructured process into routine transactions  Geographical-overcome distance barrier  Automation – reduce human labour  Informational – huge amounts of data  Sequential – Sequence / Multiple Sequence  Knowledge Management- allows capture /dissemination of knowledge & expertise to improve a process  Tracking- of task status , inputs & outputs  Disintermediation – connect 2 parties without an intermediary.



 Evolution  Hardware
  – CPU – ALU
  – Input
  – Output
  – Storage – Primary / Secondary
  – Media / Communication devices

 Software
  – System Software – OS, Compiler – different OSs
  – Application Software – concerned with accomplishing the tasks of end users.





Data Processing
• Data – What is data?
• Bits, Bytes, Character, Field, Record, Blocks, File, Database
• Activity – Read, Sort, Write, Merge, Delete, Store, Compare, Collate, Decide, Display, Print, Copy, Compute, Plot, Transfer, Create, Perform
• Operations
  – Capturing data from an event / transaction
  – Verifying for correctness
  – Classifying into specific categories
  – Sorting – placing data in a particular sequence
  – Summarizing – aggregating data elements
  – Calculating – arithmetic / logic operations
  – Storing in a medium
  – Retrieving – searching & gaining access to specific data elements
  – Reproducing from one medium to another
  – Communicating from one place to another



Data Processing Hierarchy
 Electronic Data Processing- transactions occurring due to day to day activities  Office Automation Systems – for performing office routines  TPS – capturing, classifying, storing , maintaining , updating & retrieving data  MIS – provide information for decision makers  DSS- interactive system to support operations & decision making @ strategic / tactical levels  EIS / ESS – combines data from both internal & external sources to be applied to a changing array of problems  Knowledge based / Expert Systems – based on rules of thumb or heuristic knowledge intuition, judgment & inferences 



Transaction Processing
 Has relevance for 3 reasons
 Information  Action  Investigational

 Validation Tests
  – Missing data
  – Valid size
  – Class / composition
  – Range or reasonableness
  – Invalid value
  – Comparison with stored data
  – Check digit
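The validation tests listed above can be sketched in code. This is a minimal illustration only: the record layout (`account`, `amount`, `type`), the 16-digit size, the amount limit, and the choice of the Luhn scheme for the check digit are all assumptions made for the example and are not part of the lecture.

```python
def luhn_check_digit_ok(number: str) -> bool:
    """Check-digit test using the Luhn scheme (an assumed choice)."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, ch in enumerate(reversed(number)):
        digit = int(ch)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def validate_transaction(txn: dict, known_accounts: set) -> list:
    """Return a list of validation failures; an empty list means the record passed."""
    errors = []
    # Missing data
    for field in ("account", "amount", "type"):
        if txn.get(field) in (None, ""):
            errors.append(f"missing {field}")
    if errors:
        return errors
    account = str(txn["account"])
    if not account.isdigit():                     # class / composition
        errors.append("account must be numeric")
    elif len(account) != 16:                      # valid size (hypothetical length)
        errors.append("account must have 16 digits")
    elif not luhn_check_digit_ok(account):        # check digit
        errors.append("check digit mismatch")
    elif account not in known_accounts:           # comparison with stored data
        errors.append("unknown account")
    if txn["type"] not in ("debit", "credit"):    # invalid value
        errors.append("invalid type")
    amount = txn["amount"]                        # range or reasonableness
    if not isinstance(amount, (int, float)) or not (0 < amount <= 100_000):
        errors.append("amount out of range")
    return errors
```

A record passes when the returned list is empty; otherwise each entry names the test that failed, which is the kind of detail an error report would carry.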



TP Controls
• Audit Trail – Tracing
• Pre-Numbered Source Document – to ensure sequential processing.
• Document Produced as a Byproduct of Transaction – use of credit card.
• Control Report – Summary – cash register
• Anticipation Report – waiting for a certain event to occur & then scheduling other transactions.



 Batch Processing  Online  Real Time  Distributed Processing  Time Sharing  Multi Programming  Multi Processing



Data Transmission
• Transmitter
• Converter @ transmitting end
• Transmission Channel
• Converter @ receiving end
• Receiver of Transmitted Channel

 Universal Seven Part Data Circuit
  – DTE
  – DTE / DCE Interface
  – DCE
  – Transmission Channel
  – DCE
  – DCE / DTE Interface
  – DTE



Transmission Process
• Analog / Digital Signal
• Modem
• Multiplexer / Demultiplexer
• Channels
 Physical Line
 Twisted Pair  Coaxial Cable  Optical Fibre

 Micro Wave
 Tower – LOS  Radio / Wireless  Satellite



 Transmission Speed- bps  Bandwidth - Capacity  Transmission mode –
 Synchronous  Asynchronous

 Transmission Direction
 Simplex  Duplex  Half Duplex
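As a rough illustration of transmission speed in bps, the sketch below estimates how long a transfer takes. The function names are hypothetical, and the asynchronous framing of one start bit and one stop bit per 8-bit character is an assumed, commonly used setting rather than something stated in the lecture.

```python
def transmission_time_seconds(size_bytes: int, speed_bps: float) -> float:
    """Time to send size_bytes over a channel of speed_bps, ignoring overhead."""
    return size_bytes * 8 / speed_bps


def async_time_seconds(n_chars: int, speed_bps: float,
                       data_bits: int = 8, start_bits: int = 1,
                       stop_bits: int = 1) -> float:
    """Asynchronous mode: every character is framed by start and stop bits."""
    bits_per_char = data_bits + start_bits + stop_bits
    return n_chars * bits_per_char / speed_bps


# Example: a 1,000,000-byte file over a 56,000 bps line.
print(transmission_time_seconds(1_000_000, 56_000))
```

The second function shows why asynchronous transmission carries fewer useful characters per second than the raw bit rate suggests: part of every frame is framing overhead.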



Traditional File Processing



 Data Redundancy & Inconsistency  Program Data Dependence  Lack of Flexibility  Poor Security  Lack of Data Sharing & Availability



Contemporary Database Systems






Hierarchical Database



Network Model






Data Warehouse



Hypermedia database



Web Linkage



Components of a Network



 Node  Access Path  Protocol  File Server  Network Operating System





4 layered model



 What is a topology  Terminology  Different technologies



Why Network Computers?
 To share files  To share hardware  To share programs  User communication



Terminology
Networking – consists of computers, wiring, and other devices, such as hubs, switches, and routers, that make up the network infrastructure.
Topology – (from the Greek word topos, meaning place) is a description of any kind of locality in terms of its layout. There are two ways to describe a network topology:
1. Physical topology
2. Logical topology



Client – a computer that allows a user to log onto the network and take advantage of the resources on the network.
Server – a much more powerful computer that provides centralized administration of the network and serves up the resources that are available on the network.



Client/Server network operating systems allow the network to centralize functions and applications in one or more file servers

Advantages
• Centralized
• Scalable
• Flexible
• Interoperable
• Accessible

Disadvantages
• Maintenance
• Expense
• Dependence

Peer to Peer
Each computer acts both as a client and server.
Advantages
• Less expense
• Easy setup
• Decentralized
Disadvantages
• Security
• Decentralized



Standard Physical Topologies
• Bus
• Star
• Ring
• Mesh

Bus Topology

 Characterized by a main trunk or backbone line with networked computers attached at intervals along the trunk line.  Passive topology  Typically use coaxial cable hooked to each computer using a T-connector.



Bus Topology cont.

Coaxial Cable




Star Topology
Computers on the network connect to a centralized connectivity device, usually a hub or a switch.



Ring Topology

 Connects the LAN computers one after the other on the wire in a physical circle.  Moves info on the wire in one direction, considered an active topology.



Mesh Topology
 All nodes are directly connected with all other nodes.  Best choice when fault tolerance is required.  Very difficult to setup and maintain.
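One way to see why a full mesh is hard to set up and maintain is to count its links: with every node connected to every other node, n nodes need n(n-1)/2 links. The helper below is a small sketch; its name is hypothetical.

```python
from itertools import combinations


def full_mesh_links(nodes: int) -> int:
    """Number of point-to-point links when every node connects to every other node."""
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    return nodes * (nodes - 1) // 2


# Cross-check the formula by listing the links for a small network.
stations = ["A", "B", "C", "D"]
links = list(combinations(stations, 2))
assert len(links) == full_mesh_links(len(stations))
```

The count grows roughly with the square of the number of nodes, which is the price paid for the fault tolerance noted above.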



Standard Logical Topologies
 The way in which data accesses the medium (cable) and transmits packets.

 There are only two: Ring and Bus



Logical Topology: Ring
In the ring logical topology only one node can send information across the network at any given time. This is done by way of a ‘token’. Each terminal receives this special packet, and if it has data to send, it will do so. Once it has sent the data, it passes the token to the next station.  Used for very fast networks  No collisions  Susceptible to faults
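The token mechanism described above can be simulated in a few lines. This is a simplified sketch, assuming each station sends at most one queued frame per token visit; the function name and the frame labels are hypothetical.

```python
from collections import deque


def token_ring_rounds(stations, pending, rounds=1):
    """Pass the token around the ring; only the token holder may transmit."""
    queues = {name: deque(pending.get(name, [])) for name in stations}
    sent = []
    for _ in range(rounds):
        for holder in stations:  # the token travels in one direction
            if queues[holder]:
                # The holder has data: send one frame, then release the token.
                sent.append((holder, queues[holder].popleft()))
            # With nothing to send, the token simply moves to the next station.
    return sent


order = token_ring_rounds(["A", "B", "C"], {"A": ["a1", "a2"], "C": ["c1"]}, rounds=2)
print(order)
```

Because only one station holds the token at a time, two frames are never on the wire together, which is why this scheme has no collisions.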



Logical Topology: Bus
Each time a node on a network has data for another node, the sending node broadcasts the data to the entire network.
 Stations can always transmit.  Less susceptible to breaks.  Collisions (two stations transmitting at once) have to be dealt with.



Selecting a Topology

Needs: Do you need very high speeds? Will you be moving really large files?
Geography: How far is it between stations? Will you be relocating stations often?
Maintenance: Do you want something (relatively) painless?
Cost: Are you on a budget? Do you want replacement parts easily accessible?


Domain Name System



Internet Network Architecture



Types of Network
• LAN – Within buildings / campuses
  – Controlled, maintained & operated by end users
  – High transmission hence high data & high speed.
  – Share costly hardware & software
  – Promote productivity as direct communication is possible
• MAN – Metropolitan Area Network
  – LAN Interconnection
  – Bulk Data Transfer
  – Compressed Video

 Backbone Network  WAN – Wide area Network  VAN- Value Added Network



Key issues in implementation
 Human Factors  Cost  Security  Reliability  Network Management  Compatibility with current / future networks.



Open System Interconnect (OSI)
 Application
 End user applications – File transfers / Remote access, Email etc.

 Presentation
 Various data formats. Data conversions , encryption

 Session
 Manages dialogues / sessions

 Transport
 Reliable end to end transport of data , error recovery , flow control

 Network
 Establish , maintain , terminate n/w connection , routing

 Datalink
 Procedures & protocols for communication lines , error correction

 Physical
 Physical means of sending data over lines. Electrical / Mechanical & functional control of data circuits.
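The layering idea can be illustrated with a toy encapsulation routine: on the way down the stack each layer wraps the data it receives, and on the way up each layer strips its own wrapper. The bracketed tags are purely illustrative assumptions; real protocols use binary headers, and the Physical layer transmits bits rather than adding a header.

```python
OSI_LAYERS = ["Application", "Presentation", "Session", "Transport",
              "Network", "Datalink", "Physical"]


def encapsulate(payload: str) -> str:
    """Sender side: wrap the payload layer by layer, from Application down to Physical."""
    frame = payload
    for layer in OSI_LAYERS:
        frame = f"[{layer}]{frame}"
    return frame


def decapsulate(frame: str) -> str:
    """Receiver side: strip the wrappers in the opposite order, Physical first."""
    for layer in reversed(OSI_LAYERS):
        tag = f"[{layer}]"
        if not frame.startswith(tag):
            raise ValueError(f"expected {tag} wrapper")
        frame = frame[len(tag):]
    return frame


print(encapsulate("hello"))
```

The outermost wrapper belongs to the lowest layer, so the receiving side naturally processes the layers from Physical upward.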



 Transmission Control Protocol



Internet Capabilities
• Email – messaging, document sharing
• Usenet Networking – Discussion groups, electronic boards.
• Chatting – Conversation
• Telnet – Remote Login
• Gophers – Locate textual info using a hierarchy of menus.
• Archie – Search database of documents, software & data available for downloading.
• Wide Area Information Servers (WAIS) – Locate files in databases using keywords
• WWW – Retrieve, format & display information.



Pros & Cons
• Reducing Communication Costs
• Enhancing communication & coordination
• Accelerating the distribution of knowledge
• Improving Customer Service
• Facilitating marketing & sales
• Disadvantages
  – Security
  – Technology Problems
  – Lack of Standards
  – Legal Issues
  – Traditional Internet culture



Intranet / Extranet
• Intranet – not the Internet itself but the application of Internet technologies to the internal corporate network
• Extranet – semi-private; specifically designed for a very select group of users / audience, e.g. a company’s suppliers or business associates



Integrated Services Digital Network
• ISDN
• Standard for transmitting voice, data, image & video over public telephone lines
• Integrates all current & emerging technologies into a single worldwide network
• Allows user to
  – Achieve convenience
  – Flexibility
  – Economy
  – Lower power consumption
  – Easy maintenance
  – Clarity, accuracy & speed



IT Enabled Services (ITES)
 Offering of services from remote location
  – BPO
  – Call Centers
  – Medical Transcription
  – Animation
  – Back Office
  – Legal database
  – Market Research
  – Remote Education
  – Website Services



• Measurement of natural & human-made phenomena & processes from a spatial perspective, with emphasis on 3 properties
  – Elements
  – Attributes
  – Relationship
• Storage of measurements
  – Points
  – Lines
  – Areas / Polygons
• Analysis of collected measurements to produce more data & discover new relationships
• Depiction of measured / analyzed data in some type of display
  – Maps
  – Lists
  – Graphs
  – Summary statistics



GIS Applications
• Advertising
• Archeology
• Education
• Cartography
• Site Selection
• Election Administration
• Insurance
• Routing / Distribution Network
• Oil, Gas & Mineral exploration
• Wild Life
• Government Agencies – Police
• Transportation & Logistics
• Urban & Regional Planning
• Emergency Response Planning



Why Outsource?
 MNC’s can save costs  Increase revenue  Conserve capital  Greater efficiency due to increased speed  Rapidly improving Infrastructure  Declining telecom costs  Foster innovation  Improve Quality



What Is a DBMS?
• Database: a very large, integrated collection of data.
• Models a real-world enterprise.
 Entities (e.g., students, courses)  Relationships (e.g., Madonna is taking CS564)

 A Database Management System (DBMS) is a software package designed to store and manage databases.



Why Use a DBMS?
 Data independence and efficient access.  Reduced application development time.  Data integrity and security.  Uniform data administration.  Concurrent access, recovery from crashes.




Why Study Databases??
• Shift from computation to information
  – at the “low end”: scramble to web space (a mess!)
  – at the “high end”: scientific applications
• Datasets increasing in diversity and volume.
  – Digital libraries, interactive video, Human Genome project, EOS project
  – ... need for DBMS exploding
• DBMS encompasses most of CS
  – OS, languages, theory, “A”I, multimedia, logic



Data Models
• A data model is a collection of concepts for describing data.
• A schema is a description of a particular collection of data, using a given data model.
• The relational model of data is the most widely used model today.
  – Main concept: relation, basically a table with rows and columns.
  – Every relation has a schema, which describes the columns, or fields.



Levels of Abstraction
 Many views, single conceptual (logical) schema and physical schema.
 Views describe how users see the data.  Conceptual schema defines logical structure  Physical schema describes the files and indexes used.

Figure: View 1, View 2 and View 3 sit above the Conceptual Schema, which sits above the Physical Schema.

Schemas are defined using DDL; data is modified/queried using DML

Example: University Database
• Conceptual schema:
  – Students(sid: string, name: string, login: string, age: integer, gpa: real)
  – Courses(cid: string, cname: string, credits: integer)
  – Enrolled(sid: string, cid: string, grade: string)
• Physical schema:
  – Relations stored as unordered files.
  – Index on first column of Students.
• External Schema (View):
  – Course_info(cid: string, enrollment: integer)
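The three schema levels of the university example can be tried out with Python's built-in `sqlite3` module. This is only a sketch: the sample rows and the helper names are hypothetical, SQLite's own storage layout stands in for the physical schema (the primary key on `sid` gives an index on the first column of Students), and the definition of `Course_info` as a count of enrolments per course is an assumption about what the view computes.

```python
import sqlite3


def build_university_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Conceptual schema (DDL) plus one external view.
    conn.executescript("""
        CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, login TEXT,
                               age INTEGER, gpa REAL);
        CREATE TABLE Courses  (cid TEXT PRIMARY KEY, cname TEXT, credits INTEGER);
        CREATE TABLE Enrolled (sid TEXT REFERENCES Students(sid),
                               cid TEXT REFERENCES Courses(cid), grade TEXT);
        CREATE VIEW Course_info AS
            SELECT cid, COUNT(*) AS enrollment FROM Enrolled GROUP BY cid;
    """)
    # Hypothetical sample data (DML).
    conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
                     [("s1", "Madonna", "madonna@cs", 20, 3.5),
                      ("s2", "Guldu", "guldu@cs", 19, 3.2)])
    conn.executemany("INSERT INTO Courses VALUES (?, ?, ?)",
                     [("CS564", "Database Systems", 3), ("CS101", "Intro", 4)])
    conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)",
                     [("s1", "CS564", "A"), ("s2", "CS564", "B"), ("s2", "CS101", "A")])
    conn.commit()
    return conn


def course_info(conn: sqlite3.Connection) -> list:
    """Query the external view without touching the underlying tables directly."""
    return conn.execute(
        "SELECT cid, enrollment FROM Course_info ORDER BY cid").fetchall()
```

A user of `Course_info` sees only course ids and enrolment counts; the tables behind the view could be reorganised without changing that query.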



Data Independence
 Applications insulated from how data is structured and stored.  Logical data independence: Protection from changes in logical structure of data.  Physical data independence: Protection from changes in physical structure of data. 

One of the most important benefits of using a DBMS!

 Atomicity - to guarantee that either all of the tasks of a transaction are performed or none of them are.  Consistency - ensures that the database remains in a consistent state before the start of the transaction and after the transaction is over .  Isolation - that other operations cannot access or see the data in an intermediate state during a transaction  Durability - guarantee that once the user has been notified of success, the transaction will persist, and not be undone.
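Atomicity is the easiest of these properties to demonstrate. The sketch below uses Python's built-in `sqlite3` module, whose connection object commits when a `with` block succeeds and rolls back when an exception escapes it. The `accounts` table, the names, and the rule that no balance may go negative are assumptions made for the example.

```python
import sqlite3


def make_bank() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts"))


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Debit and credit as one transaction: both updates happen or neither does."""
    try:
        with conn:  # commit on success, roll back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            (lowest,) = conn.execute("SELECT MIN(balance) FROM accounts").fetchone()
            if lowest < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False
```

When the second call fails, the debit that was already applied inside the transaction is undone along with everything else, so the balances are left exactly as they were.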



The Log
 The following actions are recorded in the log:
 Ti writes an object: the old value and the new value.
 Log record must go to disk before the changed page!

 Ti commits/aborts: a log record indicating this action.

 Log records chained together by Xact id, so it’s easy to undo a specific Xact (e.g., to resolve a deadlock).  Log is often duplexed and archived on “stable” storage.  All log related activities (and in fact, all CC related activities such as lock/unlock, dealing with deadlocks etc.) are handled transparently by the DBMS.
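The logging rule above (record the old and new value before changing the data, then use the old values to undo a transaction) can be sketched with an in-memory stand-in. The class below is a deliberately simplified assumption: a Python list plays the role of the log on disk, a dictionary plays the role of the database pages, `None` marks "no previous value", and concurrency control is ignored.

```python
class LoggedStore:
    def __init__(self):
        self.data = {}  # stands in for the database pages
        self.log = []   # stands in for the log on stable storage

    def write(self, xact, key, new_value):
        old_value = self.data.get(key)
        # The log record (old and new value) is appended before the data changes.
        self.log.append(("write", xact, key, old_value, new_value))
        self.data[key] = new_value

    def commit(self, xact):
        self.log.append(("commit", xact))

    def abort(self, xact):
        # Undo this transaction's writes, newest first, using the logged old values.
        for record in reversed(self.log):
            if record[0] == "write" and record[1] == xact:
                _, _, key, old_value, _ = record
                if old_value is None:
                    self.data.pop(key, None)
                else:
                    self.data[key] = old_value
        self.log.append(("abort", xact))
```

Filtering the log by transaction id is the sketch's version of the chaining by Xact id: only the records of the aborted transaction are visited during undo.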

Databases make these folks happy ...
 End users and DBMS vendors  DB application programmers
 E.g. smart webmasters

 Database administrator (DBA)
  – Designs logical / physical schemas
  – Handles security and authorization
  – Data availability, crash recovery
  – Database tuning as needs evolve

Must understand how a DBMS works!

Structure of a DBMS
• A typical DBMS has a layered architecture.
• The figure does not show the concurrency control and recovery components.
• This is one of several possible architectures; each system has its own variations.

Figure: DBMS layers, top to bottom (these layers must consider concurrency control and recovery):
  Query Optimization and Execution
  Relational Operators
  Files and Access Methods
  Buffer Management
  Disk Space Management


Summary
• DBMS used to maintain, query large datasets.
• Benefits include recovery from system crashes, concurrent access, quick application development, data integrity and security.
• Levels of abstraction give data independence.
• A DBMS typically has a layered architecture.
• DBMS R&D is an exciting area.



State of Art in Databases
 Expanding domain of databases:
  – Spatial Data
  – Timeseries Data
  – Text Data
  – Music, Video, …
  – Data Streams.

 Internet evolution and databases:
 yahoo!, Google, expedia, B2B, P2P, B2C,...

 Performance and Tuning!!!  Future: Sensor networks.



DBMS Components
 Data Definition Language – DDL  Data Manipulation Language – DML  Data Dictionary



Objectives of Today’s Businesses
• Access and combine data from a variety of data stores
• Perform complex data analysis across these data stores
• Create multidimensional views of data and its metadata
• Easily summarize and roll up the information across subject areas and business dimensions



These objectives cannot be met easily
• Data is scattered in many types of incompatible structures.
• Lack of documentation has prevented the integration of older legacy systems with newer systems.
• Internet software such as search engines needs to be improved.
• Accurate and accessible metadata across multiple organizations is hard to get.



Four Levels of Analytical Processing
• In a modern organization, at least four levels of analytical processing should be supported by information systems:
  – First level: Consists of simple queries and reports against current and historical data
  – Second level: Goes deeper and requires the ability to do “what if” processing across data store dimensions
  – Third level: Needs to step back and analyze what has previously occurred to bring about the current state of the data
  – Fourth level: Analyzes what has happened in the past and what needs to be done in the future in order to bring about some specific change



Data Warehouse Technology
 A strategy to build the basic constructs of the IDSS with today’s technologies  Definition given by W.H.Inmon
 The data warehouse is a collection of integrated, subject-oriented databases designed to support the DSS (decision support) function, where each unit of data is relevant to some moment in time



Data Warehouse Technology (Con’t)
 The data should be well-defined, consistent, and nonvolatile in nature.  The quantity of data should be large enough to support data analysis, querying, reporting, and comparisons of historical data over a longer period of time.  The data warehouse must be user driven.



Data Warehousing
 Subject Driven  Non Volatile  Time Varying  Integrated



Operational Data Store vs. Data Warehouse Technology
How built
• Operational data store: One application at a time in the legacy environment, or one subject area at a time in the ODS
• Data warehouse: One or more subject areas at a time

Critical to
• Operational data store: Daily business operation
• Data warehouse: Management decisions that may affect profitability

Data access
• Operational data store: Smaller numbers of rows retrieved in a single call
• Data warehouse: Large sets of data scanned to retrieve results

Data volume
• Operational data store: Volume needed for daily operation
• Data warehouse: Larger volume needed to support statistical analysis, forecasting, ad hoc reporting, and querying



Operational Data Store vs. Data Warehouse Technology
Data retention
• Operational data store: Data retained to meet daily requirements
• Data warehouse: Data retained longer to support historical reporting, comparison, analysis, etc.

Data currency
• Operational data store: Must be up to the minute
• Data warehouse: Usually represents a static point in time; usually important that data does not change minute by minute

Data availability
• Operational data store: High availability may be needed
• Data warehouse: Usually does not require as high availability as the production environment unless worldwide access is necessary



Data Flow in a Single Organization



Data Mining: Introduction



Why Mine Data? Commercial Viewpoint
 Lots of data is being collected and warehoused
 Web data, e-commerce  purchases at department/ grocery stores  Bank/Credit Card transactions

 Computers have become cheaper and more powerful  Competitive Pressure is Strong
 Provide better, customized services for an edge (e.g. in Customer Relationship Management)



Why Mine Data? Scientific Viewpoint
 Data collected and stored at enormous speeds (GB/hour)
 remote sensors on a satellite  telescopes scanning the skies  microarrays generating gene expression data  scientific simulations generating terabytes of data

• Traditional techniques infeasible for raw data
• Data mining may help scientists
  – in classifying and segmenting data
  – in Hypothesis Formation



Mining Large Data Sets - Motivation
 There is often information “hidden” in the data that is not readily evident  Human analysts may take weeks to discover useful information  Much of the data is never analyzed at all
Figure: The Data Gap. Total new disk (TB) since 1995 compared with the number of analysts.
From: R. Grossman, C. Kamath, V. Kumar, “Data Mining for Scientific and Engineering Applications”


What is Data Mining?
Many Definitions
 Non-trivial extraction of implicit, previously unknown and potentially useful information from data  Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns



What is (not) Data Mining?
What is not Data Mining?
– Look up phone number in phone directory
– Query a Web search engine for information about “Amazon”

What is Data Mining?
– Certain names are more prevalent in certain US locations (O’Brien, O’Rurke, O’Reilly… in Boston area)
– Group together similar documents returned by search engine according to their context (e.g. Amazon rainforest, …)

Origins of Data Mining
 Draws ideas from machine learning/AI, pattern recognition, statistics, and database systems  Traditional Techniques may be unsuitable due to
  – Enormity of data
  – High dimensionality of data
  – Heterogeneous, distributed nature of data

Figure: Data Mining at the intersection of Statistics / AI, Machine Learning / Pattern Recognition, and Database systems



Data Mining Tasks
 Prediction Methods
 Use some variables to predict unknown or future values of other variables.

 Description Methods
 Find human-interpretable patterns that describe the data.

From [Fayyad,] Advances in Knowledge Discovery and Data Mining, 1996



Data Mining Tasks...
 Classification [Predictive]  Clustering [Descriptive]  Association Rule Discovery [Descriptive]  Sequential Pattern Discovery [Descriptive]  Regression [Predictive]  Deviation Detection [Predictive]



Classification: Definition
• Given a collection of records (training set)
  – Each record contains a set of attributes; one of the attributes is the class.
• Find a model for the class attribute as a function of the values of other attributes.
• Goal: previously unseen records should be assigned a class as accurately as possible.
  – A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
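The train-then-validate procedure can be sketched with a very simple model: a rule that maps each value of one attribute to the majority class seen in training. This single-attribute rule is an assumed stand-in for a real classifier, the function names are hypothetical, and the ten records are the tax-cheat training records from this lecture's classification example with an arbitrary 7/3 split into training and test sets.

```python
from collections import Counter, defaultdict

FIELDS = ("refund", "marital", "income_k", "cheat")
ROWS = [
    ("Yes", "Single", 125, "No"), ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"), ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"), ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"), ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"), ("No", "Single", 90, "Yes"),
]
RECORDS = [dict(zip(FIELDS, row)) for row in ROWS]


def learn_rule(training, attribute, class_attr="cheat"):
    """Build the model: majority class for each value of one attribute."""
    per_value = defaultdict(Counter)
    for record in training:
        per_value[record[attribute]][record[class_attr]] += 1
    rule = {value: counts.most_common(1)[0][0] for value, counts in per_value.items()}
    # Fallback for attribute values never seen in training.
    default = Counter(r[class_attr] for r in training).most_common(1)[0][0]
    return rule, default


def accuracy(rule, default, test, attribute, class_attr="cheat"):
    """Validate the model: fraction of test records assigned the right class."""
    hits = sum(rule.get(r[attribute], default) == r[class_attr] for r in test)
    return hits / len(test)


training, test = RECORDS[:7], RECORDS[7:]   # hold out the last three records
rule, default = learn_rule(training, "refund")
print(rule, accuracy(rule, default, test, "refund"))
```

The point of the held-out records is that accuracy is measured on data the model never saw while it was being built; a rule this crude will usually score poorly, which is exactly what the test set is there to reveal.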



Classification Example

Training Set (Refund and Marital Status are categorical, Taxable Income is continuous, Cheat is the class):

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Test Set:

Refund  Marital Status  Taxable Income  Cheat
No      Single          75K             ?
Yes     Married         50K             ?
No      Married         150K            ?
Yes     Divorced        90K             ?
No      Single          40K             ?
No      Married         80K             ?

Figure: the Training Set feeds the “Learn Classifier” step, which is then applied to the Test Set.


Classification: Application 1
 Direct Marketing
 Goal: Reduce cost of mailing by targeting a set of consumers likely to buy a new cell-phone product.  Approach:
 Use the data for a similar product introduced before.  We know which customers decided to buy and which decided otherwise. This {buy, don’t buy} decision forms the class attribute.  Collect various demographic, lifestyle, and company-interaction related information about all such customers.  Type of business, where they stay, how much they earn, etc.  Use this information as input attributes to learn a classifier model.

From [Berry & Linoff] Data Mining Techniques, 1997



Classification: Application 2
 Fraud Detection
 Goal: Predict fraudulent cases in credit card transactions.  Approach:
 Use credit card transactions and the information on its account-holder as attributes.  When does a customer buy, what does he buy, how often he pays on time, etc  Label past transactions as fraud or fair transactions. This forms the class attribute.  Learn a model for the class of the transactions.  Use this model to detect fraud by observing credit card transactions on an account.



Classification: Application 3
 Customer Attrition/Churn:
 Goal: To predict whether a customer is likely to be lost to a competitor.  Approach:
 Use detailed record of transactions with each of the past and present customers, to find attributes.  How often the customer calls, where he calls, what time-of-the day he calls most, his financial status, marital status, etc.  Label the customers as loyal or disloyal.  Find a model for loyalty.


From [Berry & Linoff] Data Mining Techniques, 1997


Classification: Application 4
 Sky Survey Cataloging
 Goal: To predict class (star or galaxy) of sky objects, especially visually faint ones, based on the telescopic survey images (from Palomar Observatory).
 3000 images with 23,040 x 23,040 pixels per image.

 Approach:
  – Segment the image.
  – Measure image attributes (features) - 40 of them per object.
  – Model the class based on these features.
  – Success Story: Could find 16 new high red-shift quasars, some of the farthest objects that are difficult to find!


From [Fayyad,] Advances in Knowledge Discovery and Data Mining, 1996


Classifying Galaxies
Figure: galaxy images by stage of formation (surviving label: Early)

Class:
• Stages of Formation

• Image features
• Characteristics of light waves received, etc.

Data Size:
• 72 million stars, 20 million galaxies
• Object Catalog: 9 GB
• Image Database: 150 GB

Clustering Definition
 Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that
 Data points in one cluster are more similar to one another.  Data points in separate clusters are less similar to one another.

 Similarity Measures:
 Euclidean Distance if attributes are continuous.  Other Problem-specific Measures.
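For continuous attributes, the definition above can be made concrete with a few helper functions: a Euclidean distance, an assignment of each point to its nearest cluster centre, and a comparison of average distances within clusters and between clusters. The function names and the sample points are hypothetical, and choosing the centres is left out of this sketch.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points with continuous attributes."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def assign_to_nearest(points, centroids):
    """For each point, the index of the closest centroid."""
    return [min(range(len(centroids)), key=lambda i: euclidean(p, centroids[i]))
            for p in points]


def mean_intra_inter(points, labels):
    """Average pairwise distance within clusters and between clusters."""
    intra, inter = [], []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = euclidean(points[i], points[j])
            (intra if labels[i] == labels[j] else inter).append(d)
    return sum(intra) / len(intra), sum(inter) / len(inter)


points = [(0, 0, 0), (1, 0, 0), (10, 10, 10), (11, 10, 10)]
labels = assign_to_nearest(points, [(0, 0, 0), (10, 10, 10)])
intra, inter = mean_intra_inter(points, labels)
```

A good clustering shows a small average distance inside clusters and a large one across clusters, which is what the two returned numbers let you compare.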



Illustrating Clustering
• Euclidean Distance Based Clustering in 3-D space.
• Intracluster distances are minimized
• Intercluster distances are maximized



Clustering: Application 1
 Market Segmentation:
 Goal: subdivide a market into distinct subsets of customers where any subset may conceivably be selected as a market target to be reached with a distinct marketing mix.  Approach:
 Collect different attributes of customers based on their geographical and lifestyle related information.  Find clusters of similar customers.  Measure the clustering quality by observing buying patterns of customers in same cluster vs. those from different clusters.



Clustering: Application 2
 Document Clustering:
 Goal: To find groups of documents that are similar to each other based on the important terms appearing in them.  Approach: To identify frequently occurring terms in each document. Form a similarity measure based on the frequencies of different terms. Use it to cluster.  Gain: Information Retrieval can utilize the clusters to relate a new document or search term to clustered documents.
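A similarity measure based on term frequencies could look like the sketch below. Cosine similarity over raw word counts is an assumed choice made for illustration; the lecture does not name a specific measure, and real systems would usually filter common words first.

```python
import math
from collections import Counter


def term_frequencies(text: str) -> Counter:
    """Count how often each term occurs in a document."""
    return Counter(text.lower().split())


def cosine_similarity(tf_a: Counter, tf_b: Counter) -> float:
    """1.0 for identical term distributions, 0.0 when no terms are shared."""
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


a = term_frequencies("oil prices rise")
b = term_frequencies("oil prices fall")
c = term_frequencies("team wins final")
print(cosine_similarity(a, b), cosine_similarity(a, c))
```

Documents that share many frequent terms score close to 1 and would end up in the same cluster; documents with no terms in common score 0.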



Illustrating Document Clustering
• Clustering Points: 3204 Articles of Los Angeles Times.
• Similarity Measure: How many words are common in these documents (after some word filtering).

Category       Total Articles  Correctly Placed
Financial      555             364
Foreign        341             260
National       273             36
Metro          943             746
Sports         738             573
Entertainment  354             278

Clustering of S&P 500 Stock Data
• Observe Stock Movements every day.
• Clustering points: Stock-{UP/DOWN}
• Similarity Measure: Two points are more similar if the events described by them frequently happen together on the same day.
• We used association rules to quantify a similarity measure.

Discovered Clusters and Industry Group:
1. Applied-Matl-DOWN, Bay-Network-Down, 3-COM-DOWN, Cabletron-Sys-DOWN, CISCO-DOWN, HP-DOWN, DSC-Comm-DOWN, INTEL-DOWN, LSI-Logic-DOWN, Micron-Tech-DOWN, Texas-Inst-Down, Tellabs-Inc-Down, Natl-Semiconduct-DOWN, Oracl-DOWN, SGI-DOWN, Sun-DOWN
2. Apple-Comp-DOWN, Autodesk-DOWN, DEC-DOWN, ADV-Micro-Device-DOWN, Andrew-Corp-DOWN, Computer-Assoc-DOWN, Circuit-City-DOWN, Compaq-DOWN, EMC-Corp-DOWN, Gen-Inst-DOWN, Motorola-DOWN, Microsoft-DOWN, Scientific-Atl-DOWN
3. Fannie-Mae-DOWN, Fed-Home-Loan-DOWN, MBNA-Corp-DOWN, Morgan-Stanley-DOWN (Industry Group: Financial-DOWN)
4. Baker-Hughes-UP, Dresser-Inds-UP, Halliburton-HLD-UP, Louisiana-Land-UP, Phillips-Petro-UP, Unocal-UP, Schlumberger-UP (Industry Group: Oil-UP)



Association Rule Discovery: Definition
 Given a set of records each of which contain some number of items from a given collection;
 Produce dependency rules which will predict occurrence of an item based on occurrences of other items.
TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk

Rules Discovered:
{Milk} --> {Coke}
{Diaper, Milk} --> {Beer}
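How strongly the data backs a rule is commonly measured with support and confidence. These two measures are standard in association rule mining but are not defined on the slide, so treat them here as an added assumption. The sketch computes both for the two rules above, using the five transactions from the table.

```python
TRANSACTIONS = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))


for lhs, rhs in [({"Milk"}, {"Coke"}), ({"Diaper", "Milk"}, {"Beer"})]:
    print(sorted(lhs), "-->", sorted(rhs),
          round(support(lhs | rhs, TRANSACTIONS), 2),
          round(confidence(lhs, rhs, TRANSACTIONS), 2))
```

Milk appears in four of the five transactions and Coke accompanies it in three of them, so {Milk} --> {Coke} holds for three out of four Milk purchases; {Diaper, Milk} --> {Beer} holds for two out of three.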



Association Rule Discovery: Application 1
 Marketing and Sales Promotion:
 Let the rule discovered be {Bagels, … } --> {Potato Chips}  Potato Chips as consequent => Can be used to determine what should be done to boost its sales.  Bagels in the antecedent => Can be used to see which products would be affected if the store discontinues selling bagels.  Bagels in antecedent and Potato chips in consequent => Can be used to see what products should be sold with Bagels to promote sale of Potato chips!



Association Rule Discovery: Application 2
 Supermarket shelf management.
 Goal: To identify items that are bought together by sufficiently many customers.
 Approach: Process the point-of-sale data collected with barcode scanners to find dependencies among items.
 A classic rule: if a customer buys diapers and milk, then he is very likely to buy beer.
 So, don’t be surprised if you find six-packs stacked next to diapers!
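
The goal above, finding items bought together by "sufficiently many" customers, amounts to counting how many baskets contain each pair of items and keeping the pairs that reach a minimum count. A minimal sketch follows, reusing the five baskets from the definition slide; the function name and the threshold of 3 are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_count):
    """Return {(item_a, item_b): count} for pairs in at least min_count baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair is always counted under the same key.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Point-of-sale baskets (the same five transactions as the TID table).
baskets = [
    ["Bread", "Coke", "Milk"],
    ["Beer", "Bread"],
    ["Beer", "Coke", "Diaper", "Milk"],
    ["Beer", "Bread", "Diaper", "Milk"],
    ["Coke", "Diaper", "Milk"],
]
shelf_candidates = frequent_pairs(baskets, min_count=3)
```

With a threshold of 3, only (Coke, Milk) and (Diaper, Milk) survive; these are the pairs a store might consider shelving near each other.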



Association Rule Discovery: Application 3
 Inventory Management:
 Goal: A consumer appliance repair company wants to anticipate the nature of repairs on its consumer products and keep the service vehicles equipped with the right parts, to reduce the number of visits to consumer households.
 Approach: Process the data on tools and parts required in previous repairs at different consumer locations and discover the co-occurrence patterns.



Sequential Pattern Discovery: Definition
 Given a set of objects, with each object associated with its own timeline of events, find rules that predict strong sequential dependencies among different events.

(A B) (C) (D E)

 Rules are formed by first discovering patterns. Event occurrences in the patterns are governed by timing constraints.

[Figure: the pattern (A B) (C) (D E) annotated with the timing constraints <= xg, > ng, <= ms and <= ws]

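
The timing constraints can be made concrete with a small checker that tests whether a pattern such as (A B) (C) (D E) occurs in one object's timeline. The slide gives only the symbols, so the reading used here is an assumption: xg is taken as the maximum gap between consecutive elements, ng as the minimum gap, and ms as the maximum span of the whole pattern. The window-size constraint ws is left out to keep the sketch short, and all names are hypothetical.

```python
def occurs(timeline, pattern, min_gap=0, max_gap=None, max_span=None):
    """Check whether pattern occurs in timeline under timing constraints.

    timeline: list of (timestamp, set_of_events) for one object.
    pattern:  list of event sets, e.g. [{"A", "B"}, {"C"}, {"D", "E"}].
    Consecutive elements must be more than min_gap apart (> ng), at most
    max_gap apart (<= xg), and the whole match must fit within max_span
    (<= ms). A limit of None means "no limit".
    """
    events = sorted(timeline, key=lambda entry: entry[0])

    def search(index, prev_time, start_time):
        if index == len(pattern):
            return True
        for time, items in events:
            if not pattern[index] <= items:
                continue  # this time point lacks the required events
            if prev_time is not None:
                gap = time - prev_time
                if gap <= min_gap:
                    continue
                if max_gap is not None and gap > max_gap:
                    continue
            first = time if start_time is None else start_time
            if max_span is not None and time - first > max_span:
                continue
            if search(index + 1, time, first):
                return True
        return False

    return search(0, None, None)

# Hypothetical timeline for one object.
timeline = [(1, {"A", "B"}), (3, {"C"}), (4, {"D", "E"})]
pattern = [{"A", "B"}, {"C"}, {"D", "E"}]
```

For this timeline the gaps are 2 and 1 and the span is 3, so the pattern is found with max_gap=2 but rejected with max_gap=1 or max_span=2.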


Sequential Pattern Discovery: Examples
 In telecommunications alarm logs,
 (Inverter_Problem Excessive_Line_Current) (Rectifier_Alarm) --> (Fire_Alarm)

 In point-of-sale transaction sequences,
 Computer Bookstore: (Intro_To_Visual_C) (C++_Primer) --> (Perl_for_dummies,Tcl_Tk)
 Athletic Apparel Store: (Shoes) (Racket, Racketball) --> (Sports_Jacket)



Regression
 Predict the value of a given continuous-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency.
 Extensively studied in statistics and the neural network field.
 Examples:
 Predicting sales amounts of a new product based on advertising expenditure.
 Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
 Time series prediction of stock market indices.
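
For the linear case with a single input variable, the prediction idea can be sketched with an ordinary least-squares line fit. The advertising and sales figures below are made up for illustration, and the function names are hypothetical; a real study would use more data and check whether a linear model is appropriate at all.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Predicted y for a new x under the fitted line."""
    return slope * x + intercept

# Hypothetical data: advertising expenditure vs. sales amount.
ad_spend = [10, 20, 30, 40, 50]
sales = [120, 190, 260, 310, 400]
slope, intercept = fit_line(ad_spend, sales)
forecast = predict(60, slope, intercept)
```

The fitted slope estimates how much sales change per unit of advertising spend, and `forecast` is the model's estimate for a spend level not yet observed.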



Deviation/Anomaly Detection
 Detect significant deviations from normal behavior
 Applications:
 Credit Card Fraud Detection
 Network Intrusion Detection



Typical network traffic at University level may reach over 100 million connections per
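
One simple way to flag "significant deviations from normal behavior" is to measure how many standard deviations each value lies from the mean and report the values beyond a chosen cutoff. This z-score rule is only one of many possible detectors and is not taken from the slide; the cutoff of 2.0, the function name, and the transaction amounts are assumptions.

```python
from statistics import mean, pstdev

def flag_deviations(values, threshold=2.0):
    """Return the values lying more than threshold standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical card transaction amounts for one customer.
amounts = [10, 11, 9, 10, 12, 10, 95]
suspicious = flag_deviations(amounts)
```

Here only the 95 transaction is flagged. Because a large outlier inflates the mean and standard deviation it is compared against, production systems tend to use more robust measures.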

Challenges of Data Mining
 Scalability
 Dimensionality
 Complex and Heterogeneous Data
 Data Quality
 Data Ownership and Distribution
 Privacy Preservation
 Streaming Data



Exam Review
 Date: __.10.2009 - Time: 3 hours - Marks: 70
 All Questions are Compulsory
 A) Attempt any 7 Questions. Each Question carries 2 marks. (14 Marks)
 B) Attempt any 6 Questions. Each Question carries 4 marks. (24 Marks)
 C) Write Short Notes on any 4 topics mentioned below. Each Question carries 8 marks. (32 Marks)



 IT Subsystems
 Data Processing
 MIS
 Classical Management / Strategic Management
 DBMS
 Networks
 Data Mining / Data warehousing
 Applications
 Outsourcing
 Business trends / Models /




