INTRODUCTION TO DATA MINING

Q) What is Data Mining?

A. Data mining is a process that uses statistical, mathematical, and artificial intelligence techniques to
extract and identify useful information and subsequent knowledge from large sets of data.

It is also defined as the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable
patterns in data stored in structured databases.

Q) How Does Data Mining Work?

A. Data mining models try to discover patterns among the attributes present in a data set, which may combine relevant
data from inside and outside the organization.

Models are mathematical representations that identify patterns among the attributes of things such as
customers and events.

Two types of patterns:-

-- Explanatory: Explaining relationships and affinities among the attributes.

-- Predictive: Forecasting future values of certain attributes.

Four Major types of patterns:-

-Associations

-Predictions

-Clusters

-Sequential Relationships
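
As a minimal illustration of the association pattern type, the sketch below computes support and confidence for the rule "bread -> milk" on a small, made-up list of market baskets. The transactions and the function names (count, support, confidence) are assumptions introduced for this example and do not come from these notes.

```python
# Hypothetical market-basket transactions (illustrative data, not from the notes).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def count(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(set(itemset) <= basket for basket in baskets)


def support(itemset, baskets):
    """Fraction of all baskets that contain the itemset."""
    return count(itemset, baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Share of baskets containing the antecedent that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return count(both, baskets) / count(antecedent, baskets)


rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(f"bread -> milk: support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

In this toy data, bread and milk appear together in 3 of 5 baskets, and 3 of the 4 baskets containing bread also contain milk, so the rule has support 0.60 and confidence 0.75.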

Q) Applications of Data Mining?

A. • Customer Relationship Management

– Maximize return on marketing campaigns

– Improve customer retention (churn analysis)

– Maximize customer value (cross-, up-selling)

– Identify and treat most valued customers

• Banking & Other Financial


– Automate the loan application process

– Detect fraudulent transactions

– Maximize customer value (cross-, up-selling)

– Optimize cash reserves with forecasting

• Retailing and Logistics


– Optimize inventory levels at different locations
– Improve the store layout and sales promotions
– Optimize logistics by predicting seasonal effects
– Minimize losses due to limited shelf life
• Manufacturing and Maintenance
– Predict/prevent machinery failures
– Identify anomalies in production systems to optimize
the use of manufacturing capacity
– Discover novel patterns to improve product quality
• Brokerage and Securities Trading
– Predict changes in certain bond prices
– Forecast the direction of stock fluctuations
– Assess the effect of events on market movements
– Identify and prevent fraudulent activities in trading
• Insurance
– Forecast claim costs for better business planning
– Determine optimal rate plans
– Optimize marketing to specific customers
– Identify and prevent fraudulent claim activities.
Q) Data Mining Process?
A. Data Mining Process
• A manifestation of the best practices
• A systematic way to conduct Data Mining projects
• Moving from art to science for data mining projects
• Most common standard process:
– CRISP-DM (Cross-Industry Standard Process for Data Mining)
Step 1: Business Understanding
• To understand what the business wants to solve
• Determine the business question and objective
– What to solve from the business perspective, what the
customer wants, and define the business success criteria
• Determine the project goals
– What are the common characteristics of customers we lost
to our competitors recently?
– What are typical profiles of our customers, and how much
value does each of them provide to us?
• Project plan
– Try to create a detailed plan for each project phase and
what kind of tools you would use
– Budget to support the study
Step 2: Data Understanding
• Identify relevant data based on the business task to be
addressed
– The description of the data mining task should be clear and
concise
– To better understand the data, use a variety of statistical and
graphical tools
– Data sources for data selection can vary
▪ Demographic, sociographic, transactional, social media
– May include quantitative and qualitative data.
Step 3: Data Preparation
• Also referred to as data preprocessing
• Prepare data for analysis
• This step is commonly estimated to consume about 80% of the project time
• Real-world data is
– Incomplete: lacking attribute values, attributes of interest,
containing only aggregated data
– Noisy: containing errors or outliers
– Inconsistent: discrepancies in values, codes and names
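
The three data problems listed above can be sketched on a handful of made-up records. This is only one possible approach: the field names, the gender code mapping, the plausible age range, and the choice of median imputation are all assumptions made for this example.

```python
from statistics import median

# Assumed conventions for this sketch; real projects define their own.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}
MIN_AGE, MAX_AGE = 0, 120

# Hypothetical raw records showing the three problems.
raw_records = [
    {"id": 1, "age": 34, "gender": "M"},
    {"id": 2, "age": None, "gender": "male"},    # incomplete: missing age
    {"id": 3, "age": 290, "gender": "F"},        # noisy: impossible age
    {"id": 4, "age": 41, "gender": "Female"},    # inconsistent: gender code
    {"id": 5, "age": 29, "gender": " f"},        # inconsistent: case and spacing
]


def is_valid_age(age):
    """True when the age is present and inside the assumed plausible range."""
    return age is not None and MIN_AGE <= age <= MAX_AGE


def clean(records):
    """Return cleaned copies: missing or implausible ages imputed, codes unified."""
    fill_age = median(r["age"] for r in records if is_valid_age(r["age"]))
    cleaned = []
    for r in records:
        cleaned.append({
            "id": r["id"],
            "age": r["age"] if is_valid_age(r["age"]) else fill_age,
            "gender": GENDER_CODES.get(r["gender"].strip().lower()),
        })
    return cleaned


cleaned_records = clean(raw_records)
for record in cleaned_records:
    print(record)
```

The function returns new records and leaves the raw input untouched, so the original values remain available for auditing.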
Step 4: Model Building
• There is no universally known best method or algorithm for a
data mining task
• Model building includes assessment and comparative analysis
of various models
• For a single method, a number of parameters need to be
calibrated to obtain optimum results
• Identify the best method for a given purpose.
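
A rough sketch of calibrating one method's parameter and then comparing it with an alternative follows. The churn data, the simple threshold rule, the majority-class baseline, and every function name here are hypothetical choices for illustration; real projects would typically compare richer models.

```python
from collections import Counter

# Hypothetical (monthly_usage_hours, churned) pairs; 1 means the customer churned.
train = [(2, 1), (3, 1), (4, 0), (5, 1), (7, 1), (8, 0), (9, 0), (10, 0), (12, 0)]
validation = [(1, 1), (3, 0), (6, 1), (9, 0), (11, 0)]


def accuracy(predict, data):
    """Fraction of (x, y) pairs for which predict(x) equals y."""
    return sum(predict(x) == y for x, y in data) / len(data)


def threshold_model(t):
    """Rule-based model: predict churn when usage is below t hours."""
    return lambda x: int(x < t)


def calibrate_threshold(data, candidates):
    """Pick the candidate threshold with the highest accuracy on data."""
    return max(candidates, key=lambda t: accuracy(threshold_model(t), data))


def majority_model(data):
    """Baseline model: always predict the most common label in data."""
    label = Counter(y for _, y in data).most_common(1)[0][0]
    return lambda x: label


best_t = calibrate_threshold(train, candidates=[4, 6, 8, 10])
models = {
    "threshold": threshold_model(best_t),
    "majority": majority_model(train),
}
# Compare the candidate models on data that was not used for calibration.
scores = {name: accuracy(model, validation) for name, model in models.items()}
best_model = max(scores, key=scores.get)
print(best_t, scores, best_model)
```

Calibrating on the training records and comparing on separate validation records keeps the comparison from simply rewarding the model that memorized the training data best.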
Step 5: Testing and Evaluation
• This is a critical and challenging task
• Developed models are assessed and evaluated for their
accuracy and generality
– Assess the degree to which the selected model meets the
business objectives
• Test developed models in a real-world scenario if time and
budget constraints permit
• No value is added by a data mining task until the business value
of the discovered knowledge patterns is identified and
recognized
– Depends on interaction of data analysts, business
analysts, and decision makers.
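
Assessing accuracy is often done with a confusion matrix on held-out records. The sketch below assumes binary labels (1 = positive) and uses made-up actual and predicted values; the function names are hypothetical, and which metric matters most depends on the business objective.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(actual, predicted))
    return {
        "tp": sum(a == 1 and p == 1 for a, p in pairs),
        "fp": sum(a == 0 and p == 1 for a, p in pairs),
        "fn": sum(a == 1 and p == 0 for a, p in pairs),
        "tn": sum(a == 0 and p == 0 for a, p in pairs),
    }


def evaluate(actual, predicted):
    """Derive accuracy, precision, and recall from the confusion counts."""
    c = confusion_counts(actual, predicted)
    total = sum(c.values())
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    return {
        "accuracy": (c["tp"] + c["tn"]) / total,
        "precision": c["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": c["tp"] / actual_pos if actual_pos else 0.0,
    }


# Hypothetical holdout labels and model predictions.
actual = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]
metrics = evaluate(actual, predicted)
print(confusion_counts(actual, predicted), metrics)
```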
Step 6: Deployment
• Deployment can be as simple as generating a report or as
complex as implementing a repeatable data mining process
across the enterprise
• Deployment is often done by the customer, not the data analyst
• It also includes the maintenance activities for the deployed
model
– Over time, models built on old data may become obsolete,
irrelevant, or misleading
• To monitor the deployment of the data mining results, the
project needs a detailed plan on the monitoring process, which
may not be a trivial task for complex data mining models
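
One simple way to monitor a deployed model is to compare its recent accuracy with the accuracy measured at evaluation time. The sketch below is an assumption-laden example: the function names, the 0.05 tolerance, and the monitoring windows are hypothetical, and a real monitoring plan would usually track more than one signal.

```python
def rolling_accuracy(outcomes):
    """outcomes: booleans, True where a deployed prediction matched the actual result."""
    return sum(outcomes) / len(outcomes)


def needs_review(baseline_accuracy, recent_outcomes, tolerance=0.05):
    """Flag the model when recent accuracy falls more than tolerance below baseline."""
    if not recent_outcomes:
        return False  # nothing observed yet, so nothing to flag
    return rolling_accuracy(recent_outcomes) < baseline_accuracy - tolerance


# Hypothetical monitoring windows for a model that scored 0.90 at evaluation time.
stable_window = [True] * 9 + [False] * 1     # 90% correct
degraded_window = [True] * 7 + [False] * 3   # 70% correct

print(needs_review(0.90, stable_window))
print(needs_review(0.90, degraded_window))
```

A flagged model is a prompt to investigate whether it was built on data that no longer reflects current conditions, not an automatic trigger to retrain.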

INTRODUCTION TO DATA WAREHOUSE

Q) What is Data Warehousing?


A. • In data warehousing, a multitude of organizational and external data is
captured, transformed, and stored in a data warehouse to
support timely and accurate decisions through enriched
business insight.
• Repository of current and historical data of potential interest to
managers throughout the organization
• Data are usually structured to be available in a form ready for
analytical processing activities (OLAP)
– Mining, querying, reporting, etc.
• DW is a subject-oriented, integrated, time-variant, nonvolatile
collection of data in support of management's decision-making
process
• Bill Inmon (1993) wrote the seminal book Building the Data
Warehouse and is considered the father of data warehousing.

Q) Characteristics of Data Warehouses?


A. • Subject oriented: such as sales, products or customers
• Integrated: data from different sources is placed into a consistent format
• Time-variant (time series): detect trends, deviations, relationships
• Nonvolatile: users cannot change or update data once it is loaded; obsolete data can only be discarded
• Web based: optimized for web-based applications
• Relational/multi-dimensional: uses either of the structures
• Client/server: architecture to provide easy access to end users
• Real-time: Newer DWs provide real-time, active data access
• Includes metadata: data about how data is organized.
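
The time-variant and nonvolatile characteristics can be pictured as an append-only store of dated snapshots. The class below is a hypothetical in-memory sketch, not how any particular warehouse product works; the class name, method names, and sample balances are assumptions.

```python
from datetime import date


class NonvolatileStore:
    """Append-only, time-stamped store (hypothetical class for illustration)."""

    def __init__(self):
        self._rows = []

    def load(self, snapshot_date, customer_id, balance):
        """Add a new snapshot; existing rows are never modified."""
        self._rows.append((snapshot_date, customer_id, balance))

    def history(self, customer_id):
        """Return (date, balance) snapshots for one customer in date order."""
        return sorted((d, b) for d, c, b in self._rows if c == customer_id)


store = NonvolatileStore()
store.load(date(2024, 1, 31), "C001", 500.0)
store.load(date(2024, 2, 29), "C001", 650.0)
store.load(date(2024, 2, 29), "C002", 120.0)
store.load(date(2024, 3, 31), "C001", 610.0)

# Because every snapshot is kept, a trend over time can be read back directly.
trend = store.history("C001")
print(trend)
```

A change in a customer's balance is recorded as a new dated row instead of overwriting the old one, which is what makes trend analysis possible.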

Q) Types of Data Warehousing?


A. • Three types of data warehouses
– Data Marts (DMs)
– Operational Data Stores (ODS)
– Enterprise Data Warehouses (EDW)
Data Marts
• A data mart is usually smaller and focuses on a particular subject or
department. It is a subset of a DW that covers a single subject area
• A departmental small-scale “DW” that stores only limited/relevant
data
• Dependent data mart
– A subset that is created directly from a data warehouse
– Ensures the end user views the same version of the data as other DW users
• Independent data mart
– A small data warehouse designed for a strategic business unit
or a department
– Used by small companies as a low-cost, scaled-down version
of a DW
– Its source is not the enterprise DW
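
A dependent data mart can be sketched as a view defined over a warehouse table, so that mart users always see the same version of the data as warehouse users. The example uses Python's built-in sqlite3 module with an in-memory database; the table name dw_sales, the view name mart_electronics, and the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse fact table covering several departments.
conn.execute(
    "CREATE TABLE dw_sales (sale_date TEXT, department TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO dw_sales VALUES (?, ?, ?, ?)",
    [
        ("2024-01-05", "Electronics", "Laptop", 1200.0),
        ("2024-01-06", "Grocery", "Coffee", 8.5),
        ("2024-01-07", "Electronics", "Phone", 650.0),
        ("2024-02-01", "Apparel", "Jacket", 90.0),
    ],
)

# Dependent data mart: a single-subject subset derived directly from the warehouse.
conn.execute(
    "CREATE VIEW mart_electronics AS "
    "SELECT sale_date, product, amount FROM dw_sales "
    "WHERE department = 'Electronics'"
)

rows = conn.execute(
    "SELECT product, amount FROM mart_electronics ORDER BY sale_date"
).fetchall()
total = conn.execute("SELECT SUM(amount) FROM mart_electronics").fetchone()[0]
print(rows, total)
```

An independent data mart, by contrast, would be loaded from its own sources and would not be derived from the dw_sales table at all.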
Operational Data Stores (ODS)
• Used as an interim staging area for a data warehouse
• Contents of ODS are updated throughout the course of business
operations
• Used for short-term decisions involving mission-critical
applications rather than for the medium- and long-term
decisions associated with an EDW
• ODS is similar to short-term memory: it stores only very
recent information
• ODS consolidates data from multiple source systems and
provides a near real-time, integrated view of volatile, current
data.
Enterprise Data Warehouses (EDW)
• EDW is a large-scale data warehouse that is used across the
enterprise for decision support
• Provides integration of data from many sources into a standard
format for effective BI and decision support applications
• EDWs are used to provide data for many types of decision
support systems (DSS)
– Customer Relationship Management (CRM)
– Supply Chain Management (SCM)
– Business Performance Management (BPM)
– Business activity monitoring
– Product life-cycle management
– Revenue management
– Knowledge management

Q) Data Warehouse Process?


A. Data Warehousing Process
• Organizations continuously collect data, information and
knowledge at an increasingly accelerated rate and store them
• Due to scalability issues, maintaining and using data and
information becomes extremely complex
• Due to improved reliability and availability of network and
internet access, the number of users accessing information continues to increase
• Working with multiple databases has become an extremely
difficult task requiring considerable expertise
• The benefits of DW far exceed its costs.
