Data Visualization using Business Intelligence (MDS204)
Arti yadav
Einfach Bussiness Analytics pvt ltd.

1
Introduction

 The distinction between data mining and knowledge discovery is largely one
of timing.
 Data mining is the process by which substantial amounts of data are
organized, normalized, tabulated, and categorized; in short, it is analyzing
large databases in order to generate additional information.
 Knowledge discovery, however, can be associated with specific context (e.g.,
can be guided by the vernacular of a particular specialty, organization, or
practice), making it both quantitative and qualitative. Knowledge can—and
should—be viewed as having a personality.

2
Knowledge discovery

 Knowledge discovery is the process of extracting useful knowledge from data.
 Knowledge discovery is a process that requires a lot of data, and that data
needs to be in a reliable state before it can be subjected to the data
mining process. The accumulation of enterprise data within a data
warehouse that has been properly validated, cleaned, and integrated provides
the best source of data that can be subjected to knowledge discovery.

3
Knowledge discovery
Some people don’t differentiate data mining from knowledge discovery while
others view data mining as an essential step in the process of knowledge discovery.
Here is the list of steps involved in the knowledge discovery process −
 Data Cleaning − In this step, noise and inconsistent data are removed.
 Data Integration − In this step, multiple data sources are combined.
 Data Selection − In this step, data relevant to the analysis task are retrieved
from the database.
 Data Transformation − In this step, data is transformed or consolidated into
forms appropriate for mining by performing summary or aggregation
operations.

4
Cont.

 Data Mining − In this step, intelligent methods are applied in order to extract
data patterns.
 Pattern Evaluation − In this step, data patterns are evaluated.
 Knowledge Presentation − In this step, knowledge is represented.

 The knowledge discovery process does this by using data mining methods
(algorithms) to extract (identify) what is deemed knowledge, according to the
specified measures and thresholds, using a database along with any required
preprocessing, subsampling, and transformations of that database.

5
The following diagram shows the process
of knowledge discovery −

6
An Outline of the Steps of the KDD Process

7
Cont.

8
Cont.

9
Knowledge Discovery Process (KDP)

The Knowledge Discovery Process may consist of the following steps:


1 Data cleaning -
 First step in the Knowledge Discovery Process is Data cleaning in which noise
and inconsistent data is removed.
2 Data Integration -
 Second step is Data Integration in which multiple data sources are combined.
3 Data Selection -
 Next step is Data Selection in which data relevant to the analysis task are
retrieved from the database.

10
Cont.
4 Data Transformation -
 In Data Transformation, data are transformed into forms appropriate for mining
by performing summary or aggregation operations.
5 Data Mining -
 In Data Mining, data mining methods (algorithms) are applied in order to
extract data patterns.
6 Pattern Evaluation -
 In Pattern Evaluation, the truly interesting patterns are identified based on
interestingness measures.
7 Knowledge Presentation -
 In Knowledge Presentation, the mined knowledge is presented to the user using
knowledge representation and visualization techniques.
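
The seven steps above can be sketched as a small pipeline. This is only an illustration in plain Python: the sample records, the function names, and the threshold of 2.0 are all assumptions made for the example, and the "mining" step is a simple aggregate-and-filter rather than a real mining algorithm.

```python
from collections import Counter

# Hypothetical raw records from two sources; None marks a missing value.
store_sales = [
    {"customer": "c1", "item": "milk", "amount": 2.0},
    {"customer": "c2", "item": "bread", "amount": None},
    {"customer": "c1", "item": "bread", "amount": 1.5},
]
web_sales = [
    {"customer": "c3", "item": "milk", "amount": 2.5},
    {"customer": "c3", "item": "milk", "amount": 2.5},
]

def clean(rows):
    """Data cleaning: drop records that contain a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    return [row for source in sources for row in source]

def select(rows, fields):
    """Data selection: keep only the fields relevant to the analysis."""
    return [{f: r[f] for f in fields} for r in rows]

def transform(rows):
    """Data transformation: aggregate the amount spent per item."""
    totals = Counter()
    for r in rows:
        totals[r["item"]] += r["amount"]
    return dict(totals)

def mine_and_evaluate(totals, threshold):
    """Mining and pattern evaluation: keep items whose total exceeds the threshold."""
    return {item: total for item, total in totals.items() if total > threshold}

def present(patterns):
    """Knowledge presentation: format the surviving patterns as report lines."""
    return [f"{item}: {total:.2f}" for item, total in sorted(patterns.items())]

rows = integrate(clean(store_sales), clean(web_sales))
totals = transform(select(rows, ["item", "amount"]))
report = present(mine_and_evaluate(totals, threshold=2.0))
```

Each function corresponds to one step, so a step can be swapped for something more realistic without touching the others.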

11
KDD process
Steps Involved in KDD Process:
 Data Cleaning: Data cleaning is defined as the removal of noisy and irrelevant
data from the collection.
 Cleaning in the case of missing values.
 Cleaning noisy data, where noise is a random error or variance in a measured variable.
 Cleaning with data discrepancy detection and data transformation tools.
 Data Integration: Data integration is defined as combining heterogeneous data
from multiple sources into a common source (data warehouse).
 Data integration using data migration tools.
 Data integration using data synchronization tools.
 Data integration using the ETL (Extract-Transform-Load) process.
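
A minimal sketch of an extract, transform, load run may make the integration step more concrete. It uses Python's built-in sqlite3 module with an in-memory database standing in for the warehouse. The two source lists, the table name `sales`, and the cleaning rules (skip rows with a missing amount, normalize case) are assumptions chosen for illustration.

```python
import sqlite3

# Hypothetical source rows: (name, amount as text, currency).
crm_rows = [("alice", "10.50", "usd"), ("bob", "", "usd")]  # "" marks a missing amount
shop_rows = [("carol", "7.25", "USD")]

def extract():
    """Extract: pull the raw rows from every source."""
    return crm_rows + shop_rows

def transform(rows):
    """Transform: drop rows with a missing amount and normalize the formats."""
    cleaned = []
    for name, amount, currency in rows:
        if not amount:
            continue
        cleaned.append((name.title(), float(amount), currency.upper()))
    return cleaned

def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE sales (customer TEXT, amount REAL, currency TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
loaded = conn.execute(
    "SELECT customer, amount, currency FROM sales ORDER BY customer"
).fetchall()
```

The row with the missing amount never reaches the target table, which is the cleaning step and the integration step working together.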

12
Cont.
 Data Selection: Data selection is defined as the process in which the data relevant
to the analysis is decided on and retrieved from the data collection.
 Data selection using neural networks.
 Data selection using decision trees.
 Data selection using Naive Bayes.
 Data selection using clustering, regression, etc.
 Data Transformation: Data transformation is defined as the process of
transforming data into the form required by the mining procedure.
 Data transformation is a two-step process:
 Data mapping: Assigning elements from the source to the destination to capture
transformations.
 Code generation: Creating the actual transformation program.

13
Cont.
 Data Mining: Data mining is defined as the application of intelligent techniques to
extract potentially useful patterns.
 Transforms task-relevant data into patterns.
 Decides the purpose of the model using classification or characterization.
 Pattern Evaluation: Pattern evaluation is defined as identifying the truly
interesting patterns representing knowledge, based on given measures.
 Finds the interestingness score of each pattern.
 Uses summarization and visualization to make the data understandable to the user.
 Knowledge Representation: Knowledge representation is defined as a
technique that uses visualization tools to represent data mining results.
 Generate reports.
 Generate tables.
 Generate discriminant rules, classification rules, characterization rules, etc.
14
Why we need Data Mining?

 Why do we need data mining?
The volume of information we have to handle is increasing every day: business
transactions, scientific data, sensor data, pictures, videos, etc. So we need a
system capable of extracting the essence of the available information and
automatically generating reports, views, or summaries of the data for better
decision-making.
 Why is data mining used in business?
Data mining is used in business to make better managerial decisions by:
 Automatically summarizing data.
 Extracting the essence of the stored information.
 Discovering patterns in raw data.

15
Why we need Data Mining?

 Establishing relevance and relationships among data, and using this information
to generate profitable insights.
 Helping businesses make informed decisions quickly.
 Helping to find unusual shopping patterns in grocery stores.
 Optimizing website business by providing customized offers to each visitor.
 Helping to measure customers' response rates in business marketing.
 Creating and maintaining new customer groups for marketing purposes.
 Predicting customer defections, such as which customers are more likely to switch
to another supplier in the near future.
 Differentiating between profitable and unprofitable customers.
 Identifying all kinds of suspicious behavior as part of a fraud detection process.

16
Data mining

 Data mining refers to the extraction of information from large amounts of data.
 Data mining is very important today because companies and other organizations
hold huge amounts of data. It is impractical for humans to extract information
from data at this scale by hand, so machine learning techniques are used to
process the data fast enough to extract information from it.
 Companies use data mining to learn customer preferences, determine the prices
of their products and services, and analyse the market.

17
Data mining

 Data mining refers to extracting knowledge from large
amounts of data.
 Data mining, or knowledge discovery, is a process of
discovering patterns that lead to actionable
knowledge from large data sets through one or more
traditional data mining techniques, such as market
basket analysis and clustering. A lot of the knowledge
discovery methodology has evolved from the
combination of the worlds of statistics and computer
science.
18
Data Mining Architecture

 A data mining architecture has several elements: Data Warehouse, Data
Mining Engine, Pattern Evaluation, User Interface, and Knowledge Base.
Data Warehouse:
 A data warehouse is a place that stores information collected from multiple
sources under a unified schema. Information stored in a data warehouse is
critical to organizations for the process of decision-making.
Data Mining Engine:
 The Data Mining Engine is the core component of the data mining process. It
consists of modules that perform tasks such as clustering, classification,
prediction, and correlation analysis.

19
Cont.

Pattern Evaluation:
 Pattern Evaluation is responsible for identifying interesting patterns with the
help of the Data Mining Engine.
User Interface:
 The User Interface provides communication between the user and the data mining
system. It lets the user work with the system easily, even without detailed
knowledge of the system.
Knowledge Base:
 The Knowledge Base holds data that is very important to the data mining
process. It provides input to the data mining engine and guides the engine in
its search for patterns.

20
Data Mining Architecture

21
Data Mining Techniques

 Extracting important knowledge from a very large amount of data can be
crucial to organizations for the process of decision-making.
 Some data mining techniques are:
 1 Association
 2 Classification
 3 Clustering
 4 Sequential patterns
 5 Decision tree

22
1. Association Technique

 The association technique helps to find patterns in large amounts of data, based
on a relationship between two or more items in the same transaction. It is used
for market analysis, that is, it helps us to analyze people's buying habits.
 For example, you might find that a customer always buys ice cream whenever he
comes to watch a movie, so the next time that customer comes to watch a movie
he is likely to want ice cream again.
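
Association rules are commonly scored by support (how often the items occur together) and confidence (how often the consequent appears when the antecedent does). The sketch below computes both for the rule "movie ticket -> ice cream". The four transactions are invented for illustration and the function names are hypothetical.

```python
# Hypothetical transactions; each set holds the items bought together.
transactions = [
    {"movie ticket", "ice cream"},
    {"movie ticket", "ice cream", "popcorn"},
    {"movie ticket", "popcorn"},
    {"ice cream"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

rule_support = support({"movie ticket", "ice cream"}, transactions)
rule_confidence = confidence({"movie ticket"}, {"ice cream"}, transactions)
```

With this data the two items occur together in half of the transactions, and two out of three movie-goers also buy ice cream.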

23
2. Classification Technique

 Classification is one of the most common data mining techniques. In
classification we use mathematical techniques such as decision trees, neural
networks, and statistics to predict the class of unknown records. This
technique helps in deriving important information about data.
 Assume you have a set of records, each containing a set of attributes.
Depending on these attributes, you can predict the class of unseen or unknown
records. For example, given the records of all employees who left the company,
a classification technique can predict who will probably leave the company in
a future period.
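
The employee example can be sketched with a very small classifier. Note that this uses a 1-nearest-neighbour rule, which is not one of the techniques named on the slide; it is chosen here only because it fits in a few lines. The attributes (years at the company, satisfaction score) and all records are assumptions.

```python
# Hypothetical training records: (years_at_company, satisfaction 0-10) -> label.
training = [
    ((1, 3), "left"),
    ((2, 4), "left"),
    ((8, 8), "stayed"),
    ((10, 7), "stayed"),
]

def squared_distance(a, b):
    """Squared Euclidean distance between two attribute tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(record, training):
    """1-nearest-neighbour: return the label of the closest known record."""
    _, label = min(training, key=lambda pair: squared_distance(pair[0], record))
    return label

prediction_new_hire = classify((1, 2), training)
prediction_veteran = classify((9, 9), training)
```

The labelled records play the role of the historical data, and `classify` assigns a class to a record that has not been seen before.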

24
3. Clustering Technique

 Clustering is one of the oldest techniques used in data mining. The main aim
of the clustering technique is to make clusters (groups) from pieces of data
that share common characteristics. Clustering helps to identify the
differences and similarities between the data.
 Take the example of a shop with many items for sale. The challenge is how to
arrange those items so that a customer can easily find the item he needs. Using
the clustering technique, you can keep items that share some similarities in one
corner and items that share other similarities in another corner.
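
One common clustering algorithm is k-means, sketched below on a single attribute. The slide does not name a specific algorithm, so this is just one possible choice. The item prices and the starting centroids are assumptions made for the example.

```python
def k_means_1d(values, centroids, iterations=10):
    """Group numbers around the nearest centroid, then move each centroid
    to the mean of its group; repeat a fixed number of times."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical item prices in a shop.
prices = [1.0, 1.5, 2.0, 20.0, 21.0, 22.0]
centroids, clusters = k_means_1d(prices, centroids=[0.0, 10.0])
```

The cheap items end up in one group and the expensive items in the other, which mirrors putting similar items in the same corner of the shop.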

25
4. Sequential patterns

 Sequential pattern analysis is a useful method for identifying trends and
recurring patterns.
 For example, if customer data shows that a customer buys a particular product at
a particular time of year, you can use this information to suggest that product
to the customer at that time of year.
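
The seasonal example can be sketched as a simple count: for each customer, product, and month, in how many different years did the purchase occur? This is a deliberate simplification of sequential pattern mining. The purchase log and the `min_years` threshold are assumptions.

```python
from collections import defaultdict

# Hypothetical purchase log: (customer, product, year, month).
purchases = [
    ("c1", "sunscreen", 2021, 6),
    ("c1", "sunscreen", 2022, 6),
    ("c1", "sunscreen", 2023, 6),
    ("c1", "umbrella", 2022, 10),
    ("c2", "sunscreen", 2023, 7),
]

def seasonal_patterns(purchases, min_years=2):
    """Return (customer, product, month) keys that recur in at least
    min_years different years, with the number of years seen."""
    years_seen = defaultdict(set)
    for customer, product, year, month in purchases:
        years_seen[(customer, product, month)].add(year)
    return {
        key: len(years) for key, years in years_seen.items() if len(years) >= min_years
    }

patterns = seasonal_patterns(purchases)
```

Here customer c1 bought sunscreen in June in three different years, so that product could be suggested to c1 each June.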

26
5. Decision tree

 The decision tree is one of the most commonly used data mining techniques
because its model is easy for users to understand. In a decision tree you start
with a simple question that has two or more answers. Each answer leads to
further questions, which help us to make a final decision. The root node of a
decision tree is a simple question.
 Take the example of a flood warning system.

27
Decision tree

First check the water level. If the water level is > 50ft, an alert is sent.
Otherwise, check the water level again: if it is > 30ft, a warning is sent;
otherwise the water is in the normal range.
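
The flood warning tree maps directly onto nested checks. In this sketch the function name and the returned labels are hypothetical, and treating a level of exactly 50ft or 30ft as belonging to the lower band is an assumption.

```python
def flood_status(water_level_ft):
    """Walk the two-level decision tree for a water level given in feet."""
    if water_level_ft > 50:      # root question: is the level above 50ft?
        return "alert"
    if water_level_ft > 30:      # second question: is the level above 30ft?
        return "warning"
    return "normal"

readings = [55, 40, 10]
statuses = [flood_status(level) for level in readings]
```

Each `if` is one internal node of the tree and each `return` is a leaf.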

28
Few more techniques

 Estimation, which is a process of assigning some continuously valued numeric
value to an object. For example, credit risk assessment is not necessarily a
yes/no question; it could be some kind of scoring that assesses a propensity to
default on a loan. Estimation can be used as part of the classification process
(such as using an estimation model to guess a person's annual salary as part of
a market segmentation process).
 Prediction, which is an attempt to classify objects according to some
expected future behavior. Classification and estimation can be used for
prediction by applying historical data where the classification is already
known to build a model (this is called training). That model can then be
applied to new data to predict future behavior.
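
Training a model on historical data and applying it to new data can be sketched with a least-squares line, which produces a continuous estimate. The choice of a straight-line model, the salary figures, and the variable names are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical historical data: years of experience and salary in thousands.
years_experience = [1, 2, 3, 4]
salary_k = [30, 35, 40, 45]

# Training: build the model from records whose outcome is already known.
slope, intercept = fit_line(years_experience, salary_k)

# Prediction: apply the trained model to a new, unseen record.
estimate = slope * 6 + intercept
```

The fitted line is the model, and applying it to a person with 6 years of experience gives the estimated salary.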

29
Cont.

 Affinity grouping, which is a process of evaluating relationships or
associations between data elements that demonstrate some kind of affinity
between objects.
 Description, which is the process of trying to describe what has been
discovered, or trying to explain the results of the data mining process.
 There are a number of techniques that are used to perform these tasks:
market basket analysis, memory-based reasoning, cluster detection, link
analysis, rule induction, neural networks, and so on.

30
Data Mining Applications

 Data mining refers to the extraction of information from large amounts of data.
Extracting important knowledge from a very large amount of data can be
crucial to organizations for the process of decision-making.
 Data mining applications include:

31
1. Data mining applications in Marketing:

 Data mining extracts information from various data sources that is very useful
in planning, organizing, managing, and launching a new product in a
cost-effective way. Data mining techniques help us to understand the purchase
behavior of a buyer, such as how frequently a customer purchases an item, the
total value of all purchases, and when the last purchase was made. With data
mining you can understand buyers' needs and design products and services
according to buyers' requirements.
 Database marketing is one of the most popular applications of data mining.

32
2. Data mining applications in HealthCare:

 Data mining can be very useful for improving healthcare systems. With data
mining you can predict the number of patients, which helps to make sure that
every patient receives proper care at the right time and in the right place.
 Data mining can help all parties involved in the healthcare industry. For
example, healthcare insurers can detect fraud and abuse, healthcare
organizations can improve their decision making by using knowledge provided by
data mining, and patients can receive better and more affordable healthcare
services.

33
3. Data mining applications in Education:

 Educational data mining (EDM) is an emerging field that is used to address
students' challenges and to help us understand how students learn by creating
student models. The main goal of educational data mining is to predict
students' future learning behavior so that the necessary steps can be taken
before a student fails or drops out. Data mining is also used to predict
students' results.

34
4. Data mining applications in Retail
Industry:

 The retail industry collects large amounts of data on sales and customer
shopping history. Retail data mining helps in analyzing client behavior and
customer buying patterns and trends, which leads to better customer service,
higher customer satisfaction, and lower business costs.

35
5. Data mining applications in Banking:

 The banking industry has benefited hugely from advances in digital
technology. Data mining is becoming a strategically important area for many
business organizations, including the banking sector.
 Data mining is used in the financial and banking sector for credit analysis,
detecting fraudulent transactions, cash management, and predicting payments.

36
Advantage of Data Mining:

 Data mining techniques help companies to get knowledge-based information.
 Data mining helps organizations to make profitable adjustments in
operation and production.
 Data mining is a cost-effective and efficient solution compared to other
statistical data applications.
 Data mining helps with the decision-making process.
 It facilitates automated prediction of trends and behaviors as well as
automated discovery of hidden patterns.
 It can be implemented in new systems as well as existing platforms.
 It is a speedy process that makes it easy for users to analyze huge
amounts of data in less time.

37
Disadvantages of Data Mining

 There is a chance that companies may sell useful information about their
customers to other companies for money. For example, American Express has
sold credit card purchases of their customers to the other companies.
 Many data mining analytics tools are difficult to operate and require
advanced training to work with.
 Different data mining tools work in different ways because of the different
algorithms employed in their design. Therefore, selecting the correct data
mining tool is a very difficult task.
 Data mining techniques are not always accurate, so they can cause serious
consequences in certain conditions.

38
Data warehousing
 A data warehouse is a subject-oriented, integrated, non-volatile, time-variant
collection of data in support of management’s decisions.
 Data warehousing (DW) is the process of collecting and managing data from
varied sources to provide meaningful business insights. A data warehouse is
typically used to connect and analyze business data from heterogeneous
sources. The data warehouse is the core of the BI system, which is built for
data analysis and reporting.
 It is a blend of technologies and components that aids the strategic use of
data. It is the electronic storage of a large amount of information by a
business, designed for query and analysis instead of transaction processing. It
is a process of transforming data into information and making it available to
users in a timely manner to make a difference.

39
Cont.

 Data warehousing is the electronic storage of a large amount of information
by a business or organization.
 A data warehouse is designed to run query and analysis on historical data
derived from transactional sources for business intelligence and data mining
purposes.
 Data warehousing is used to provide greater insight into the performance of a
company by comparing data consolidated from multiple heterogeneous
sources.

40
Data Warehouse Features

The key features of a data warehouse are discussed below −


 Subject Oriented − A data warehouse is subject oriented because it provides
information around a subject rather than the organization's ongoing operations.
These subjects can be product, customers, suppliers, sales, revenue, etc. A data
warehouse does not focus on the ongoing operations, rather it focuses on
modelling and analysis of data for decision making.
 Integrated − A data warehouse is constructed by integrating data from
heterogeneous sources such as relational databases, flat files, etc. This
integration enhances the effective analysis of data.
 Time Variant − The data collected in a data warehouse is identified with a
particular time period. The data in a data warehouse provides information from
the historical point of view.
 Non-volatile − Non-volatile means the previous data is not erased when new
data is added to it. A data warehouse is kept separate from the operational
database, and therefore frequent changes in the operational database are not
reflected in the data warehouse.

41
How Data warehouse works?
 A Data Warehouse works as a central repository where information arrives from
one or more data sources. Data flows into a data warehouse from the
transactional system and other relational databases.
 Data may be:
Structured
Semi-structured
Unstructured data
 The data is processed, transformed, and ingested so that users can access the
processed data in the Data Warehouse through Business Intelligence tools, SQL
clients, and spreadsheets. A data warehouse merges information coming from
different sources into one comprehensive database.
 By merging all of this information in one place, an organization can analyze its
customers more holistically. This helps to ensure that it has considered all the
information available. Data warehousing makes data mining possible. Data
mining looks for patterns in the data that may lead to higher sales and
profits.

42
Types of Data Warehouse

Information processing, analytical processing, and data mining are the three
types of data warehouse applications that are discussed below −
 Information Processing − A data warehouse allows the data stored in it to be
processed. The data can be processed by means of querying, basic statistical
analysis, and reporting using crosstabs, tables, charts, or graphs.
 Analytical Processing − A data warehouse supports analytical processing of
the information stored in it. The data can be analyzed by means of basic OLAP
operations, including slice-and-dice, drill down, drill up, and pivoting.
 Data Mining − Data mining supports knowledge discovery by finding hidden
patterns and associations, constructing analytical models, performing
classification and prediction. These mining results can be presented using the
visualization tools.
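
The basic OLAP operations mentioned above can be illustrated on a tiny fact table held in plain Python. The fact rows, the dimension names (year, region, product), and the function names are all assumptions. A real warehouse would run these operations through an OLAP engine or SQL rather than in application code.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, region, product, sales).
facts = [
    (2023, "north", "tea", 10),
    (2023, "south", "tea", 20),
    (2023, "north", "coffee", 5),
    (2024, "north", "tea", 15),
]

def slice_by_year(facts, year):
    """Slice: fix one dimension (year) to a single value."""
    return [f for f in facts if f[0] == year]

def roll_up_by_region(facts):
    """Drill up (roll up): aggregate away the product dimension."""
    totals = defaultdict(int)
    for _year, region, _product, sales in facts:
        totals[region] += sales
    return dict(totals)

def pivot_region_by_product(facts):
    """Pivot: arrange sales as a region-by-product table."""
    table = defaultdict(dict)
    for _year, region, product, sales in facts:
        table[region][product] = table[region].get(product, 0) + sales
    return {region: dict(row) for region, row in table.items()}

sales_2023 = slice_by_year(facts, 2023)
by_region = roll_up_by_region(sales_2023)
pivot = pivot_region_by_product(sales_2023)
```

Drilling down is the reverse of the roll-up: going from `by_region` back to the per-product detail in `pivot`.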

43
Types of Data Warehouse Models

From the perspective of data warehouse architecture, three main types of Data
Warehouses are:
1. Enterprise Data Warehouse:
 An Enterprise Data Warehouse is a centralized warehouse. It provides decision support
services across the enterprise. It offers a unified approach for organizing and
representing data. It also provides the ability to classify data according to subject
and to give access according to those divisions.
2. Operational Data Store:
 An Operational Data Store, also called an ODS, is a data store required when
neither the data warehouse nor the OLTP systems support an organization's reporting
needs. In an ODS, the data warehouse is refreshed in real time. Hence, it is widely
preferred for routine activities like storing records of the employees.
3. Data Mart:
 A data mart is a subset of the data warehouse. It is specially designed for a particular
line of business, such as sales or finance. In an independent data
mart, data can be collected directly from sources.

44
General stages of Data Warehouse
 Earlier, organizations started with relatively simple use of data warehousing.
However, over time, more sophisticated use of data warehousing began.
The following are general stages of use of the data warehouse:
Offline Operational Database:
 In this stage, data is just copied from an operational system to another
server. In this way, loading, processing, and reporting of the copied data do
not impact the operational system's performance.
Offline Data Warehouse:
 Data in the data warehouse is regularly updated from the operational
database. The data in the data warehouse is mapped and transformed to meet the
data warehouse objectives.

45
Cont.

Real time Data Warehouse:


 In this stage, data warehouses are updated whenever any transaction takes
place in the operational database, for example in an airline or railway booking
system.
Integrated Data Warehouse:
 In this stage, Data Warehouses are updated continuously when the operational
system performs a transaction. The Data warehouse then generates
transactions which are passed back to the operational system.

46
Components of Data warehouse

Four components of Data Warehouses are:


 Load manager: The load manager is also called the front component. It performs
all the operations associated with the extraction and loading of data into
the warehouse. These operations include transformations to prepare the data
for entering into the data warehouse.
 Warehouse manager: The warehouse manager performs operations associated
with the management of the data in the warehouse. It performs operations
like analysis of data to ensure consistency, creation of indexes and views,
generation of denormalizations and aggregations, transformation and merging
of source data, and archiving and backing up data.

47
Cont.

 Query manager: The query manager is also known as the backend component. It
performs all the operations related to the management of user
queries. This data warehouse component directs queries
to the appropriate tables and schedules the execution of queries.
 End-user access tools: These are categorized into five groups: 1.
Data reporting 2. Query tools 3. Application development tools 4. EIS tools 5.
OLAP tools and data mining tools.

48
Data warehouse Architecture

49
Steps to Implement Data Warehouse

The best way to address the business risk associated with a data warehouse
implementation is to employ a three-pronged strategy, as below:
 Enterprise strategy: Here we identify technical aspects, including the current
architecture and tools. We also identify facts, dimensions, and attributes.
Data mapping and transformation are also addressed.
 Phased delivery: Data warehouse implementation should be phased based on
subject areas. Related business entities like booking and billing should first be
implemented and then integrated with each other.
 Iterative prototyping: Rather than a big-bang approach to implementation,
the data warehouse should be developed and tested iteratively.

50
Best practices to implement a Data
Warehouse
 Decide a plan to test the consistency, accuracy, and integrity of the data.
 The data warehouse must be well integrated, well defined and time stamped.
 While designing a data warehouse, make sure you use the right tool, stick to the
life cycle, take care of data conflicts, and are ready to learn from your mistakes.
 Never replace operational systems and reports.
 Don't spend too much time on extracting, cleaning and loading data.
 Make sure to involve all stakeholders, including business personnel, in the data
warehouse implementation process. Establish that data warehousing is a joint
team project. You don't want to create a data warehouse that is not useful to the
end users.
 Prepare a training plan for the end users.

51
Data warehouse users
 A data warehouse is needed by all types of users, such as:
 Decision makers who rely on massive amounts of data.
 Users who use customized, complex processes to obtain information from
multiple data sources.
 People who want simple technology to access the data.
 People who want a systematic approach for making decisions.
 Users who want fast performance on a huge amount of data, which is a
necessity for reports, grids or charts.
 Anyone who wants to discover 'hidden patterns' of data flows and groupings,
for whom a data warehouse is a first step.

52
Data Warehouse Application

Here are the most common sectors where data warehouses are used:

Airline:
 In airline systems, it is used for operational purposes like crew assignment,
analysis of route profitability, frequent flyer program promotions, etc.
Banking:
 It is widely used in the banking sector to manage the available resources
effectively. Some banks also use it for market research and for performance
analysis of products and operations.
Healthcare:
 The healthcare sector also uses data warehouses to strategize and predict
outcomes, generate patients' treatment reports, and share data with tie-in
insurance companies, medical aid services, etc.
53
Cont.

Public sector:
 In the public sector, data warehouses are used for intelligence gathering. They help
government agencies to maintain and analyze tax records and health policy
records for every individual.
Investment and Insurance sector:
 In this sector, the warehouses are primarily used to analyze data patterns and
customer trends, and to track market movements.
Retail chain:
 In retail chains, data warehouses are widely used for distribution and marketing. They
also help to track items, customer buying patterns, and promotions, and are used for
determining pricing policy.

54
Cont.

Telecommunication:
 A data warehouse is used in this sector for product promotions, sales
decisions, and distribution decisions.
Hospitality Industry:
 This industry uses warehouse services to design and estimate its
advertising and promotion campaigns, targeting clients
based on their feedback and travel patterns.

55
Advantages of Data Warehouse:

 A data warehouse allows business users to quickly access critical data from
multiple sources, all in one place.
 A data warehouse provides consistent information on various cross-functional
activities. It also supports ad-hoc reporting and queries.
 A data warehouse helps to integrate many sources of data, reducing stress on
the production system.
 A data warehouse helps to reduce the total turnaround time for analysis and
reporting.
 Restructuring and integration make the data easier for the user to use for reporting
and analysis.
 A data warehouse allows users to access critical data from a number of
sources in a single place. Therefore, it saves the user's time in retrieving data
from multiple sources.
 A data warehouse stores a large amount of historical data. This helps users to
analyze different time periods and trends to make future predictions.

56
Disadvantages of Data Warehouse:
 Not an ideal option for unstructured data.
 Creating and implementing a data warehouse is a time-consuming affair.
 A data warehouse can become outdated relatively quickly.
 It is difficult to make changes in data types and ranges, data source schema,
indexes, and queries.
 A data warehouse may seem easy, but it is actually too complex for
average users.
 Despite best efforts at project management, the scope of a data warehousing
project tends to increase.
 Sometimes warehouse users will develop different business rules.
 Organisations need to spend a lot of their resources on training and
implementation.
57
The Future of Data Warehousing

 Changes in regulatory constraints may limit the ability to combine sources of
disparate data. These disparate sources may include unstructured data, which
is difficult to store.
 As the size of databases grows, the estimate of what constitutes a very
large database continues to grow. It is complex to build and run data
warehouse systems that are always increasing in size. The hardware and
software resources available today do not allow a large amount of
data to be kept online.
 Multimedia data cannot be manipulated as easily as text data, whereas textual
information can be retrieved by the relational software available today. This
could be a research subject.

58
Data Warehouse Tools

There are many data warehousing tools available in the market. Here are
some of the most prominent ones:
1. MarkLogic:
 MarkLogic is a useful data warehousing solution that makes data integration
easier and faster using an array of enterprise features. This tool helps to
perform very complex search operations. It can query different types of data
like documents, relationships, and metadata.
2. Oracle:
 Oracle is an industry-leading database. It offers a wide range of
data warehouse solutions, both on-premises and in the cloud. It helps to optimize
customer experiences by increasing operational efficiency.
3. Amazon Redshift:
 Amazon Redshift is a data warehouse tool. It is a simple and cost-effective tool for
analyzing all types of data using standard SQL and existing BI tools. It also allows
running complex queries against petabytes of structured data, using
query optimization techniques.
59
End of slides
60
