MBA Sem 2
Data Mining
Dr. Ashvini Shende
Ch 1 Content
Concept, Definitions and Need of Big Data,
Data Mining,
Business Intelligence.
Data Mining Process,
relation to Business Intelligence techniques.
Introduction to Data Mining Tasks (Classification,
Clustering, Association Analysis, Anomaly
Detection).
Concept, Definitions of model, descriptive models,
predictive modeling, basic terminology.
Real-world data mining applications - Big Data
Analytics in Mobile Environments, Fraud Detection
and Prevention with Data Mining Techniques, Big
Data Analytics in Business Environments
Why Data Mining?
The Explosive Growth of Data: from terabytes to petabytes
Data collection and data availability
Automated data collection tools, database systems,
Web, computerized society
Major sources of abundant data
Business: Web, e-commerce, transactions, stocks, …
Science: Remote sensing, bioinformatics, scientific simulation, …
Society and everyone: news, digital cameras, YouTube
We are drowning in data, but starving for knowledge!
“Necessity is the mother of invention”: data mining is the automated
analysis of massive data sets
Evolution of Database Technology
1960s:
Data collection, database creation, IMS and network DBMS
1970s:
Relational data model, relational DBMS implementation
1980s:
RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
Application-oriented DBMS (spatial, scientific, engineering, etc.)
1990s:
Data mining, data warehousing, multimedia databases, and Web
databases
2000s:
Stream data management and mining
Data mining and its applications
Web technology (XML, data integration) and global information systems
What Is Data Mining?
[Figure: knowledge discovery process. Recoverable labels: Databases, Data Cleaning, Data Integration, Warehouse, Data Selection, Task-relevant Data]
Example: A Web Mining Framework
Data Mining in Business Intelligence
[Figure: business intelligence layers, with increasing potential to support business decisions toward the top. Recoverable labels, top to bottom: Decision Making (End User); Data Exploration (Statistical Summary, Querying, and Reporting)]
KDD Process: A Typical View from ML and Statistics
Example: Medical Data Mining
Health care and medical data mining often adopt
this view from statistics and machine
learning
Preprocessing of the data (including feature
extraction and dimension reduction)
Classification or/and clustering processes
Post-processing for presentation
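
The three stages above can be sketched in a few lines of Python. This is only an illustrative sketch, not a method taken from the slides: the patient records, the two features, the risk labels, and the choice of a nearest-centroid classifier are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical patient records: (patient_id, systolic_bp, glucose, label).
# All values are invented for illustration only.
RECORDS = [
    ("p1", 118, 90, "low_risk"),
    ("p2", 122, 95, "low_risk"),
    ("p3", 160, 180, "high_risk"),
    ("p4", 155, 170, "high_risk"),
]

def preprocess(rows):
    """Preprocessing: extract the numeric features and min-max scale them."""
    features = [(bp, glu) for _, bp, glu, _ in rows]
    lows = [min(col) for col in zip(*features)]
    highs = [max(col) for col in zip(*features)]

    def scale(point):
        # Assumes every feature has a non-zero range in the training rows.
        return tuple((v - lo) / (hi - lo) for v, lo, hi in zip(point, lows, highs))

    return [scale(f) for f in features], scale

def train_centroids(scaled, labels):
    """Classification (training): one mean point per class."""
    centroids = {}
    for label in set(labels):
        members = [p for p, l in zip(scaled, labels) if l == label]
        centroids[label] = tuple(mean(col) for col in zip(*members))
    return centroids

def classify(point, centroids):
    """Classification (prediction): pick the class with the closest centroid."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist(point, centroids[label]))

def report(patient_id, label):
    """Post-processing: turn the raw prediction into a readable line."""
    return f"{patient_id}: predicted {label.replace('_', ' ')}"

scaled, scale = preprocess(RECORDS)
centroids = train_centroids(scaled, [r[3] for r in RECORDS])
new_patient = ("p5", 150, 165)  # hypothetical unseen patient
prediction = classify(scale(new_patient[1:]), centroids)
summary = report(new_patient[0], prediction)
print(summary)
```

A real medical pipeline would use far more data, validated features, and a properly evaluated model; the point here is only the order of the stages.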
Need of Big Data
• Big data is a term used to describe a collection of data that is
huge in size and yet growing exponentially with time.
• Big data analytics examples include stock exchanges,
social media sites, jet engines, etc.
• Big data can be 1) structured, 2) unstructured, or 3) semi-structured.
• Volume, Variety, Velocity, and Variability are a few Big
Data ...
• Big data is so large and complex that no traditional data
management tool can store or process it efficiently. It is still
data, only at a much larger scale.
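
To make the structured, semi-structured, and unstructured distinction concrete, here is a small Python sketch. The sample order rows, JSON records, and review text are invented for illustration and are not from the slides.

```python
import csv
import io
import json

# Structured: fixed columns, as in a relational table or a CSV export.
structured = io.StringIO("order_id,amount\n1001,250.0\n1002,99.5\n")
orders = list(csv.DictReader(structured))
total = sum(float(row["amount"]) for row in orders)

# Semi-structured: self-describing, but fields can vary per record (JSON).
semi = json.loads('[{"user": "a", "tags": ["sale"]}, {"user": "b"}]')
users_with_tags = [rec["user"] for rec in semi if "tags" in rec]

# Unstructured: free text with no schema; structure has to be extracted.
text = "Great phone, great battery. Delivery was slow."
word_counts = {}
for word in text.lower().replace(",", " ").replace(".", " ").split():
    word_counts[word] = word_counts.get(word, 0) + 1

print(total, users_with_tags, word_counts["great"])
```

Each form needs a different amount of work before analysis: the CSV rows can be summed directly, the JSON records need a check for optional fields, and the text has to be tokenized first.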
Business Intelligence: Process
These processes include:
Data mining: Using databases, statistics and machine learning to uncover
trends in large datasets.
Reporting: Sharing data analysis with stakeholders so they can draw
conclusions and make decisions.
Performance metrics and benchmarking: Comparing current performance data
to historical data to track performance against goals, typically using
customized dashboards.
Descriptive analytics: Using preliminary data analysis to find out what
happened.
Querying: Asking the data specific questions, with BI pulling the answers from
the datasets.
Statistical analysis: Taking the results from descriptive analytics and further
exploring the data using statistics to determine how a trend happened and
why.
Data visualization: Turning data analysis into visual representations such as
charts, graphs, and histograms to more easily consume data.
Visual analysis: Exploring data through visual storytelling to communicate
insights on the fly and stay in the flow of analysis.
Data preparation: Compiling multiple data sources, identifying the dimensions
and measurements, and preparing the data for analysis.
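
Several of the processes above (data preparation, querying, descriptive analytics, benchmarking, and reporting) can be sketched on a toy dataset. The sales figures, field names, and the `total_by` helper below are hypothetical and exist only for this illustration; real BI tools do the same work at much larger scale.

```python
# Hypothetical monthly sales from two sources (values invented for illustration).
store_sales = [
    {"month": "Jan", "region": "East", "sales": 100},
    {"month": "Feb", "region": "East", "sales": 120},
]
online_sales = [
    {"month": "Jan", "region": "West", "sales": 80},
    {"month": "Feb", "region": "West", "sales": 90},
]

# Data preparation: compile the sources. "region" and "month" act as
# dimensions and "sales" as the measurement.
prepared = store_sales + online_sales

# Querying: ask a specific question of the data.
east_rows = [row for row in prepared if row["region"] == "East"]

def total_by(rows, dimension):
    """Descriptive analytics: total sales for each value of a dimension."""
    totals = {}
    for row in rows:
        totals[row[dimension]] = totals.get(row[dimension], 0) + row["sales"]
    return totals

by_region = total_by(prepared, "region")
by_month = total_by(prepared, "month")

# Benchmarking: compare current performance (Feb) with historical data (Jan).
change = by_month["Feb"] - by_month["Jan"]

# Reporting: share the results in a readable form.
for region, total in sorted(by_region.items()):
    print(f"{region}: {total}")
print(f"Feb vs Jan change: {change:+d}")
```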
Business Intelligence Techniques
Business intelligence techniques help understand
trends and identify patterns from big data. In the
digital world, modern businesses generate big
data on a daily basis. Recent advances in
technology have opened the door for companies to
store and process big data effectively and so arrive at
data-driven decisions and insights.
i. Classification of data mining frameworks as per the type of data sources mined:
This classification is as per the type of data handled, for example multimedia, spatial
data, text data, time-series data, World Wide Web, and so on.
ii. Classification of data mining frameworks as per the database involved:
This classification is based on the data model involved, for example object-oriented
database, transactional database, relational database, and so on.
iii. Classification of data mining frameworks as per the kind of knowledge discovered:
This classification depends on the types of knowledge discovered or data mining
functionalities, for example discrimination, classification, clustering,
characterization, etc. Some frameworks are comprehensive and offer
several data mining functionalities together.
iv. Classification of data mining frameworks according to data mining techniques used:
This classification is as per the data analysis approach utilized, such as neural
networks, machine learning, genetic algorithms, visualization, statistics, data
warehouse-oriented or database-oriented, etc.
The classification can also take into account the level of user interaction involved in
the data mining procedure, such as query-driven systems, autonomous systems, or
interactive exploratory systems.