
Data Mining

Dr. Atul Garg


Index
• Introduction
• Advantages / disadvantages
• Applications of Data Mining
• Knowledge Discovery Process
• Data Mining vs Query Tools
• What kind of data can be mined?
Data Mining
• Mining is the process of extracting valuable material from the
earth, e.g. coal mining or diamond mining. In the context of computer
science, “Data Mining” is also referred to as knowledge mining from
data, knowledge extraction, data/pattern analysis, and data
dredging. It is essentially the process of extracting useful
information from large volumes of data or from data warehouses.
or
• The process of extracting information from huge sets of data to
identify patterns, trends, and useful data that allow a business to
make data-driven decisions is called Data Mining.
Advantages of Data Mining
• The Data Mining technique enables organizations to obtain
knowledge-based data.
• Compared with other statistical data applications, data mining is
cost-efficient.
• Data Mining helps the decision-making process of an organization.
• It facilitates the automated discovery of hidden patterns as well as the
prediction of trends and behaviors.
• It can be introduced into new systems as well as existing platforms.
• It is a quick process that makes it easy for new users to analyze
enormous amounts of data in a short time.
Disadvantages of Data Mining
• There is a risk that organizations may sell customers' data to
other organizations for money.
• Much data mining analytics software is difficult to operate and
requires advanced training.
• Selecting the right data mining tools is a very challenging
task.
• Data mining techniques are not perfectly precise, which may lead to
severe consequences in certain conditions.
Applications of Data Mining

• Customer Relationship Management
Data Mining and Knowledge Discovery
Although data mining and knowledge discovery in databases (KDD) are frequently treated as synonyms, data
mining is actually one part of the knowledge discovery process.

Knowledge Discovery Process
The iterative process consists of the following steps:
• Data cleaning: also known as data cleansing, this is the phase in which noisy and irrelevant data are removed
from the collection.
• Data integration: at this stage, multiple data sources, often heterogeneous, may be combined into a common
source.
• Data selection: at this step, the data relevant to the analysis is decided on and retrieved from the data
collection.
• Data transformation: also known as data consolidation, this is the phase in which the selected data is transformed
into forms appropriate for the mining procedure.
• Data mining: the crucial step in which clever techniques are applied to extract potentially useful patterns.
• Pattern evaluation: in this step, the truly interesting patterns representing knowledge are identified based on
given measures.
• Knowledge representation: the final phase, in which the discovered knowledge is visually represented to the
user. This essential step uses visualization techniques to help users understand and interpret the data mining
results.
Data cleaning and data integration can be performed together as a pre-processing phase to generate a data
warehouse.
Data selection and data transformation can also be combined, where the consolidation of the data is the result of
the selection or, as in the case of data warehouses, the selection is done on transformed data.
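The steps above can be sketched as a chain of small functions. This is only an illustrative sketch: the sample records, the function names, and the 0.5 support threshold are assumptions made for the example, and counting item pairs stands in for whatever mining technique is actually used.

```python
from collections import Counter
from itertools import combinations

# Two hypothetical sources of purchase records (illustrative data only).
store_a = [
    {"id": 1, "items": "bread, milk", "channel": "shop"},
    {"id": 2, "items": "bread, butter, milk", "channel": "shop"},
    {"id": 3, "items": None, "channel": "shop"},  # noisy record
]
store_b = [
    {"id": 4, "items": "milk, bread", "channel": "web"},
    {"id": 5, "items": "butter, jam", "channel": "web"},
]

def clean(records):
    """Data cleaning: drop records with a missing item list."""
    return [r for r in records if r["items"]]

def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    return [r for source in sources for r in source]

def select(records):
    """Data selection: keep only the field relevant to the analysis."""
    return [r["items"] for r in records]

def transform(item_strings):
    """Data transformation: turn each string into a sorted tuple of items."""
    return [
        tuple(sorted(item.strip().lower() for item in s.split(",")))
        for s in item_strings
    ]

def mine(baskets):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(basket, 2))
    return counts

def evaluate(pair_counts, n_baskets, min_support=0.5):
    """Pattern evaluation: keep pairs whose support meets the threshold."""
    return {
        pair: count / n_baskets
        for pair, count in pair_counts.items()
        if count / n_baskets >= min_support
    }

baskets = transform(select(integrate(clean(store_a), clean(store_b))))
patterns = evaluate(mine(baskets), len(baskets))

# Knowledge representation: present the result in a readable form.
for pair, support in sorted(patterns.items()):
    print(f"{pair[0]} + {pair[1]}: support {support:.2f}")
```

With this sample data only the pair bread and milk passes the threshold, which shows how pattern evaluation filters the raw output of the mining step.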
Query Tools vs Data Mining

| Query Tools | Data Mining |
|---|---|
| A software tool that allows end users to access information stored in a database | A process used to extract usable data from a larger set of raw data |
| Can be used to easily build and input queries to databases | A technique or concept in computer science |
| Make it very easy to build queries without even having to learn a database-specific query language | Deals with extracting useful and previously unknown information from raw data |
| Users need to know exactly what they are looking for | Used mostly when the user has only a vague idea of what they are looking for |
| Can be used to easily build and input queries to databases | Data miners can use the existing functionality of query tools to pre-process raw data before the data mining process |
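The contrast can be illustrated with a small sketch using Python's built-in sqlite3 module. The sales table and its rows are hypothetical. The first part asks a precise question, as a query tool would; the second part has no specific question and simply looks for products that tend to be bought together.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, product TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("ann", "tea"), ("ann", "sugar"), ("bob", "tea"), ("bob", "sugar"),
     ("cat", "coffee"), ("dan", "tea"), ("dan", "sugar")],
)

# Query-tool style: the user already knows exactly what to ask.
tea_buyers = [row[0] for row in conn.execute(
    "SELECT customer FROM sales WHERE product = 'tea' ORDER BY customer")]

# Data-mining style: no specific question, look for co-purchased products.
# The query itself is used only to pre-process the raw data.
baskets = {}
for customer, product in conn.execute("SELECT customer, product FROM sales"):
    baskets.setdefault(customer, set()).add(product)

pair_counts = Counter()
for products in baskets.values():
    items = sorted(products)
    for i, first in enumerate(items):
        for second in items[i + 1:]:
            pair_counts[(first, second)] += 1

most_common_pair, count = pair_counts.most_common(1)[0]
conn.close()

print("Tea buyers:", tea_buyers)
print("Most common pair:", most_common_pair, "bought together", count, "times")
```

Note that the mining part still relies on a query to fetch the raw rows, which matches the last row of the table.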
What kind of data can be mined?

1. Flat Files
2. Relational Databases
3. Data Warehouse
4. Transactional Databases
5. Multimedia Databases
6. Spatial Databases
7. Time Series Databases
8. World Wide Web(WWW)
9. Medical and personal data
10. Satellite sensing
11. Games
12. Text reports / Memos / Email-messages / chats
etc
What kind of data can be mined?
1. Flat Files
• Flat files are defined as data files in text or binary form with a structure that can be easily
extracted by data mining algorithms.
• Data stored in flat files have no relationships or paths among themselves.
• Flat files are described by a data dictionary, e.g. a CSV file.
• Application: used in data warehousing to store data, used to carry data to and from a server,
etc.
2. Relational Databases
• A relational database is defined as a collection of data organized in tables with rows and
columns.
• The physical schema of a relational database describes how the data is physically stored.
• The logical schema of a relational database defines the structure of the tables and the relationships among them.
• The standard query language for relational databases is SQL.
• Application: Data Mining, Relational Online Analytical Processing (ROLAP) model, etc.
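As a rough illustration of the difference between the two, the sketch below reads a small CSV flat file and loads it into a relational table that can then be queried with SQL. The file contents, table name, and column names are made up for the example, and an in-memory SQLite database stands in for a real server.

```python
import csv
import io
import sqlite3

# A flat file: plain text records with no relationships between them
# (hypothetical data; io.StringIO stands in for a file on disk).
flat_file = io.StringIO(
    "student,course,marks\n"
    "asha,dbms,81\n"
    "ravi,dbms,67\n"
    "asha,os,74\n"
)
rows = [
    (r["student"], r["course"], int(r["marks"]))
    for r in csv.DictReader(flat_file)
]

# A relational table: the same data organized in rows and typed columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marks (student TEXT, course TEXT, marks INTEGER)")
conn.executemany("INSERT INTO marks VALUES (?, ?, ?)", rows)

# SQL can now summarize the data per student.
averages = dict(conn.execute(
    "SELECT student, AVG(marks) FROM marks GROUP BY student"))
conn.close()

print(averages)
```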
3. Data Warehouse
• A data warehouse is defined as a collection of data integrated from multiple sources that
supports queries and decision making.
• Two approaches can be used to update data in a data warehouse: the query-driven approach
and the update-driven approach.
• Application: business decision making, etc.
4. Transactional Databases
• A transactional database is a collection of data organized by time stamps, dates, etc. to represent
transactions.
• This type of database can roll back or undo its operations when a transaction is
not completed or committed.
• It is a highly flexible system in which users can modify information without changing any sensitive
information.
• Application: Banking, Distributed systems, Object databases, etc.
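The rollback behaviour of a transactional database can be sketched with Python's built-in sqlite3 module. The accounts table, the names, and the transfer function are hypothetical and chosen only for illustration; the CHECK constraint is what makes the second transfer fail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, sender, receiver, amount):
    """Move money between accounts; undo every step if any step fails."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "alice", "bob", 30)       # completes and is committed
failed = transfer(conn, "alice", "bob", 500)  # would overdraw, so it is undone
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
conn.close()

print(ok, failed, balances)
```

In the failed transfer the credit to the receiver has already been applied when the debit violates the constraint, so the rollback is what keeps the two balances consistent.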
5. Multimedia Databases
• Multimedia databases consist of audio, video, images and text media.
• They can be stored in object-oriented databases.
• They are used to store complex information in pre-specified formats.
• Application: Digital libraries, video-on-demand, news-on-demand, musical databases, etc.
6. Spatial Databases
• Store geographical information.
• Store data in the form of coordinates, topology, lines, polygons, etc.
• Application: Maps, Global positioning, etc.
7. Time-series Databases
• Time-series databases contain data such as stock exchange data and logged user activities.
• They handle arrays of numbers indexed by time, date, etc.
• They often require real-time analysis.
• Examples: eXtremeDB, InfluxDB, etc.
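As a small illustration of values indexed by time, the sketch below computes a moving average over a series of dated prices. The prices, the dates, and the window size of 3 are assumptions made for the example; a real time-series database would provide such operations itself.

```python
from datetime import date

# Hypothetical daily closing prices indexed by date.
prices = [
    (date(2024, 1, 1), 100.0),
    (date(2024, 1, 2), 102.0),
    (date(2024, 1, 3), 101.0),
    (date(2024, 1, 4), 105.0),
    (date(2024, 1, 5), 110.0),
]

def moving_average(series, window):
    """Return (timestamp, mean of the last `window` values) pairs."""
    result = []
    for end in range(window, len(series) + 1):
        chunk = [value for _, value in series[end - window:end]]
        result.append((series[end - 1][0], sum(chunk) / window))
    return result

smoothed = moving_average(prices, window=3)
for day, average in smoothed:
    print(day.isoformat(), round(average, 2))
```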
8. World Wide Web (WWW): The World Wide Web is a collection of documents and resources such as audio, video, and text,
which are identified by Uniform Resource Locators (URLs), linked through HTML pages, and accessed
with web browsers via the Internet.
• It is the most heterogeneous repository, as it collects data from multiple resources.
• It is dynamic in nature, as the volume of data is continuously increasing and changing.
Applications: Online shopping, job search, research, studying, etc.
9. Medical and personal data: From government censuses to personnel and customer files, very large collections
of information are continuously gathered about individuals and groups. Governments, companies and
organizations such as hospitals are stockpiling very large quantities of personal data to help them manage
human resources, better understand a market, or simply assist clientele.
Applications: Hospitals, social media, etc.
10. Satellite sensing: There are countless satellites around the globe: some are geostationary above a
region, and some are orbiting the Earth, but all are sending a non-stop stream of data to the surface.
NASA, which controls a large number of satellites, receives more data every second than all NASA
researchers and engineers can cope with.
Applications: Space institutions, etc.
11. Games: Our society is collecting a tremendous amount of data and statistics about games, players and
athletes. From hockey scores, basketball passes and car-racing laps to swimming times, boxers' pushes
and chess positions, all the data are stored. Commentators and journalists use this information for
reporting, while trainers and athletes want to exploit this data to improve performance and better
understand opponents.
Applications: Sports bodies such as the BCCI, etc.
12. Text reports and memos (e-mail messages): Most communications within and between companies,
research organizations, or even private individuals are based on reports and memos in textual form, often
exchanged by e-mail. These messages are regularly stored in digital form for future use and reference,
creating formidable digital libraries.
Applications: e-commerce, hospitals, libraries, etc.
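A very simple form of mining such textual data is counting which terms recur across stored messages. The sketch below is only illustrative: the messages, the stop-word list, and the threshold of two occurrences are assumptions, and real text mining uses far richer techniques.

```python
import re
from collections import Counter

# Hypothetical stored messages.
messages = [
    "Meeting moved to Monday. Please update the project report.",
    "The project report is due Monday.",
    "Lunch on Friday?",
]
STOP_WORDS = {"the", "to", "is", "on", "please"}

def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

term_counts = Counter(
    word for message in messages for word in tokenize(message))

# Terms that appear at least twice across the collection.
top_terms = sorted(term for term, count in term_counts.items() if count >= 2)
print(top_terms)
```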
