
Devi Ahilya Vishwavidyalaya, Indore

Session 2018-19

Submitted To: Dr. SK Choube
Submitted By: Bharat Choudhary

What is Data Mining?
Data mining is the set of methodologies used to analyze data from various
dimensions and perspectives, find previously unknown, hidden patterns, classify
and group the data, and summarize the relationships identified.

The elements of data mining include extracting, transforming, and loading data
into the data warehouse system; managing the data in a multidimensional
database system; providing access to business analysts and IT experts;
analyzing the data with tools; and presenting it in a useful format, such as a
graph or table.
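As a rough illustration of the extract, transform, load and present steps listed above, the following Python sketch uses an in-memory SQLite database as a stand-in for the data warehouse. The CSV data, the `sales` table and the function names are hypothetical, chosen only for the example; a real warehouse would use a dedicated system and a multidimensional schema.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an operational system (the "extract" source).
RAW_SALES = """region,product,amount
north, Widget ,120.50
south,widget,80
north,Gadget,
south,Gadget,200.25
"""


def extract(text):
    """Extract: parse CSV rows into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalise text fields and drop rows with a missing amount."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip incomplete records
        cleaned.append((row["region"].strip().lower(),
                        row["product"].strip().lower(),
                        float(row["amount"])))
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into a warehouse-style fact table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales "
                 "(region TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def summarize(conn):
    """Analyze and present: total sales per region, as rows of a table."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_SALES)))
for region, total in summarize(conn):
    print(f"{region:<6} {total:>8.2f}")
```

Keeping each step in its own function mirrors the separation between the loading, analysis and presentation elements described in the text.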

The Importance of Data Mining

Data can generate revenue; it is a valuable financial asset of an enterprise.
Businesses can use data mining for knowledge discovery and for exploring the
data they already hold. This can help them predict future trends, understand
customers’ preferences and purchase habits, and conduct constructive market
analysis. They can then build models based on historical data patterns, get
more out of targeted marketing campaigns, and devise more profitable selling
strategies. Data mining helps enterprises make informed business decisions and
strengthens business intelligence, thereby increasing the company’s revenue and
reducing cost overheads. Data mining is also useful for finding anomalous
patterns in data, which is essential in fraud detection and in identifying
areas of weak or incorrect data collation or modification.
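One of the simplest ways to flag anomalous patterns of the kind mentioned above is a z-score test. The sketch below is an illustration only, assuming made-up transaction amounts and an arbitrary threshold of 2 standard deviations; production fraud-detection systems use considerably richer models.

```python
import statistics

# Hypothetical transaction amounts for one customer; the last one is unusual.
amounts = [42.0, 38.5, 45.0, 40.0, 39.5, 41.0, 43.5, 400.0]


def zscore_anomalies(values, threshold=2.0):
    """Return (index, value) pairs whose z-score exceeds the threshold.

    The z-score measures how many standard deviations a value lies from the
    mean; values far from the mean are flagged as possible anomalies.
    """
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation
    return [(i, v) for i, v in enumerate(values)
            if abs(v - mean) / stdev > threshold]


print(zscore_anomalies(amounts))
```

Because the suspicious value itself inflates the mean and standard deviation, this simple test can miss anomalies in small samples; median-based variants are more robust.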

Data Mining Techniques

Data mining has been evolving constantly. A number of innovative and intuitive
techniques have emerged that refine data mining concepts, with the aim of giving
companies more comprehensive insight into their own data and into likely future
trends. Data mining experts employ many techniques, some of which are listed
below:

Statistics
Statistics is the analysis and presentation of numeric facts about data, and it
is at the core of all data mining and machine learning algorithms. It provides
analytical techniques and tools that can be applied to large data sets.
Statistics covers planning, designing, collecting data, analyzing, drawing
meaningful interpretations and reporting research findings; for this reason it
is not limited to mathematicians, and business analysts use it as well. To
quantify data and obtain the desired output, statistics relies on probability
and on the design of surveys and experiments.
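Python's standard `statistics` module covers the basic descriptive measures used to quantify a sample. The order counts below are made up; the sketch computes a summary and an empirical probability (the share of days above a hypothetical threshold of 15 orders).

```python
import statistics

# Hypothetical daily order counts collected over two weeks.
orders = [12, 15, 11, 18, 14, 20, 13, 16, 15, 17, 14, 19, 15, 12]


def describe(values):
    """Summarise a numeric sample with common descriptive statistics."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


summary = describe(orders)
# Empirical probability that a day has more than 15 orders.
p_busy = sum(1 for x in orders if x > 15) / len(orders)
print(summary)
print(f"P(orders > 15) = {p_busy:.2f}")
```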

Statistics is an important component of data mining that offers effective
analytical techniques and tools for handling large amounts of data to benefit
businesses. It is the science of learning from data, covering everything from
collecting data to using it effectively.

Some of the popular emerging trends in data mining are application
exploration, visual data mining, biological data mining, web mining, software
mining, distributed data mining, real-time data mining and many more.
Statistics helps identify new patterns in the available unstructured data. With
the emergence of big data, characterized by large volume and high velocity,
data plays an important role in every organization, and data mining and
statistics are integral to predicting outcomes. Because data mining always
relies on statistical thinking to draw conclusions, data mining and statistics
will inevitably continue to grow together in the near future.

Machine learning
Data mining uses many machine learning methods, but often with a slightly
different goal in mind. Conversely, machine learning also employs data mining
methods, for example as "unsupervised learning" or as a pre-processing step to
improve learner accuracy.
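To make the idea of unsupervised learning concrete, here is a minimal k-means clustering sketch in plain Python (3.8 or later, for `math.dist`). It groups unlabeled points purely by distance. The customer data and the choice of k = 2 are assumptions for illustration; libraries such as scikit-learn provide tuned implementations.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster 2-D points into k groups with Lloyd's k-means algorithm.

    No labels are used: the algorithm only looks at distances between points,
    which is what makes it an unsupervised method.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers described by (annual visits, average spend).
customers = [(2, 20), (3, 25), (2, 22), (20, 200), (22, 210), (21, 190)]
labels, centroids = kmeans(customers, k=2)
print(labels, centroids)
```

The resulting groups could then serve as an extra feature for a supervised learner, which is one way such clustering acts as a pre-processing step.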
In most cases, data mining is now used to predict results from historical data
or to find new solutions in existing data, and most organizations use it to
drive business outcomes. Machine learning techniques, meanwhile, are growing
much faster because they overcome some of the limitations of data mining
techniques: a machine learning process can be more accurate and less error-prone
than data mining, and it is much more capable of making its own decisions and
resolving issues. Businesses still need the data mining process, however,
because it defines the problem a particular business faces, and machine learning
techniques can then be used to solve that problem. In short, data mining and
machine learning have to work hand in hand to drive a business: one technique
defines the problem and the other provides a more accurate solution.
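The paragraph above mentions predicting results from historical data. As a minimal sketch of that idea, assuming a simple linear relationship and made-up figures, and written in plain Python rather than any particular library, ordinary least-squares regression fits a line to past observations and extrapolates it:

```python
def fit_line(xs, ys):
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    b = cov_xy / var_x
    a = mean_y - b * mean_x
    return a, b


# Hypothetical historical data: monthly ad spend (thousands) vs. units sold.
spend = [1, 2, 3, 4, 5]
units = [110, 130, 150, 170, 190]
a, b = fit_line(spend, units)
print(f"units ~ {a:.1f} + {b:.1f} * spend")
print("forecast for spend = 6:", a + b * 6)
```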

Database System and Data Warehouse

A database is an organized collection of data, stored and accessed
electronically. Database designers typically organize the data to model aspects
of reality in a way that supports processes requiring information, for example
modelling the availability of rooms in hotels in a way that supports finding a
hotel with vacancies.
A database management system (DBMS) is a software application that interacts
with end users, other applications, and the database itself to capture and
analyze data. (A DBMS is sometimes loosely referred to as a "database".) A
general-purpose DBMS allows the definition, creation, querying, update, and
administration of databases.
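The hotel example above maps naturally onto the operations a general-purpose DBMS supports. The sketch below uses Python's built-in `sqlite3` module as the DBMS; the `rooms` table, its columns and the sample rows are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for a general-purpose DBMS.
conn = sqlite3.connect(":memory:")

# Definition and creation: a hypothetical schema for hotel rooms.
conn.execute("""
    CREATE TABLE rooms (
        hotel    TEXT NOT NULL,
        room_no  INTEGER NOT NULL,
        occupied INTEGER NOT NULL DEFAULT 0,  -- 0 = free, 1 = occupied
        PRIMARY KEY (hotel, room_no)
    )
""")
conn.executemany(
    "INSERT INTO rooms (hotel, room_no, occupied) VALUES (?, ?, ?)",
    [("Hotel A", 101, 1), ("Hotel A", 102, 1),
     ("Hotel B", 201, 0), ("Hotel B", 202, 1),
     ("Hotel C", 301, 1)],
)

# Update: a guest checks out of room 102 at Hotel A.
conn.execute("UPDATE rooms SET occupied = 0 WHERE hotel = ? AND room_no = ?",
             ("Hotel A", 102))
conn.commit()

# Querying: which hotels currently have at least one vacancy?
vacant = conn.execute("""
    SELECT hotel, COUNT(*) AS free_rooms
    FROM rooms
    WHERE occupied = 0
    GROUP BY hotel
    ORDER BY hotel
""").fetchall()
print(vacant)
```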
Data warehousing is the process of constructing and using a data warehouse. A
data warehouse is constructed by integrating data from multiple heterogeneous
sources, and it supports analytical reporting, structured and/or ad hoc queries,
and decision making. Data warehousing involves data cleaning, data integration,
and data consolidation.
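The cleaning, integration and consolidation steps can be illustrated with two small, deliberately inconsistent sources. Everything here is hypothetical: the field names, the ID formats and the rule that a repeated `order_id` means the same order was exported twice. Real warehouses typically perform these steps in an ETL tool or in SQL.

```python
# Source 1 (hypothetical): a billing system with string IDs in mixed case
# and amounts in rupees stored as text.
billing = [
    {"cust_id": "C001", "amount_inr": "1,200"},
    {"cust_id": "c002", "amount_inr": "850"},
    {"cust_id": "C001", "amount_inr": "300"},
]

# Source 2 (hypothetical): a web shop with integer customer IDs, amounts in
# paise (1 rupee = 100 paise), and one order exported twice.
webshop = [
    {"order_id": 17, "customer": 1, "paise": 50000},
    {"order_id": 18, "customer": 3, "paise": 25000},
    {"order_id": 18, "customer": 3, "paise": 25000},  # duplicate export
]


def clean_billing(rows):
    """Cleaning: upper-case IDs and parse '1,200'-style amounts."""
    return [(r["cust_id"].upper(), float(r["amount_inr"].replace(",", "")))
            for r in rows]


def clean_webshop(rows):
    """Cleaning: keep one record per order_id, map IDs to 'C00n', paise to rupees."""
    by_order = {}
    for r in rows:
        by_order[r["order_id"]] = r  # a repeated order_id overwrites the earlier copy
    return [(f"C{r['customer']:03d}", r["paise"] / 100) for r in by_order.values()]


def consolidate(*sources):
    """Integration and consolidation: total spend per customer across sources."""
    totals = {}
    for source in sources:
        for cust, amount in source:
            totals[cust] = totals.get(cust, 0.0) + amount
    return dict(sorted(totals.items()))


warehouse_view = consolidate(clean_billing(billing), clean_webshop(webshop))
print(warehouse_view)
```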

Information Retrieval

Information retrieval, as the name implies, concerns the retrieval of relevant
information from databases. It is basically concerned with facilitating the
user's access to large amounts of (predominantly textual) information. The
process of information retrieval involves the following stages:

1. Representing collections of documents - how to represent, identify and process the collection of documents.
2. User-initiated querying - understanding and processing of the queries.
3. Retrieval of the appropriate documents - the searching mechanism used to obtain and retrieve the relevant documents.
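The three stages above can be sketched with an inverted index and TF-IDF ranking, a classic retrieval technique chosen here for illustration; the section itself does not prescribe a particular ranking model. The documents, the tokenizer and the unsmoothed IDF formula are simplifying assumptions.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical document collection.
DOCS = {
    "d1": "Data mining finds hidden patterns in large data sets.",
    "d2": "A data warehouse integrates data from heterogeneous sources.",
    "d3": "Information retrieval finds relevant documents for a user query.",
}


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(docs):
    """Stage 1 (representation): map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(query, index, n_docs):
    """Stages 2 and 3: process the query, then rank documents by TF-IDF score."""
    scores = Counter()
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur anywhere in the collection
        idf = math.log(n_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return scores.most_common()


index = build_index(DOCS)
print(search("relevant documents", index, len(DOCS)))
```

An inverted index is used because it lets a query touch only the documents that contain its terms, instead of scanning the whole collection.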