1. A decision support system (DSS) is a computer program
application that analyzes business data and presents it so
that users can make business decisions more easily.
2. Decision-making can be regarded as the cognitive
process resulting in the selection of a belief or a course of
action among several alternative possibilities.
3. A data warehouse is a subject-oriented, integrated, time-variant,
and non-volatile collection of data in support of
management's decision-making process.
4. Metadata is data that describes other data. Meta is a prefix
that in most information technology usages means "an
underlying definition or description."
5. A fact table is the central table in a star schema of a data
warehouse. A fact table stores quantitative information for
analysis and is often denormalized.
6. The star schema is the simplest style of data mart schema.
The star schema consists of one or more fact tables
referencing any number of dimension tables.
7. An Online Analytical Processing (OLAP) server is based on the
multidimensional data model. It allows managers and
analysts to gain insight into the information through fast,
consistent, and interactive access to it.
8. Physical storage is the storage that is on physical disks
within discovered enclosures.
9. Indexing collects, parses, and stores data to facilitate fast
and accurate information retrieval.
10. Data security means protecting data, such as a
database, from destructive forces, and from the unwanted
actions of unauthorized users.
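
The fact table, star schema, and OLAP definitions in items 5 to 7 can be illustrated with a small sketch. It uses Python's built-in sqlite3 module; the table names (fact_sales, dim_product, dim_date), the columns, and the sample rows are all hypothetical and chosen only for illustration. One fact table holds the numeric measures and references two dimension tables, and the query rolls revenue up by category and quarter, the kind of multidimensional summary an OLAP server would provide.

```python
import sqlite3

# Hypothetical star schema: one fact table referencing two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    units_sold INTEGER,   -- quantitative measures live in the fact table
    revenue    REAL
);
INSERT INTO dim_product VALUES
    (1, 'Laptop', 'Hardware'), (2, 'Mouse', 'Hardware'), (3, 'Editor', 'Software');
INSERT INTO dim_date VALUES (10, 2023, 1), (11, 2023, 2);
INSERT INTO fact_sales VALUES
    (1, 10, 2, 2000.0), (2, 10, 5, 100.0), (3, 11, 4, 400.0), (1, 11, 1, 1000.0);
""")

def revenue_by_category_and_quarter(connection):
    """Roll revenue up along the product and date dimensions."""
    return connection.execute("""
        SELECT p.category, d.quarter, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date    AS d ON d.date_id = f.date_id
        GROUP BY p.category, d.quarter
        ORDER BY p.category, d.quarter
    """).fetchall()

summary = revenue_by_category_and_quarter(conn)
for category, quarter, revenue in summary:
    print(f"{category} Q{quarter}: {revenue:.2f}")
```

A real warehouse would hold far more rows and usually more dimensions; the sketch only shows the shape of the schema and of a typical roll-up query.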
The uses of statistics in data mining are:
1. Estimate the complexity of a data mining problem.
2. Suggest which data mining techniques are most likely to be
successful.
3. Identify data fields that contain the most surface information.
A relational database is a collection of data items organized
as a set of formally-described tables from which data can be
accessed or reassembled in many different ways without
having to reorganize the database tables. The relational
database was invented by E. F. Codd at IBM in 1970.
The standard user and application program interface to a
relational database is the structured query language (SQL). SQL
statements are used both for interactive queries for information
from a relational database and for gathering data for reports.
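
As a minimal sketch of the two uses of SQL named above, the following Python snippet runs one interactive-style lookup and one report-style aggregate against the same tables. It relies on the standard sqlite3 module; the customers and orders tables and their rows are made-up examples, not part of the original notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    amount REAL
);
INSERT INTO customers VALUES (1, 'Asha', 'Pune'), (2, 'Ben', 'Leeds');
INSERT INTO orders VALUES (1, 1, 250.0), (2, 1, 100.0), (3, 2, 75.0);
""")

# Interactive-style query: look up the orders of one customer.
asha_orders = conn.execute(
    "SELECT o.id, o.amount FROM orders AS o "
    "JOIN customers AS c ON c.id = o.customer_id "
    "WHERE c.name = ? ORDER BY o.id",
    ("Asha",),
).fetchall()

# Report-style query: reassemble the same tables into totals per city.
city_totals = conn.execute(
    "SELECT c.city, SUM(o.amount) FROM customers AS c "
    "JOIN orders AS o ON o.customer_id = c.id "
    "GROUP BY c.city ORDER BY c.city"
).fetchall()

print(asha_orders)
print(city_totals)
```

Both queries read the same two tables in different ways, and neither requires the tables to be reorganized.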
A temporal database is a database with built-in support for
handling data involving time, for example a temporal data model and a
temporal version of Structured Query Language (SQL). It is related to
the slowly changing dimension concept.
More specifically, the temporal aspects usually include valid
time and transaction time. These attributes can be combined to
form bitemporal data.

Valid time is the time period during which a fact is true with
respect to the real world.

Transaction time is the time period during which a fact
stored in the database is considered to be true.

Bitemporal data combines both valid time and transaction time.
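
The difference between the two time dimensions is easier to see in a small example. The sketch below is plain Python rather than a real temporal database; the AddressFact record, the city_as_of helper, the FOREVER marker, and the sample rows are all hypothetical. Each row carries a valid-time interval and a transaction-time interval, and a query fixes a point on both axes.

```python
from dataclasses import dataclass
from datetime import date

FOREVER = date.max  # open-ended interval marker (an assumption of this sketch)

@dataclass(frozen=True)
class AddressFact:
    person: str
    city: str
    valid_from: date  # valid time: when the fact holds in the real world
    valid_to: date
    tx_from: date     # transaction time: when the database held this row as current
    tx_to: date

def city_as_of(rows, person, valid_on, known_on):
    """Return the city valid on `valid_on`, as the database believed it on `known_on`."""
    for row in rows:
        if (row.person == person
                and row.valid_from <= valid_on < row.valid_to
                and row.tx_from <= known_on < row.tx_to):
            return row.city
    return None

rows = [
    # Recorded on 2020-01-10: Ann lives in Oslo from 2020-01-01 onward.
    AddressFact("Ann", "Oslo", date(2020, 1, 1), FOREVER, date(2020, 1, 10), date(2020, 6, 1)),
    # Correction recorded on 2020-06-01: she actually moved to Bergen on 2020-03-01.
    AddressFact("Ann", "Oslo", date(2020, 1, 1), date(2020, 3, 1), date(2020, 6, 1), FOREVER),
    AddressFact("Ann", "Bergen", date(2020, 3, 1), FOREVER, date(2020, 6, 1), FOREVER),
]

# Same real-world date, two different states of knowledge.
before_correction = city_as_of(rows, "Ann", date(2020, 4, 1), date(2020, 5, 1))
after_correction = city_as_of(rows, "Ann", date(2020, 4, 1), date(2020, 7, 1))
print(before_correction, after_correction)
```

The superseded row is closed off in transaction time rather than deleted, which is what allows the earlier state of knowledge to be reconstructed.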

A time series database (TSDB) is a software system that is
optimized for handling time series data, arrays of numbers
indexed by time (a datetime or a datetime range). In some fields
these time series are called profiles, curves, or traces. A time
series of stock prices might be called a price curve. A time series
of energy consumption might be called a load profile. A log of
temperature values over time might be called a temperature trace.
Despite the disparate names, many of the same mathematical
operations, queries, or database transactions are useful for
analysing all of them. The implementation of a database that can
correctly, reliably, and efficiently implement these operations must
be specialized for time-series data.
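
Two operations that recur across all of these kinds of series are range queries and downsampling. The following is a minimal in-memory sketch in Python, not how any particular TSDB implements them; the function names hourly_mean and range_query and the sample temperature trace are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime

def hourly_mean(points):
    """Average (timestamp, value) pairs into one value per hour bucket."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: sum(values) / len(values) for hour, values in sorted(buckets.items())}

def range_query(points, start, end):
    """Return the points whose timestamp lies in the half-open range [start, end)."""
    return [(t, v) for t, v in points if start <= t < end]

# Hypothetical temperature trace.
temperature_trace = [
    (datetime(2024, 1, 1, 9, 0), 20.0),
    (datetime(2024, 1, 1, 9, 30), 22.0),
    (datetime(2024, 1, 1, 10, 15), 25.0),
]

print(hourly_mean(temperature_trace))
print(range_query(temperature_trace, datetime(2024, 1, 1, 9, 15), datetime(2024, 1, 1, 10, 30)))
```

The same two functions would work unchanged on a price curve or a load profile, which is the point made above about shared operations.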
Machine learning is a scientific discipline that
explores the construction and study of algorithms that
can learn from data.[1] Such algorithms operate by building
a model based on inputs[2]:2 and using that to make predictions or
decisions, rather than following only explicitly programmed
instructions. Machine learning can be considered a subfield
of computer science and statistics. It has strong ties to artificial
intelligence and optimization, which deliver methods, theory and
application domains to the field. Machine learning is employed in
a range of computing tasks where designing and programming
explicit, rule-based algorithms is infeasible.
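
To make "building a model based on inputs" concrete, here is a minimal sketch using ordinary least squares, one of the simplest learning methods. It is not tied to any specific technique in the notes above; the function names fit_line and predict and the sample data are hypothetical. The relationship between x and y is never written into the program; it is estimated from the examples and then used for prediction.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance  # assumes the xs are not all identical
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply the learned line to a new input."""
    slope, intercept = model
    return slope * x + intercept

# Training examples (inputs and observed outputs).
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

model = fit_line(xs, ys)
print(model, predict(model, 10))
```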
The steps in the data mining process are:
1. Business Understanding: Understand the project
objectives and requirements from a business perspective,
and then convert this knowledge into a data mining problem
definition and a preliminary plan designed to achieve the
objectives.
2. Data Understanding: Start by collecting data, then get
familiar with it in order to identify data quality problems,
discover first insights into the data, or detect interesting
subsets that suggest hypotheses about hidden information.
3. Data Preparation: Includes all activities required to
construct the final data set (data that will be fed into the
modeling tool) from the initial raw data. Tasks include table,
case, and attribute selection as well as transformation and
cleaning of data for modeling tools.
4. Modeling: Select and apply a variety of modeling
techniques, and calibrate tool parameters to optimal values.
Typically, there are several techniques for the same data
mining problem type. Some techniques have specific
requirements on the form of data. Therefore, stepping back
to the data preparation phase is often needed.
5. Evaluation: Thoroughly evaluate the model, and review the
steps executed to construct the model, to be certain it
properly achieves the business objectives. Determine if there
is some important business issue that has not been
sufficiently considered. At the end of this phase, a decision
on the use of the data mining results is reached.
6. Deployment: Organize and present the results of data
mining. Deployment can be as simple as generating a report
or as complex as implementing a repeatable data mining
process.
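
The data preparation, modeling, and evaluation steps can be sketched end to end on a toy data set. Everything in the Python example below is an illustrative assumption: the records, the attribute names, and the very simple threshold model stand in for whatever data and modeling tool a real project would use.

```python
# Hypothetical raw data: one row has a missing attribute value.
raw_records = [
    {"age": 25, "bought": 0},
    {"age": 47, "bought": 1},
    {"age": None, "bought": 1},  # dropped during data preparation
    {"age": 52, "bought": 1},
    {"age": 31, "bought": 0},
    {"age": 29, "bought": 0},
]

def prepare(records, attribute, target):
    """Data preparation: select one attribute and drop rows with missing values."""
    return [(r[attribute], r[target]) for r in records
            if r[attribute] is not None and r[target] is not None]

def fit_threshold(dataset):
    """Modeling: pick the cut-off on the attribute that misclassifies the fewest rows."""
    candidates = sorted({value for value, _ in dataset})

    def errors(threshold):
        return sum((value >= threshold) != bool(label) for value, label in dataset)

    return min(candidates, key=errors)

def accuracy(threshold, dataset):
    """Evaluation: share of rows where the rule `value >= threshold` matches the label."""
    hits = sum((value >= threshold) == bool(label) for value, label in dataset)
    return hits / len(dataset)

dataset = prepare(raw_records, "age", "bought")
threshold = fit_threshold(dataset)
score = accuracy(threshold, dataset)
print(len(dataset), threshold, score)
```

For brevity the sketch scores the model on the same rows it was fitted on; a real evaluation would use held-out data before any deployment decision.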

Descriptive data mining is a mathematical process that describes
real-world events and the relationships between factors
responsible for them. The process is used by consumer-driven
organizations to help them target their marketing and advertising
efforts. In descriptive data mining, customer groups are clustered
according to demographics, purchasing behavior, expressed
interests and other descriptive factors. Statistics can identify
where the customer groups share similarities and where they
differ. The most active customers get special attention because
they offer the greatest ROI (return on investment).
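
Clustering customers by descriptive factors can be sketched with k-means, one common clustering method; the notes above do not name a specific algorithm, so this choice is an assumption. The customer tuples of (age, monthly spend) are made up, and the deterministic start from the first k points is a simplification to keep the example reproducible.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = list(points[:k])  # deterministic start: the first k points (a simplification)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(point)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers: (age, monthly spend).
customers = [(22, 150), (58, 900), (25, 200), (61, 950), (24, 180), (57, 870)]

centroids, clusters = kmeans(customers, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

The features are left unscaled here, so monthly spend dominates the distance; in practice the attributes would usually be normalized first.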
Predictive data mining is a process used in predictive analytics to
create a statistical model of future behavior. Predictive analytics is
the area of data mining concerned with forecasting probabilities
and trends.
In customer relationship management (CRM), predictive data
mining is used to create a statistical model for future behavior. It is
also used in email filtering systems to identify the probability that
a given message is spam.
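
The spam-filtering use mentioned above can be sketched with a naive Bayes classifier, a common choice for this task, though the notes do not say which model a given filter uses. The training messages, the function names train and spam_probability, and the add-one smoothing are all assumptions of this sketch, and it expects at least one message of each class.

```python
import math
from collections import Counter

def train(messages):
    """Count word occurrences per class from (text, is_spam) pairs."""
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, is_spam in messages:
        counts[is_spam].update(text.lower().split())
        totals[is_spam] += 1
    return counts, totals

def spam_probability(model, text):
    """Estimate P(spam | text) with naive Bayes and add-one (Laplace) smoothing."""
    counts, totals = model
    vocabulary = set(counts[True]) | set(counts[False])
    log_scores = {}
    for label in (True, False):
        word_total = sum(counts[label].values())
        score = math.log(totals[label] / sum(totals.values()))  # class prior
        for word in text.lower().split():
            score += math.log((counts[label][word] + 1) / (word_total + len(vocabulary)))
        log_scores[label] = score
    # Convert the two log scores back into a normalized probability.
    peak = max(log_scores.values())
    weights = {label: math.exp(s - peak) for label, s in log_scores.items()}
    return weights[True] / (weights[True] + weights[False])

# Hypothetical labeled messages.
training = [
    ("win free money now", True),
    ("free prize claim now", True),
    ("meeting agenda for monday", False),
    ("project report for review", False),
]

model = train(training)
print(spam_probability(model, "free money"))
print(spam_probability(model, "monday meeting"))
```

The output is a probability rather than a yes/no answer, which matches the description of predictive analytics as forecasting probabilities.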
The requirements of clustering are: