Lecture by
Dr. Ruchi Garg
Lloyd Business School
Greater Noida
Need of Data Management
The more data is collected, the more monitoring and validation it requires.
Large organizations may wind up with dozens of business solutions, each with its
own data repository, such as databases, CRM, ERP, and so on.
One must get rid of unnecessary data while retaining high-quality and accurate
data.
When data is gathered from many sources, inconsistency in the data is
unavoidable. Inadequate data management processes and systems contribute
to inaccurate data.
The ultimate purpose of having high-quality, ready-to-use data is to make it available for
further analysis and processing by business intelligence tools, which in turn
deliver it to senior management for more informed decision making.
Challenges of Data Management
Data independence helps you to keep data separated from all programs that
make use of it.
Data Independence
With physical independence, changes at the physical (internal) level do not affect the
conceptual layer.
With logical independence, changes at the conceptual level do not affect the
external layer.
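A minimal sketch of both kinds of independence, assuming SQLite through Python's standard sqlite3 module. The table, view, and column names are hypothetical. A view stands in for the external layer, adding a column stands in for a conceptual-level change, and adding an index stands in for a physical-level change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # hypothetical in-memory database
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 50000.0)")

# External layer: applications read through this view, not the base table.
conn.execute("CREATE VIEW employee_directory AS SELECT id, name FROM employee")


def read_directory(connection):
    """The 'application' query; it only knows about the view."""
    query = "SELECT id, name FROM employee_directory ORDER BY id"
    return connection.execute(query).fetchall()


before = read_directory(conn)

# Conceptual-level change: add a new attribute to the base table.
conn.execute("ALTER TABLE employee ADD COLUMN department TEXT")
# Physical-level change: add an index, which affects access paths only.
conn.execute("CREATE INDEX idx_employee_name ON employee (name)")

after = read_directory(conn)
print(before == after)
```

The application query is untouched by either change, which is the practical benefit that data independence is meant to provide.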
Duplication of data
A common example of data redundancy is a table in which a name and its address are
both stored in separate columns of every row.
If the link between these data points is restated in every new database entry, the
same values are duplicated unnecessarily across the entire table.
Data Redundancy
Redundancy opens up the possibility that copies of the same data become inconsistent
across the database (a data consistency problem).
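The link between redundancy and inconsistency can be sketched in plain Python. The customer, addresses, and field names are invented for illustration. The first design repeats the address in every row; the second stores it once and refers to it by key.

```python
# Redundant design: the customer's address is repeated in every order row.
orders = [
    {"order_id": 1, "customer": "Asha", "address": "12 Park Road"},
    {"order_id": 2, "customer": "Asha", "address": "12 Park Road"},
]

# An update that touches only one copy leaves the data inconsistent.
orders[0]["address"] = "7 Lake View"


def addresses_for(customer, rows):
    """Return the set of distinct addresses stored for one customer."""
    return {row["address"] for row in rows if row["customer"] == customer}


inconsistent = addresses_for("Asha", orders)  # two different addresses

# Non-redundant design: the address is stored once and referenced by key.
customers = {"Asha": {"address": "12 Park Road"}}
orders_normalized = [
    {"order_id": 1, "customer": "Asha"},
    {"order_id": 2, "customer": "Asha"},
]
customers["Asha"]["address"] = "7 Lake View"  # one update, one copy


def address_of_order(order):
    """Look the address up through the customer key."""
    return customers[order["customer"]]["address"]
```

In the second design there is only one copy to update, so every order resolves to the same address.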
Data Consistency
Data consistency means that each user sees a consistent view of the data,
including visible changes made by the user's own transactions and transactions
of other users.
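As a rough illustration of this definition, the sketch below opens two connections to the same SQLite file through Python's sqlite3 module. It assumes SQLite's default journal mode and the module's default transaction handling. The account table and the function name are hypothetical.

```python
import os
import sqlite3
import tempfile


def demo_consistent_views():
    """Return what each user sees before and after user A commits."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        user_a = sqlite3.connect(path)
        user_b = sqlite3.connect(path)
        try:
            user_a.execute(
                "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)"
            )
            user_a.execute("INSERT INTO account VALUES (1, 100)")
            user_a.commit()

            def balance(conn):
                rows = conn.execute(
                    "SELECT balance FROM account WHERE id = 1"
                ).fetchall()
                return rows[0][0]

            # User A changes the balance inside an open, uncommitted transaction.
            user_a.execute("UPDATE account SET balance = 150 WHERE id = 1")
            seen = {
                "a_uncommitted": balance(user_a),  # A sees its own change
                "b_uncommitted": balance(user_b),  # B still sees committed data
            }

            user_a.commit()
            seen["b_committed"] = balance(user_b)  # B now sees A's change
            return seen
        finally:
            user_a.close()
            user_b.close()
```

Each user sees only committed data from the other user, plus their own pending changes, so neither ever observes a half-finished update.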
Data Administration
Data administration is the process by which data is monitored, maintained and managed
by a data administrator and/or an organization. Data administration allows an
organization to control its data assets, as well as their processing and interactions with
different applications and business processes.
A relational database contains multiple tables of data with rows and columns that relate
to each other through special key fields.
These databases are more flexible than flat file structures, and provide functionality for
reading, creating, updating, and deleting data.
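A small sketch of two related tables and the four basic operations, assuming SQLite through Python's sqlite3 module. The customer and purchase tables are hypothetical; customer_id is the key field that relates them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce the key link
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    """
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
        amount REAL NOT NULL
    )
    """
)

# Create
conn.execute("INSERT INTO customer VALUES (1, 'Asha')")
conn.executemany(
    "INSERT INTO purchase VALUES (?, ?, ?)",
    [(10, 1, 250.0), (11, 1, 100.0)],
)
# Update
conn.execute("UPDATE purchase SET amount = 120.0 WHERE purchase_id = 11")
# Delete
conn.execute("DELETE FROM purchase WHERE purchase_id = 10")
# Read: the key field relates rows in the two tables.
rows = conn.execute(
    """
    SELECT c.name, p.purchase_id, p.amount
    FROM customer AS c
    JOIN purchase AS p ON p.customer_id = c.customer_id
    ORDER BY p.purchase_id
    """
).fetchall()
```

The join recovers the customer's name for each purchase without storing the name in the purchase table.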
In a hierarchical database, the relationships between records are pre-defined in a
one-to-many manner between 'parent and child' nodes: a parent can have many children, but
each child has exactly one parent.
Users must traverse the hierarchy in order to access the data they need.
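The parent and child structure can be sketched with a plain dictionary in which every child records its single parent. The node names are made up for illustration.

```python
# Each child records exactly one parent; a parent may have many children.
parent_of = {
    "Sales": "Company",
    "Marketing": "Company",
    "North Region": "Sales",
    "South Region": "Sales",
}


def path_from_root(node):
    """Walk up the parent links, then reverse to get the root-to-node path."""
    path = [node]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return list(reversed(path))


def children_of(node):
    """Return the direct children of a node, sorted by name."""
    return sorted(child for child, parent in parent_of.items() if parent == node)
```

Reaching "South Region" means passing through "Company" and "Sales" first, which is the hierarchy traversal described above.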
In a network database, the model can be viewed as an upside-down tree where each member's
information is a branch linked to the owner, which is the bottom of the tree.
Relationships are in a net-like form where a single element can point to multiple data
elements and can itself be pointed to by multiple data elements.
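A net-like structure of this kind can be sketched as a set of directed links. The supplier and part names are hypothetical.

```python
from collections import defaultdict

# Directed links: (from_element, to_element).
links = [
    ("Supplier A", "Part X"),
    ("Supplier A", "Part Y"),
    ("Supplier B", "Part X"),
]

points_to = defaultdict(set)      # element -> elements it points to
pointed_to_by = defaultdict(set)  # element -> elements pointing at it
for source, target in links:
    points_to[source].add(target)
    pointed_to_by[target].add(source)
```

Here "Supplier A" points to two parts, while "Part X" is pointed to by two suppliers, which a strict parent and child tree could not represent.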
Types of DBMS: Object-Oriented
A data warehouse (DW) is a system used for reporting and data analysis and is considered
a core component of business intelligence.
DWs are central repositories of integrated data from one or more disparate sources.
Data flows into a data warehouse from transactional systems, relational databases, etc.
Purpose of Data Warehouse
A data warehouse is a type of data management system that is designed to enable and
support business intelligence (BI) activities, especially analytics.
Data warehouses are solely intended to perform queries and analysis and often contain
large amounts of historical data.
Characteristics of Data Warehouse
Subject-oriented –
A data warehouse is subject-oriented: it delivers information about a particular theme
rather than about the organization’s current operations. The data warehousing process is
designed around a specific, well-defined theme, such as sales, distribution, or marketing.
Integrated –
Data from different databases is brought together in the data warehouse and stored in a
shared, commonly accepted format.
Time-Variant –
Data is maintained over different intervals of time, such as weekly, monthly, or
annually. The time horizon covered by these large datasets is much longer than that held
in online transaction processing (OLTP) systems.
Non-Volatile –
As the name suggests, data residing in the data warehouse is permanent: existing data is
not erased or deleted when new data is inserted.
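The four characteristics can be illustrated together with a toy, append-only store for a single subject (sales). This is only a sketch under simplifying assumptions: the source systems, field names, and figures are all hypothetical.

```python
from datetime import date

warehouse_sales = []  # subject-oriented: one store for the "sales" theme


def load(rows, source, load_date):
    """Append rows in a shared format; existing rows are never modified."""
    for row in rows:
        warehouse_sales.append({
            "source": source,
            "load_date": load_date,                     # time-variant
            "product": row["product"].strip().lower(),  # integrated naming
            "amount": float(row["amount"]),             # integrated type
        })


# Two hypothetical source systems with different conventions.
crm_rows = [{"product": " Laptop ", "amount": "1200"}]
erp_rows = [{"product": "LAPTOP", "amount": 1150.0}]

load(crm_rows, "crm", date(2024, 1, 31))
load(erp_rows, "erp", date(2024, 2, 29))  # non-volatile: January data remains


def total_by_month(product):
    """Sum the amounts for one product, grouped by load month."""
    totals = {}
    for row in warehouse_sales:
        if row["product"] == product:
            key = row["load_date"].strftime("%Y-%m")
            totals[key] = totals.get(key, 0.0) + row["amount"]
    return totals
```

Because every load is dated and nothing is overwritten, the store can answer questions about how a figure changed from month to month.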
Uses of Data Warehouse
A data warehouse is specially designed for data analytics, which involves reading large
amounts of data to understand relationships and trends across the data.
By contrast, an operational database is used to capture and store data, such as recording the details of a transaction.
Data Mining
Data mining is the process of extracting and discovering patterns in large data sets
involving methods at the intersection of machine learning, statistics, and database
systems.
Data mining is the process of finding anomalies, patterns and correlations within large
data sets to predict outcomes.
Data Mining Techniques
Classification
This technique is used to obtain important and relevant information about data and metadata. It helps to classify
data into different classes.
Data mining frameworks can themselves be classified by the following criteria:
Type of data sources mined:
Classification by the type of data handled, for example multimedia, spatial data, text data, time-series data, the World Wide
Web, and so on.
Database involved:
Classification by the data model involved, for example object-oriented database, transactional database, relational
database, and so on.
Kind of knowledge discovered:
Classification by the type of knowledge discovered or the data mining functionality, for example discrimination, classification,
clustering, characterization, and so on. Some frameworks are comprehensive and offer several data mining functionalities together.
Data mining techniques used:
Classification by the data analysis approach used, such as neural networks, machine learning, genetic algorithms,
visualization, statistics, or data warehouse-oriented or database-oriented approaches.
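To make the idea of assigning data to classes concrete, here is a minimal nearest-centroid classifier in plain Python. It is one simple classification method chosen for illustration, not the method prescribed by these notes. The feature values and the "low"/"high" labels are invented.

```python
def train_centroids(samples):
    """samples: list of (features, label). Return the mean vector per label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: [total / counts[label] for total in acc]
        for label, acc in sums.items()
    }


def classify(features, centroids):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(features, centroids[label]))
    return min(centroids, key=distance)


training = [
    ((1.0, 1.0), "low"), ((1.5, 2.0), "low"),
    ((8.0, 9.0), "high"), ((9.0, 8.5), "high"),
]
centroids = train_centroids(training)
```

A new item such as (2.0, 1.0) is then placed in whichever class has the nearest centroid, via classify((2.0, 1.0), centroids).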
Clustering
Regression
Regression analysis is the data mining technique used to identify and analyze the
relationship between variables in the presence of other factors.
It is used to estimate the probability of a specific variable and is primarily a form
of planning and modeling.
For example, we might use it to project certain costs, depending on other factors such as
availability, consumer demand, and competition. In essence, it quantifies the relationship
between two or more variables in the given data set.
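As a worked illustration, the sketch below fits a straight line by ordinary least squares to project cost from demand. Simple linear regression is only one form of regression, and the demand and cost figures are made up so that the true relationship is cost = 2 × demand + 5.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


demand = [10, 20, 30, 40]
cost = [25.0, 45.0, 65.0, 85.0]  # hypothetical observations
slope, intercept = fit_line(demand, cost)


def project_cost(new_demand):
    """Use the fitted line to project the cost at a new demand level."""
    return slope * new_demand + intercept
```

With real data the points would not fall exactly on a line, and the fitted slope and intercept would only approximate the underlying relationship.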
Outlier Detection
This data mining technique concerns the observation of data items in a data set that do
not match an expected pattern or expected behavior. It is also known as outlier analysis
or outlier mining. An outlier is a data point that diverges markedly from the rest of the
dataset, and most real-world datasets contain outliers.
Outlier detection plays a significant role in data mining and is valuable in numerous
fields, such as network intrusion detection, credit or debit card fraud detection, and
detecting outlying readings in wireless sensor network data.
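One simple way to flag such points is a z-score test, sketched below with Python's statistics module. The threshold of 2.0 and the sample readings are assumptions made for illustration. Other approaches, such as interquartile-range rules, are equally common.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    centre = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - centre) / spread > threshold]


readings = [10, 12, 11, 13, 12, 11, 95]  # hypothetical sensor readings
outliers = zscore_outliers(readings)
```

Note that a large outlier inflates the mean and standard deviation themselves, so this test can miss outliers in very small samples.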
Sequential Patterns
Sequential pattern mining is a data mining technique specialized for evaluating sequential
data to discover sequential patterns. It consists of finding interesting subsequences in a
set of sequences, where the value of a sequence can be measured in terms of different
criteria such as length and occurrence frequency.
In other words, this technique helps to discover or recognize similar patterns
in transaction data over a period of time.
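A deliberately simplified sketch: it counts only contiguous subsequences of a fixed length and measures occurrence frequency as the number of sequences containing each one. Full sequential pattern mining algorithms also allow gaps between items. The purchase sequences are hypothetical.

```python
from collections import Counter


def frequent_subsequences(sequences, length, min_support):
    """Find contiguous subsequences of `length` present in >= min_support sequences."""
    support = Counter()
    for seq in sequences:
        # A set, so each pattern is counted at most once per sequence.
        seen = {tuple(seq[i:i + length]) for i in range(len(seq) - length + 1)}
        support.update(seen)
    return {
        pattern: count
        for pattern, count in support.items()
        if count >= min_support
    }


purchases = [
    ["phone", "case", "charger"],
    ["phone", "case"],
    ["laptop", "mouse"],
    ["phone", "charger"],
]
patterns = frequent_subsequences(purchases, length=2, min_support=2)
```

Here the only pair bought in that order by at least two customers is a phone followed by a case.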
Prediction
Prediction uses a combination of other data mining techniques, such as trend analysis, clustering,
and classification. It analyzes past events or instances in the right sequence to predict a
future event.
Association Rules
This data mining technique helps to discover a link between two or more items. It finds
hidden patterns in the data set.
Association rules are if-then statements that help to show the probability of
relationships between data items within large data sets in different types of databases.
Association rule mining has several applications and is commonly used to find sales
correlations in transactional data or in medical data sets.
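The strength of an if-then rule is usually summarized by its support and confidence. The sketch below computes both for a single rule over a handful of made-up shopping baskets. The function and variable names are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    total = len(transactions)
    with_antecedent = [t for t in transactions if antecedent <= set(t)]
    with_both = [t for t in with_antecedent if consequent <= set(t)]
    # Support: share of all transactions containing both sides of the rule.
    support = len(with_both) / total
    # Confidence: share of antecedent transactions that also contain the consequent.
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence


baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]
support, confidence = rule_metrics(baskets, {"bread"}, {"butter"})
```

For the rule "if bread then butter", half of all baskets contain both items, and two of the three baskets with bread also contain butter.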
References
Self Notes
https://theecmconsultant.com/data-management-challenges/
https://www.javatpoint.com/dbms-data-independence
https://www.guru99.com/dbms-data-independence.html
https://www.nibusinessinfo.co.uk/content/types-database-system
https://www.geeksforgeeks.org/characteristics-and-functions-of-data-warehouse/
https://www.javatpoint.com/data-mining-cluster-vs-data-warehousing
https://www.javatpoint.com/data-mining-techniques