
Topic 12.

Prospects for the development of databases

HNEU,
Department of Information Systems,
Course Database,
V. V. Fedko
Test questions
1. What are data lakes used for?
2. What is a data factory?
3. Why do we need data virtualization?



Contents
1. Data lakes.
2. Data factories.
3. Data virtualization.



1. Data lakes
The concept of data lake
A data lake is a system or repository of data stored in its natural or raw
format, usually object blobs or files.

A data lake is usually a single store of data including raw copies of:
• source system data,
• sensor data,
• social data etc.



Using data lakes
A data lake also holds transformed data used for tasks such as:
• reporting,
• visualization,
• advanced analytics,
• machine learning.



Data structures
A data lake can include:
• structured data from relational databases (rows and columns),
• semi-structured data (CSV, logs, XML, JSON),
• unstructured data (emails, documents, PDFs),
• binary data (images, audio, video).

A data lake can be established:
• on premises (within an organization's data centres),
• in the cloud (using cloud services from vendors such as Amazon, Microsoft, or Google).



https://www.youtube.com/watch?v=LxcH6z8TFpI
2. Data factories
The concept of data integration

Data integration is the process of combining data from different sources into a single, unified view.
Integration typically relies on the extract, transform, load (ETL) process, which extracts information from the source databases, transforms it, and then loads it into the data warehouse.
Microsoft SQL Server Integration Services (SSIS) is a platform for building data integration and workflow applications, largely without programming.
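
The three ETL steps can be illustrated with a minimal sketch. It is not how SSIS itself works; it uses Python's standard sqlite3 module, and the table names (orders, customer_totals), the sample rows, and the helper functions extract, transform, and load are all hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical source system: an operational database with raw order rows.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount_cents INTEGER)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, " alice ", 1250), (2, "BOB", 800), (3, "Alice", 450)],
)

# Hypothetical target: a warehouse table holding one total per customer.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customer_totals (customer TEXT PRIMARY KEY, total REAL)")


def extract(conn):
    """Extract: read the raw rows from the source database."""
    return conn.execute("SELECT customer, amount_cents FROM orders").fetchall()


def transform(rows):
    """Transform: normalize customer names, sum per customer, convert cents to units."""
    totals = {}
    for customer, amount_cents in rows:
        name = customer.strip().lower()
        totals[name] = totals.get(name, 0) + amount_cents
    return {name: cents / 100 for name, cents in totals.items()}


def load(conn, totals):
    """Load: write the transformed totals into the warehouse table."""
    conn.executemany("INSERT INTO customer_totals VALUES (?, ?)", sorted(totals.items()))
    conn.commit()


load(warehouse, transform(extract(source)))
result = warehouse.execute(
    "SELECT customer, total FROM customer_totals ORDER BY customer"
).fetchall()
print(result)
```

Note that the warehouse ends up with a cleaned copy of the data: the inconsistent spellings of the customer name in the source are merged during the transform step, and the source database itself is left unchanged.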



Azure Data Factory

Azure Data Factory is a platform that handles ETL scenarios in the cloud.
It is a cloud-based ETL and data integration service that allows you to create data-driven workflows for orchestrating data movement and transforming data at scale.

A modern data factory loads various data into a data lake, does some processing on that data, and then loads the appropriate subset into a relational data warehouse for analysis.
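
The flow from raw data, through a lake, to a warehouse can be sketched locally as follows. This does not use the Azure Data Factory API: a temporary directory stands in for the lake and an in-memory SQLite database for the warehouse. The file names, the sensor readings, and the HOT_THRESHOLD_C filter are assumptions made up for the example.

```python
import csv
import json
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical data lake: a directory that keeps files in their raw format.
lake = Path(tempfile.mkdtemp(prefix="lake_"))

# Step 1: load various data into the lake unchanged (JSON and CSV here).
(lake / "sensors.json").write_text(
    json.dumps([{"device": "t1", "temp_c": 21.5}, {"device": "t2", "temp_c": 48.0}])
)
(lake / "devices.csv").write_text("device,room\nt1,office\nt2,server room\n")

# Step 2: process the raw files (here: join each reading with its room).
readings = json.loads((lake / "sensors.json").read_text())
with open(lake / "devices.csv", newline="") as f:
    rooms = {row["device"]: row["room"] for row in csv.DictReader(f)}
processed = [(r["device"], rooms[r["device"]], r["temp_c"]) for r in readings]

# Step 3: load only the appropriate subset into a relational warehouse.
HOT_THRESHOLD_C = 40.0  # assumed business rule, for illustration only
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE hot_readings (device TEXT, room TEXT, temp_c REAL)")
warehouse.executemany(
    "INSERT INTO hot_readings VALUES (?, ?, ?)",
    [p for p in processed if p[2] > HOT_THRESHOLD_C],
)

hot = warehouse.execute("SELECT device, room, temp_c FROM hot_readings").fetchall()
raw_files = sorted(p.name for p in lake.iterdir())
print(hot, raw_files)
```

The raw files stay in the lake after processing, so other tasks can reuse them later, while the warehouse receives only the rows needed for analysis.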
https://www.youtube.com/watch?v=MaGilV7ZSvA
3. Data virtualization
The concept of data virtualization
Data virtualization is an approach to data management that
allows an application to retrieve and manipulate data without
requiring technical details about the data, such as how it is
formatted at source, or where it is physically located, and can
provide a single customer view (or a single view of any other
entity) of the overall data.
Unlike the traditional extract, transform, load ("ETL") process, the
data remains in place, and real-time access is given to the source
system for the data.
This reduces the risk of data errors and the workload of moving data around that may never be used.
It also does not attempt to impose a single data model on the data (a federated database system is an example of heterogeneous data).
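
The contrast with ETL can be shown with a minimal sketch of a virtual layer that stores no data and reads each source at query time. It is not based on any particular virtualization product; the CustomerView class, the CRM table, and the CSV-style billing source are hypothetical and exist only for illustration.

```python
import csv
import sqlite3

# Hypothetical source 1: a relational CRM database.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
crm.execute("INSERT INTO customers VALUES (1, 'Alice')")

# Hypothetical source 2: a billing system that exposes CSV lines.
billing_lines = ["customer_id,balance", "1,120.50"]


class CustomerView:
    """Virtual layer: holds no copy of the data, only knows how to reach each source."""

    def __init__(self, crm_conn, billing_source):
        self._crm = crm_conn
        self._billing = billing_source

    def get(self, customer_id):
        # Both sources are read at call time, so the caller sees current data
        # and never needs to know the formats or locations involved.
        row = self._crm.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        if row is None:
            return None
        balance = None
        for rec in csv.DictReader(self._billing):
            if int(rec["customer_id"]) == customer_id:
                balance = float(rec["balance"])
        return {"id": customer_id, "name": row[0], "balance": balance}


view = CustomerView(crm, billing_lines)
before = view.get(1)

# The billing system changes its own data in place; nothing is reloaded or copied.
billing_lines[1] = "1,80.00"
after = view.get(1)
print(before, after)
```

The second call returns the updated balance without any reload step, which is the real-time access described above; in an ETL setup the change would become visible only after the next load.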
Using data virtualization

Data virtualization, as both a concept and a class of software, is a subset of data integration and is commonly used within:
• business intelligence,
• service-oriented architecture data services,
• cloud computing,
• enterprise search,
• master data management.



Capabilities
Cost savings: It is cheaper to store and maintain data in place than to replicate it and spend resources transforming it into different formats and locations.
Logical abstraction and decoupling: Heterogeneous data sources can now interact
more easily through data virtualization.
Data governance: Through central management, data governance challenges can be
lessened, and rules can be more easily applied to all of the data from one location.
Bridging structured and unstructured: Data virtualization can bridge the semantic differences between unstructured and structured data, so integration is easier and data quality improves across the board.
Increased productivity: Aside from the aforementioned bridging of data,
virtualization also makes it easier to test and deploy data-driven apps, since less time
is needed for integrating data sources.



Denodo is a leader in data virtualization

https://www.youtube.com/watch?v=3eWltRLA0ZY
