You are on page 1of 3

collecting efficiently

practice of storing securely

using data cost-effectively

so that they can MAKE DECISIONS


The goal of data management optimize the use of data
take ACTIONS

Create, access, and update data


across a diverse data tier
what is data management oracle
Store data across multiple clouds
and ‫في أماكن العمل‬

Provide high availability and


data management scope of work disaster recovery

Use data in a growing variety of


apps, analytics, and algorithms

Ensure data privacy and security

Archive and destroy data

https://www.techtarget.com/searchdatamanagement/definition/fact-table#:~:text=A%20fact%20table%20holds%20the,fact%20table%20can%20be%20analyzed
Reference
.

A fact table is the central table in a star schema of a data warehouse. A fact table
stores quantitative information for analysis and is often denormalized.

The foreign keys column allows


§ What is the difference between fact and dimension table? joins with dimension tables
Detailed answer The fact table consists of two types
A fact table Holds the data to be analyzed
of columns.
The measures columns contain the
data that is being analyzed.

Dimension table Stores data about the ways in which the data in the fact table can be analyzed.
https://www.oracle.com/eg/autonomous-database/what-is-data-mart/ Reference
Short answer
A data mart is a simple form of data warehouse focused on a single subject or line of business. With a data mart,
teams can access data and gain insights faster, because they don’t have to spend time searching within a more Short answer What is data mart?
complex data warehouse or manually aggregating data from different sources. Reference

Detailed answer

https://www.geeksforgeeks.org/data-cube-or-olap-approach-in-data-mining/
Reference
https://www.javatpoint.com/data-warehouse-what-is-data-cube

Helps in giving a summarised view


of data.

Data cube operation provides


quick and better analysis,
Advantages of data cubes:

Improve performance of data.

Detailed answer
Data cubes store large data in a
simple way.

Detailed answer What is ELT?


A data cube is created from a subset of attributes in the database. Specific attributes are
how It is created
chosen to be measure attributes, i.e., the attributes whose values are of interest.

What is the ETL?


Data cubes could be sparse in many cases because not every cell
Techniques should be developed to handle sparse cubes efficiently. wanas
in each dimension may have corresponding data in the database. § What is Data Cube?
ETL.

Extraction: This first step involves copying data from the source system.

During the loading step, the pipeline replicates data from the source into
Loading:
the target system, which might be a data warehouse or data lake.
ELT stands for "extract, load, and transform"
Once the data is in the target system, organizations can run
whatever transformations they need. Often organizations will
Transformation:
transform raw data in different ways for use with different tools or
Short answer
business processes.

https://rivery.io/blog/etl-vs-elt/

ELT stands for "extract, load, and transform" — the processes a data
Grouping of data in a multidimensional matrix is called data cubes. In Dataware housing, we pipeline uses to replicate data FROM a source system INTO a target
generally deal with various multidimensional data models as the data will be represented by multiple system such as a cloud data warehouse.
dimensions and multiple attributes. This multidimensional data is represented in the data cube as the
cube represents a high-dimensional space. The Data cube pictorially shows how different attributes of Short answer
data are arranged in the data model. Below is the diagram of a general data cube. ELT is a modern variation on the older process of extract,
transform, and load (ETL), in which transformations take place
ELT — the next generation of ETL
before the data is loaded. Running transformations before the
An describes how records in a load phase results in a more complex data replication process.
access path
database file are retrieved.
GLOSSRY
https://www.sap.com/insights/what-is-data-modeling.html#:~:text=What%20are%20the%20types%20of,oriented%2C%20and%20multi%2Dvalue
controls the servers where the Reference
Data tier .
information is stored;

Although “older” in approach, the most common database model still in use today is relational, which stores the data in fixed-format records and arranges data in tables with
queries rows and columns. The most basic type of data model has two elements: measures and dimensions. Measures are numeric values, such as quantities and revenue, used in
is a type of data management system that is designed to mathematical calculations like sum or average. Dimensions can be text or numeric. They are not used in calculations and include descriptions or locations. The raw data is
used to perform enable and support business intelligence (BI) activities, Relational: defined as a measure or a dimension. Other terminology used in relational database design includes “relations” (the table with rows and columns), “attributes” (columns),
analysis especially analytics. “tuples” (rows), and “domain” (set of values allowed in a column). While there are additional terms and structural requirements that define a relational database, the important
factor is the relationships defined within that structure. Common data elements (or keys) link tables and data sets together. Tables can also be related explicitly, like parent and
child relationships including one-to-one, one-to-many, or many-to-many.
often contain large amounts of
historical data.
Less rigid and structured, the dimensional approach favors a contextual data structure that is more related to the business use or context. This database structure is optimized
for online queries and data warehousing tools. Critical data elements, like a transaction quantity for example, are called “facts” and are accompanied by reference
application log files The data within a data warehouse Dimensional information called “dimensions,” be that product ID, unit price, or transaction date. A fact table is a primary table in a dimensional model. Retrieval can be quick and efficient
such as is usually derived from a wide – with data for a specific type of activity stored together – but the lack of relationship links can complicate analytical retrieval and use of the data. Since the data structure is
transaction applications. range of sources tied to the business function that produces and uses the data, combining data produced by dissimilar systems (in a data warehouse, for instance) can be problematic.
Detailed answer
§ What is Data Modeling? + Types of Data Modeling?

Over time, it builds a historical record that can be invaluable to An E-R model represents a business data structure in graphical form containing boxes of various shapes to represent activities, functions, or
data scientists and business analysts. Because of these “entities” and lines to represent associations, dependencies, or “relationships.” The E-R model is then used to create a relational database with
Entity-Rich (E-R):
capabilities, a data warehouse can be considered an each row representing an entity and the fields in that row contain attributes. As in all relational databases, “key” data elements are used to link
organization’s “single source of truth.” tables together.

centralizes If we model the database using ER


diagrams, we must convert them Mapping Cardinality is always a
into the relational model, which constraint in the ER model, while
large amounts of data from ER VS ERD MODEL
consolidates can be implemented by one of the the cardinality constraint cannot be
multiple sources. RDBMS languages such as SQL defined in the Relational Model.
what it is programed to do
and MySQL.
allow organizations to derive valuable
business insights from their data to The three primary data model types are relational, dimensional, and entity-relationship (E-R). There are also several others that are not in general
improve decision-making. Short answer use, including hierarchical, network, object-oriented, and multi-value. The model type defines the logical structure – how the data is stored,
logically – and therefore how it is stored, organized, and retrieved.
A relational database to store and
manage data
Reference https://www.guru99.com/star-snowflake-data-warehousing.html

An extraction, loading, and


transformation (ELT) solution for Star Schema in data warehouse, in which the center of the star can have one fact table and a number of associated dimension
preparing the data for analysis Definition tables. It is known as star schema as its structure resembles a star. The Star Schema data model is the simplest type of Data
Warehouse schema. It is also known as Star Join Schema and is optimized for querying large data sets.

Statistical analysis, reporting, and


data mining capabilities

Client analysis tools for visualizing includes the following elements:


and presenting data to business
users

Other, more sophisticated


analytical applications that
generate actionable information by
applying data science and artificial What is data warehouse
intelligence (AI) algorithms, or
graph and spatial features that Data Questions
enable more kinds of analysis of Example
data at scale

analyze large amounts of variant data

extract significant value from it allowing organizations to Benefits of a Data Warehouse

as well as to keep a historical record.

They can analyze data about a


particular subject or functional area Subject-oriented. In the following Star Schema example, the fact table is at the center which
(such as sales). contains keys to every dimension table like Dealer_ID, Model ID, Date_ID,
Product_ID Branch_ID & other attributes like Units sold and revenue.
Star Schema
Data warehouses create
consistency among different data Integrated. Every dimension in a star schema is
types from disparate sources. represented with the only one-dimension
Four unique characteristics table.

Once data is in a data warehouse,


Nonvolatile. The dimension table should
it’s stable and doesn’t change.
contain the set of attributes.

Data warehouse analysis looks at


Time-variant. The dimension table is joined to
change over time.
the fact table using a foreign key

All data warehouses share a basic design in which metadata, summary data, and raw data are
stored within the central repository of the warehouse. The repository is fed by data sources Simple. The dimension table are not joined
on one end and accessed by end users for analysis, reporting, and mining on the other end. to each other

Operational data must be cleaned and processed before being put in the warehouse. Characteristics of Star Schema: Fact table would contain key and
Although this can be done programmatically, many data warehouses add a staging Simple with a staging area. measure
area for data before it enters the warehouse, to simplify data preparation.
Data Warehouse Architecture The Star schema is easy to
. Adding data marts between the central repository and end users allows an understand and provides optimal
organization to customize its data warehouse to serve various lines of business. Hub and spoke disk usage.
When the data is ready for use, it is moved to the appropriate data mart.
The dimension tables are not
Sandboxes are private, secure, safe areas that allow companies to quickly and normalized. For instance, in the
informally explore new datasets or ways of analyzing data without having to Sandboxes. above figure, Country_ID does not
conform to or comply with the formal rules and protocol of the data warehouse. have Country lookup table as an
OLTP design would have.

https://www.guru99.com/data-warehousing.html Reference Detailed answer


The schema is widely supported by
BI Tools
Reference § What is the difference between snowflake and star schema?
Snowflake Schema in data warehouse is a logical arrangement of tables in a multidimensional database such that the ER diagram resembles a
Definition snowflake shape. A Snowflake Schema is an extension of a Star Schema, and it adds additional dimensions. The dimension tables are normalized
Detailed answer § What are the processes that can be done in the data warehouse?
which splits data into additional tables.

Short answer

Reference

They can analyze data about a


particular subject or functional area Subject-oriented.
(such as sales).

Data warehouses create


consistency among different data Integrated. Example
Snowflake Schema
types from disparate sources. Four unique characteristics Detailed answer What are the characteristics of a data warehouse?

Once data is in a data warehouse,


Nonvolatile.
it’s stable and doesn’t change.

Data warehouse analysis looks at


Time-variant.
change over time.

Short answer In the following Snowflake Schema example, Country is further normalized into an individual table.

The main benefit of the snowflake schema it uses smaller disk space.
Reference Easier to implement a dimension is added to the Schema
Characteristics of Snowflake Schema:
Due to multiple tables query performance is reduced
The primary challenge that you will face while using the snowflake Schema is that you need to perform more maintenance efforts because of the more lookup tables
Detailed answer § What is Data Warehousing?

Short answer

Short answer

Detailed answer Is star schema OLAP or OLTP?

Reference

Reference

Detailed answer § What is the difference between OLTP and OLAP?

‫ جدول واحد الى جداول منفصلة مترتبة على بعض عشان اعرف انالسيس‬MODEL ‫تطوير من‬
Short answer
DIMENSIONAL DATA MODEL ‫يعنى بعمل‬

Reference

» Veracity so that data sources truly represent truth

» Extremely large Volumes of data


Big data is any kind of data source that has at least one of four
four Vs: Detailed answer § What is the difference between a data warehouse and big data?
shared characteristics, called the four Vs:
» The ability to move that data at a high Velocity of speed

» An ever-expanding Variety of data sources

Snow Flake vs Star schema


Short answer
Short answer

Reference https://docs.oracle.com/cd/E11882_01/server.112/e25555/tdpdw_refresh.htm#TDPDW00275

Detailed answer
Can we update a record in a data warehouse?

yes You must update your data warehouse on a regular basis to ensure that the information derived from it is current. The
Short answer
process of updating the data is called the refresh process, and this chapter describes the following topics:

Reference

§ What is the difference between the database and the data warehouse? Detailed answer

Short answer

binary between 2 entities

types ternary between 3 entities

Relations IN DBMS recursive between entity and itself worker manges other workers

one to many
cardinality
many to many

Database anomaly is normally the flaw in databases which occurs because of poor planning
What is ANOMILIES and storing everything in a flat database. Generally this is removed by the process of
normalization which is performed by splitting/joining of tables.

‫الذكاء الصناعى بييبقى انسان بياخد اكشن ما بناء‬


‫ايه الفرق بين ال داتا ساينس و مهندس الذكاء‬ NO SQL ‫على داتا غالبا‬
SQL ‫عشان كدة هو بيهتم بالداتا بيز و ال‬
‫الصناعى‬ ‫اما الداتا سايمس بيشتغل على داتا منظمة نوعا‬
‫ ما الجابة اسئلة بناء على الداتا‬SQL

WHAT IS THE DIFFENRCE


BETWEEM RELTIONALTINAL AND
TRANSCTIONAL MODEL
Data Questions
1. what is data management oracle

Link:
https://www.oracle.com/database/what-is-data-management/

1.1. practice of

1.1.1. collecting

1.1.1.1. efficiently

1.1.2. storing

1.1.2.1. securely

1.1.3. using data

1.1.3.1. cost-effectively

1.2. The goal of data management

1.2.1. optimize the use of data

1.2.1.1. so that they can MAKE DECISIONS

1.2.1.2. take ACTIONS

1.3. data management scope of work

1.3.1. Create, access, and update data across a diverse data tier

1.3.2. Store data across multiple clouds and ‫في أماكن العمل‬

1.3.3. Provide high availability and disaster recovery

1.3.4. Use data in a growing variety of apps, analytics, and algorithms

1.3.5. Ensure data privacy and security

1.3.6. Archive and destroy data

2. § What is the difference between fact and dimension table?


2.1. Reference

2.1.1. https://www.techtarget.com/searchdatamanagement/definition/fact-table#:~:text=A%20fact%20table%20holds%20the,fact%20table%20can%20be%20analyzed .

2.2. Detailed answer

2.2.1. A fact table is the central table in a star schema of a data warehouse. A fact table stores quantitative information for analysis and is often denormalized.

2.2.2. A fact table

2.2.2.1. Holds the data to be analyzed

2.2.2.1.1. The fact table consists of two types of columns.

2.2.2.1.1.1. The foreign keys column allows joins with dimension tables

2.2.2.1.1.2. The measures columns contain the data that is being analyzed.

2.2.3. Dimension table

2.2.3.1. Stores data about the ways in which the data in the fact table can be analyzed.

2.3. Short answer

3. What is the ETL?


3.1. Reference

3.2. Detailed answer

3.2.1. What is ELT?

Link:
https://rivery.io/blog/etl-vs-elt/

3.2.1.1. ETL.

3.2.1.2. ELT stands for "extract, load, and transform"

3.2.1.2.1. Extraction:

3.2.1.2.1.1. This first step involves copying data from the source system.

3.2.1.2.2. Loading:

3.2.1.2.2.1. During the loading step, the pipeline replicates data from the source into the target system, which might be a data warehouse or data lake.

3.2.1.2.3. Transformation:

3.2.1.2.3.1. Once the data is in the target system, organizations can run whatever transformations they need. Often organizations will transform raw data in different ways for use with different tools or business processes.

3.2.1.3. https://rivery.io/blog/etl-vs-elt/

3.3. Short answer

3.3.1. ELT stands for "extract, load, and transform" — the processes a data pipeline uses to replicate data FROM a source system INTO a target system such as a cloud data warehouse.

3.3.2. ELT — the next generation of ETL

3.3.2.1. ELT is a modern variation on the older process of extract, transform, and load (ETL), in which transformations take place before the data is loaded. Running transformations before the load phase results in a more complex data replication process.

4. § What is Data Modeling? + Types of Data Modeling?


4.1. Reference

4.1.1. https://www.sap.com/insights/what-is-data-modeling.html#:~:text=What%20are%20the%20types%20of,oriented%2C%20and%20multi%2Dvalue .

4.2. Detailed answer

4.2.1. Relational:

4.2.1.1. Although “older” in approach, the most common database model still in use today is relational, which stores the data in fixed-format records and arranges data in tables with rows and columns. The most basic type of data model has two elements: measures and dimensions. Measures are numeric values, such as quantities and revenue, used in mathematical calculations like sum or average. Dimensions can be text or numeric. They are not used in
calculations and include descriptions or locations. The raw data is defined as a measure or a dimension. Other terminology used in relational database design includes “relations” (the table with rows and columns), “attributes” (columns), “tuples” (rows), and “domain” (set of values allowed in a column). While there are additional terms and structural requirements that define a relational database, the important factor is the relationships defined within that structure.
Common data elements (or keys) link tables and data sets together. Tables can also be related explicitly, like parent and child relationships including one-to-one, one-to-many, or many-to-many.

4.2.2. Dimensional

4.2.2.1. Less rigid and structured, the dimensional approach favors a contextual data structure that is more related to the business use or context. This database structure is optimized for online queries and data warehousing tools. Critical data elements, like a transaction quantity for example, are called “facts” and are accompanied by reference information called “dimensions,” be that product ID, unit price, or transaction date. A fact table is a primary table in a
dimensional model. Retrieval can be quick and efficient – with data for a specific type of activity stored together – but the lack of relationship links can complicate analytical retrieval and use of the data. Since the data structure is tied to the business function that produces and uses the data, combining data produced by dissimilar systems (in a data warehouse, for instance) can be problematic.

4.2.3. Entity-Rich (E-R):

4.2.3.1. An E-R model represents a business data structure in graphical form containing boxes of various shapes to represent activities, functions, or “entities” and lines to represent associations, dependencies, or “relationships.” The E-R model is then used to create a relational database with each row representing an entity and the fields in that row contain attributes. As in all relational databases, “key” data elements are used to link tables together.

4.2.4. ER VS ERD MODEL

4.2.4.1. If we model the database using ER diagrams, we must convert them into the relational model, which can be implemented by one of the RDBMS languages such as SQL and MySQL.

4.2.4.1.1. Mapping Cardinality is always a constraint in the ER model, while the cardinality constraint cannot be defined in the Relational Model.

4.3. Short answer

4.3.1. The three primary data model types are relational, dimensional, and entity-relationship (E-R). There are also several others that are not in general use, including hierarchical, network, object-oriented, and multi-value. The model type defines the logical structure – how the data is stored, logically – and therefore how it is stored, organized, and retrieved.

5. § What is the difference between snowflake and star schema?


5.1. Reference

5.1.1. https://www.guru99.com/star-snowflake-data-warehousing.html
5.2. Detailed answer

5.2.1. Star Schema

5.2.1.1. Definition

5.2.1.1.1. Star Schema in data warehouse, in which the center of the star can have one fact table and a number of associated dimension tables. It is known as star schema as its structure resembles a star. The Star Schema data model is the simplest type of Data Warehouse schema. It is also known as Star Join Schema and is optimized for querying large data sets.

5.2.1.2. Example

5.2.1.2.1. In the following Star Schema example, the fact table is at the center which contains keys to every dimension table like Dealer_ID, Model ID, Date_ID, Product_ID Branch_ID & other attributes like Units sold and revenue.

5.2.1.3. Characteristics of Star Schema:

5.2.1.3.1. Every dimension in a star schema is represented with the only one-dimension table.

5.2.1.3.2. The dimension table should contain the set of attributes.

5.2.1.3.3. The dimension table is joined to the fact table using a foreign key

5.2.1.3.4. The dimension table are not joined to each other

5.2.1.3.5. Fact table would contain key and measure

5.2.1.3.6. The Star schema is easy to understand and provides optimal disk usage.

5.2.1.3.7. The dimension tables are not normalized. For instance, in the above figure, Country_ID does not have Country lookup table as an OLTP design would have.

5.2.1.3.8. The schema is widely supported by BI Tools

5.2.2. Snowflake Schema

5.2.2.1. Definition

5.2.2.1.1. Snowflake Schema in data warehouse is a logical arrangement of tables in a multidimensional database such that the ER diagram resembles a snowflake shape. A Snowflake Schema is an extension of a Star Schema, and it adds additional dimensions. The dimension tables are normalized which splits data into additional tables.

5.2.2.2. Example

5.2.2.2.1. In the following Snowflake Schema example, Country is further normalized into an individual table.

5.2.2.3. Characteristics of Snowflake Schema:

5.2.2.3.1. The main benefit of the snowflake schema it uses smaller disk space. Easier to implement a dimension is added to the Schema Due to multiple tables query performance is reduced The primary challenge that you will face while using the snowflake Schema is that you need to perform more maintenance efforts because of the more lookup tables

5.3. Short answer

5.3.1. Snow Flake vs Star schema

5.3.1.1. ‫ تطوير من‬MODEL ‫ جدول واحد الى جداول منفصلة مترتبة على بعض عشان اعرف انالسيس يعنى بعمل‬DIMENSIONAL DATA MODEL

6. What is ANOMILIES
6.1. Database anomaly is normally the flaw in databases which occurs because of poor planning and storing everything in a flat database. Generally this is removed by the process of normalization which is performed by splitting/joining of tables.

7. Can we update a record in a data warehouse?


7.1. Reference

7.1.1. https://docs.oracle.com/cd/E11882_01/server.112/e25555/tdpdw_refresh.htm#TDPDW00275

7.2. Detailed answer

7.3. Short answer

7.3.1. yes You must update your data warehouse on a regular basis to ensure that the information derived from it is current. The process of updating the data is called the refresh process, and this chapter describes the following topics:

8. § What is the difference between the database and the data warehouse?
8.1. Reference

8.2. Detailed answer

8.3. Short answer


9. Relations IN DBMS
9.1. types

9.1.1. binary

9.1.1.1. between 2 entities

9.1.2. ternary

9.1.2.1. between 3 entities

9.1.3. recursive

9.1.3.1. between entity and itself

9.1.3.1.1. worker manges other workers

9.2. cardinality

9.2.1. one to many

9.2.2. many to many

10. ‫ايه الفرق بين ال داتا ساينس و مهندس الذكاء الصناعى‬


10.1. ‫ الذكاء الصناعى بييبقى انسان بياخد اكشن ما بناء على داتا غالبا‬NO SQL ‫ اما الداتا سايمس بيشتغل على داتا منظمة نوعا‬SQL ‫ما الجابة اسئلة بناء على الداتا‬

10.1.1. ‫ عشان كدة هو بيهتم بالداتا بيز و ال‬SQL

11. WHAT IS THE DIFFENRCE BETWEEM RELTIONALTINAL AND TRANSCTIONAL MODEL


12. Is star schema OLAP or OLTP?
12.1. Short answer

12.2. Detailed answer

12.3. Reference

13. § What is Data Warehousing?


13.1. Reference
13.2. Detailed answer
13.3. Short answer

14. § What is the difference between OLTP and OLAP?


14.1. Reference

14.2. Detailed answer

14.3. Short answer

15. What are the characteristics of a data warehouse?


15.1. Reference

15.2. Detailed answer

15.2.1. Four unique characteristics

15.2.1.1. Subject-oriented.

15.2.1.1.1. They can analyze data about a particular subject or functional area (such as sales).

15.2.1.2. Integrated.

15.2.1.2.1. Data warehouses create consistency among different data types from disparate sources.

15.2.1.3. Nonvolatile.

15.2.1.3.1. Once data is in a data warehouse, it’s stable and doesn’t change.

15.2.1.4. Time-variant.

15.2.1.4.1. Data warehouse analysis looks at change over time.

15.3. Short answer

16. § What is the difference between a data warehouse and big data?
16.1. Reference

16.2. Detailed answer

16.2.1. Big data is any kind of data source that has at least one of four shared characteristics, called the four Vs:

16.2.1.1. four Vs:

16.2.1.1.1. » Veracity so that data sources truly represent truth

16.2.1.1.2. » Extremely large Volumes of data

16.2.1.1.3. » The ability to move that data at a high Velocity of speed

16.2.1.1.4. » An ever-expanding Variety of data sources

16.3. Short answer

17. § What are the processes that can be done in the data warehouse?
17.1. Reference
17.2. Detailed answer

17.3. Short answer

18. What is data warehouse


18.1. is a type of data management system that is designed to enable and support business intelligence (BI) activities, especially analytics.

18.1.1. used to perform

18.1.1.1. queries

18.1.1.2. analysis

18.2. often contain large amounts of historical data.

18.3. The data within a data warehouse is usually derived from a wide range of sources

18.3.1. such as

18.3.1.1. application log files

18.3.1.2. transaction applications.

18.4. Over time, it builds a historical record that can be invaluable to data scientists and business analysts. Because of these capabilities, a data warehouse can be considered an organization’s “single source of truth.”

18.5. what it is programed to do

18.5.1. centralizes

18.5.2. consolidates

18.5.2.1. large amounts of data from multiple sources.

18.5.3. allow organizations to derive valuable business insights from their data to improve decision-making.

18.6. includes the following elements:

18.6.1. A relational database to store and manage data

18.6.2. An extraction, loading, and transformation (ELT) solution for preparing the data for analysis

18.6.3. Statistical analysis, reporting, and data mining capabilities

18.6.4. Client analysis tools for visualizing and presenting data to business users

18.6.5. Other, more sophisticated analytical applications that generate actionable information by applying data science and artificial intelligence (AI) algorithms, or graph and spatial features that enable more kinds of analysis of data at scale

18.7. Benefits of a Data Warehouse

18.7.1. allowing organizations to

18.7.1.1. analyze large amounts of variant data

18.7.1.2. extract significant value from it

18.7.1.3. as well as to keep a historical record.

18.8. Four unique characteristics

18.8.1. Subject-oriented.

18.8.1.1. They can analyze data about a particular subject or functional area (such as sales).

18.8.2. Integrated.

18.8.2.1. Data warehouses create consistency among different data types from disparate sources.

18.8.3. Nonvolatile.

18.8.3.1. Once data is in a data warehouse, it’s stable and doesn’t change.

18.8.4. Time-variant.

18.8.4.1. Data warehouse analysis looks at change over time.

18.9. Data Warehouse Architecture

18.9.1. Simple.

18.9.1.1. All data warehouses share a basic design in which metadata, summary data, and raw data are stored within the central repository of the warehouse. The repository is fed by data sources on one end and accessed by end users for analysis, reporting, and mining on the other end.

18.9.2. Simple with a staging area.

18.9.2.1. Operational data must be cleaned and processed before being put in the warehouse. Although this can be done programmatically, many data warehouses add a staging area for data before it enters the warehouse, to simplify data preparation.

18.9.3. Hub and spoke

18.9.3.1. . Adding data marts between the central repository and end users allows an organization to customize its data warehouse to serve various lines of business. When the data is ready for use, it is moved to the appropriate data mart.

18.9.4. Sandboxes.

18.9.4.1. Sandboxes are private, secure, safe areas that allow companies to quickly and informally explore new datasets or ways of analyzing data without having to conform to or comply with the formal rules and protocol of the data warehouse.

18.10. Reference

18.10.1. https://www.guru99.com/data-warehousing.html

19. GLOSSRY
19.1. access path

19.1.1. An describes how records in a database file are retrieved.

19.2. Data tier

19.2.1. controls the servers where the information is stored;

20. § What is Data Cube?


20.1. Reference

20.1.1. https://www.geeksforgeeks.org/data-cube-or-olap-approach-in-data-mining/

20.1.2. https://www.javatpoint.com/data-warehouse-what-is-data-cube

20.2. Detailed answer

20.2.1. Advantages of data cubes:

20.2.1.1. Helps in giving a summarised view of data.

20.2.1.2. Data cube operation provides quick and better analysis,

20.2.1.3. Improve performance of data.

20.2.1.4. Data cubes store large data in a simple way.

20.2.2. how It is created

20.2.2.1. A data cube is created from a subset of attributes in the database. Specific attributes are chosen to be measure attributes, i.e., the attributes whose values are of interest.

20.2.3. wanas

20.2.3.1. Techniques should be developed to handle sparse cubes efficiently.

20.2.3.1.1. Data cubes could be sparse in many cases because not every cell in each dimension may have corresponding data in the database.

20.3. Short answer

20.3.1. Grouping of data in a multidimensional matrix is called data cubes. In Dataware housing, we generally deal with various multidimensional data models as the data will be represented by multiple dimensions and multiple attributes. This multidimensional data is represented in the data cube as the cube represents a high-dimensional space. The Data cube pictorially shows how different attributes of data are arranged in the data model. Below is the diagram of a
general data cube.

21. What is data mart?


21.1. Reference

21.1.1. https://www.oracle.com/eg/autonomous-database/what-is-data-mart/

21.2. Short answer

21.2.1. A data mart is a simple form of data warehouse focused on a single subject or line of business. With a data mart, teams can access data and gain insights faster, because they don’t have to spend time searching within a more complex data warehouse or manually aggregating data from different sources.

21.3. Detailed answer

You might also like