Reference: https://www.techtarget.com/searchdatamanagement/definition/fact-table#:~:text=A%20fact%20table%20holds%20the,fact%20table%20can%20be%20analyzed
Fact table: A fact table is the central table in a star schema of a data warehouse. A fact table stores quantitative information for analysis and is often denormalized.
Dimension table: Stores data about the ways in which the data in the fact table can be analyzed.
What is data mart?
Reference: https://www.oracle.com/eg/autonomous-database/what-is-data-mart/
Short answer: A data mart is a simple form of data warehouse focused on a single subject or line of business. With a data mart, teams can access data and gain insights faster, because they don’t have to spend time searching within a more complex data warehouse or manually aggregating data from different sources.
https://www.geeksforgeeks.org/data-cube-or-olap-approach-in-data-mining/
Reference
https://www.javatpoint.com/data-warehouse-what-is-data-cube
Detailed answer
Data cubes store large data in a simple way.
ELT stands for "extract, load, and transform"
Extraction: This first step involves copying data from the source system.
Loading: During the loading step, the pipeline replicates data from the source into the target system, which might be a data warehouse or data lake.
Transformation: Once the data is in the target system, organizations can run whatever transformations they need. Often organizations will transform raw data in different ways for use with different tools or business processes.
Reference: https://rivery.io/blog/etl-vs-elt/
ELT: the next generation of ETL
ELT stands for "extract, load, and transform": the processes a data pipeline uses to replicate data from a source system into a target system such as a cloud data warehouse.
ELT is a modern variation on the older process of extract, transform, and load (ETL), in which transformations take place before the data is loaded. Running transformations before the load phase results in a more complex data replication process.
Data cube: Grouping of data in a multidimensional matrix is called a data cube. In data warehousing, we generally deal with various multidimensional data models, as the data will be represented by multiple dimensions and multiple attributes. This multidimensional data is represented in the data cube, as the cube represents a high-dimensional space. The data cube pictorially shows how different attributes of data are arranged in the data model. Below is the diagram of a general data cube.
GLOSSARY
Access path: An access path describes how records in a database file are retrieved.
Reference: https://www.sap.com/insights/what-is-data-modeling.html#:~:text=What%20are%20the%20types%20of,oriented%2C%20and%20multi%2Dvalue
Data tier: controls the servers where the information is stored.
Data warehouse: a type of data management system that is designed to enable and support business intelligence (BI) activities, especially analytics. Data warehouses are used to perform queries and analysis and often contain large amounts of historical data. The data within a data warehouse is usually derived from a wide range of sources, such as application log files and transaction applications.
Relational: Although “older” in approach, the most common database model still in use today is relational, which stores the data in fixed-format records and arranges data in tables with rows and columns. The most basic type of data model has two elements: measures and dimensions. Measures are numeric values, such as quantities and revenue, used in mathematical calculations like sum or average. Dimensions can be text or numeric. They are not used in calculations and include descriptions or locations. The raw data is defined as a measure or a dimension. Other terminology used in relational database design includes “relations” (the table with rows and columns), “attributes” (columns), “tuples” (rows), and “domain” (set of values allowed in a column). While there are additional terms and structural requirements that define a relational database, the important factor is the relationships defined within that structure. Common data elements (or keys) link tables and data sets together. Tables can also be related explicitly, like parent and child relationships including one-to-one, one-to-many, or many-to-many.
Dimensional: Less rigid and structured, the dimensional approach favors a contextual data structure that is more related to the business use or context. This database structure is optimized for online queries and data warehousing tools. Critical data elements, like a transaction quantity for example, are called “facts” and are accompanied by reference information called “dimensions,” be that product ID, unit price, or transaction date. A fact table is a primary table in a dimensional model. Retrieval can be quick and efficient, with data for a specific type of activity stored together, but the lack of relationship links can complicate analytical retrieval and use of the data. Since the data structure is tied to the business function that produces and uses the data, combining data produced by dissimilar systems (in a data warehouse, for instance) can be problematic.
Detailed answer
§ What is Data Modeling? + Types of Data Modeling?
Entity-relationship (E-R): An E-R model represents a business data structure in graphical form containing boxes of various shapes to represent activities, functions, or “entities” and lines to represent associations, dependencies, or “relationships.” The E-R model is then used to create a relational database in which each row represents an entity and the fields in that row contain its attributes. As in all relational databases, “key” data elements are used to link tables together.
Over time, a data warehouse builds a historical record that can be invaluable to data scientists and business analysts. Because of these capabilities, a data warehouse can be considered an organization’s “single source of truth.”
Data Warehouse Architecture
Simple. All data warehouses share a basic design in which metadata, summary data, and raw data are stored within the central repository of the warehouse. The repository is fed by data sources on one end and accessed by end users for analysis, reporting, and mining on the other end.
Simple with a staging area. Operational data must be cleaned and processed before being put in the warehouse. Although this can be done programmatically, many data warehouses add a staging area for data before it enters the warehouse, to simplify data preparation.
Hub and spoke. Adding data marts between the central repository and end users allows an organization to customize its data warehouse to serve various lines of business. When the data is ready for use, it is moved to the appropriate data mart.
Sandboxes. Sandboxes are private, secure, safe areas that allow companies to quickly and informally explore new datasets or ways of analyzing data without having to conform to or comply with the formal rules and protocol of the data warehouse.
Characteristics of Star Schema:
The dimension tables are not joined to each other.
The fact table contains keys and measures.
The Star schema is easy to understand and provides optimal disk usage.
The dimension tables are not normalized. For instance, in the above figure, Country_ID does not have a Country lookup table as an OLTP design would have.
In the following Snowflake Schema example, Country is further normalized into an individual table.
Characteristics of Snowflake Schema:
The main benefit of the snowflake schema is that it uses less disk space.
It is easier to implement when a dimension is added to the schema.
Because of the multiple tables, query performance is reduced.
The primary challenge of the snowflake schema is the extra maintenance effort required by the additional lookup tables.
§ What is Data Warehousing?
Evolving the MODEL from a single table into separate, interrelated tables so that analysis becomes possible; in other words, building a DIMENSIONAL DATA MODEL.
Can we update a record in a data warehouse?
Reference: https://docs.oracle.com/cd/E11882_01/server.112/e25555/tdpdw_refresh.htm#TDPDW00275
Yes. You must update your data warehouse on a regular basis to ensure that the information derived from it is current. The process of updating the data is called the refresh process.
§ What is the difference between the database and the data warehouse?
Relations in DBMS
Recursive: a relationship between an entity and itself (for example, a worker manages other workers).
Cardinality: one to many, many to many.
What are anomalies?
A database anomaly is a flaw in a database that normally occurs because of poor planning and storing everything in a flat database. It is generally removed by the process of normalization, which is performed by splitting/joining of tables.
Link:
https://www.oracle.com/database/what-is-data-management/
1.1. practice of
1.1.1. collecting
1.1.1.1. efficiently
1.1.2. storing
1.1.2.1. securely
1.1.3.1. cost-effectively
1.3.1. Create, access, and update data across a diverse data tier
1.3.2. Store data across multiple clouds and on premises
2.1.1. https://www.techtarget.com/searchdatamanagement/definition/fact-table#:~:text=A%20fact%20table%20holds%20the,fact%20table%20can%20be%20analyzed .
2.2.1. A fact table is the central table in a star schema of a data warehouse. A fact table stores quantitative information for analysis and is often denormalized.
2.2.2.1.1.1. The foreign keys column allows joins with dimension tables
2.2.2.1.1.2. The measures columns contain the data that is being analyzed.
2.2.3.1. Stores data about the ways in which the data in the fact table can be analyzed.
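The split between foreign key columns, measure columns, and a dimension table can be sketched in a few lines of Python. This is only an illustration: the names `dim_product`, `fact_sales`, and `revenue_by`, and all the sample values, are assumptions made up for this note and do not come from the referenced article.

```python
from collections import defaultdict

# Hypothetical dimension table: describes the ways the facts can be analyzed.
dim_product = {
    1: {"name": "Laptop", "category": "Computers"},
    2: {"name": "Mouse", "category": "Accessories"},
}

# Hypothetical fact table: one foreign key column plus two measure columns.
fact_sales = [
    {"product_id": 1, "units_sold": 2, "revenue": 2000.0},
    {"product_id": 2, "units_sold": 5, "revenue": 100.0},
    {"product_id": 1, "units_sold": 1, "revenue": 1000.0},
]

def revenue_by(attribute):
    """Sum the revenue measure, grouped by one attribute of the dimension."""
    totals = defaultdict(float)
    for row in fact_sales:
        dimension_row = dim_product[row["product_id"]]  # follow the foreign key
        totals[dimension_row[attribute]] += row["revenue"]
    return dict(totals)

print(revenue_by("category"))
```

The same fact rows can be regrouped by any other dimension attribute (for example `revenue_by("name")`) without touching the measures.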
Link:
https://rivery.io/blog/etl-vs-elt/
3.2.1.1. ELT.
3.2.1.2.1. Extraction:
3.2.1.2.1.1. This first step involves copying data from the source system.
3.2.1.2.2. Loading:
3.2.1.2.2.1. During the loading step, the pipeline replicates data from the source into the target system, which might be a data warehouse or data lake.
3.2.1.2.3. Transformation:
3.2.1.2.3.1. Once the data is in the target system, organizations can run whatever transformations they need. Often organizations will transform raw data in different ways for use with different tools or business processes.
3.2.1.3. https://rivery.io/blog/etl-vs-elt/
3.3.1. ELT stands for "extract, load, and transform" — the processes a data pipeline uses to replicate data FROM a source system INTO a target system such as a cloud data warehouse.
3.3.2.1. ELT is a modern variation on the older process of extract, transform, and load (ETL), in which transformations take place before the data is loaded. Running transformations before the load phase results in a more complex data replication process.
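The order of the three ELT steps can be sketched with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: an in-memory SQLite database stands in for the target system, and `source_rows`, `raw_orders`, and `daily_revenue` are hypothetical names invented for the example.

```python
import sqlite3

# Hypothetical source rows; amounts arrive as untidy strings.
source_rows = [("2024-01-05", " 19.50 "), ("2024-01-05", "5.50"), ("2024-01-06", "10")]

def extract():
    """Extraction: copy data out of the source system unchanged."""
    return list(source_rows)

def load(target, rows):
    """Loading: replicate the raw rows into the target system as-is."""
    target.execute("CREATE TABLE raw_orders (order_date TEXT, amount TEXT)")
    target.executemany("INSERT INTO raw_orders VALUES (?, ?)", rows)

def transform(target):
    """Transformation: runs inside the target, after the data is loaded."""
    target.execute("""
        CREATE TABLE daily_revenue AS
        SELECT order_date, SUM(CAST(TRIM(amount) AS REAL)) AS revenue
        FROM raw_orders
        GROUP BY order_date
    """)

warehouse = sqlite3.connect(":memory:")  # stands in for a cloud data warehouse
load(warehouse, extract())
transform(warehouse)
daily = warehouse.execute(
    "SELECT order_date, revenue FROM daily_revenue ORDER BY order_date"
).fetchall()
print(daily)
```

In an ETL pipeline the cleaning done in `transform` would instead happen before `load`, so the raw strings would never reach the target.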
4.1.1. https://www.sap.com/insights/what-is-data-modeling.html#:~:text=What%20are%20the%20types%20of,oriented%2C%20and%20multi%2Dvalue .
4.2.1. Relational:
4.2.1.1. Although “older” in approach, the most common database model still in use today is relational, which stores the data in fixed-format records and arranges data in tables with rows and columns. The most basic type of data model has two elements: measures and dimensions. Measures are numeric values, such as quantities and revenue, used in mathematical calculations like sum or average. Dimensions can be text or numeric. They are not used in calculations and include descriptions or locations. The raw data is defined as a measure or a dimension. Other terminology used in relational database design includes “relations” (the table with rows and columns), “attributes” (columns), “tuples” (rows), and “domain” (set of values allowed in a column). While there are additional terms and structural requirements that define a relational database, the important factor is the relationships defined within that structure. Common data elements (or keys) link tables and data sets together. Tables can also be related explicitly, like parent and child relationships including one-to-one, one-to-many, or many-to-many.
4.2.2. Dimensional
4.2.2.1. Less rigid and structured, the dimensional approach favors a contextual data structure that is more related to the business use or context. This database structure is optimized for online queries and data warehousing tools. Critical data elements, like a transaction quantity for example, are called “facts” and are accompanied by reference information called “dimensions,” be that product ID, unit price, or transaction date. A fact table is a primary table in a dimensional model. Retrieval can be quick and efficient, with data for a specific type of activity stored together, but the lack of relationship links can complicate analytical retrieval and use of the data. Since the data structure is tied to the business function that produces and uses the data, combining data produced by dissimilar systems (in a data warehouse, for instance) can be problematic.
4.2.3.1. An E-R model represents a business data structure in graphical form containing boxes of various shapes to represent activities, functions, or “entities” and lines to represent associations, dependencies, or “relationships.” The E-R model is then used to create a relational database with each row representing an entity and the fields in that row contain attributes. As in all relational databases, “key” data elements are used to link tables together.
4.2.4.1. If we model the database using ER diagrams, we must convert them into the relational model, which can then be implemented in SQL on an RDBMS such as MySQL.
4.2.4.1.1. Mapping cardinality is always an explicit constraint in the ER model, while the relational model has no direct construct for a cardinality constraint; it has to be expressed indirectly, through keys.
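One common way to carry ER cardinalities over into relational tables is through key constraints, sketched below with Python's built-in `sqlite3` module. The tables `department`, `employee`, and `badge`, the helper `try_insert`, and the sample rows are all hypothetical; the sketch assumes an SQLite build with foreign key support, which must be switched on per connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
-- one-to-many: many employees may reference the same department
CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY,
    name TEXT,
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)
);
-- one-to-one: UNIQUE stops two badges from pointing at one employee
CREATE TABLE badge (
    badge_id INTEGER PRIMARY KEY,
    emp_id INTEGER NOT NULL UNIQUE REFERENCES employee(emp_id)
);
INSERT INTO department VALUES (10, 'Sales');
INSERT INTO employee VALUES (1, 'Amal', 10), (2, 'Omar', 10);
INSERT INTO badge VALUES (100, 1);
""")

def try_insert(sql):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# A second badge for employee 1 would break the one-to-one relationship.
second_badge_ok = try_insert("INSERT INTO badge VALUES (101, 1)")
# An employee in a department that does not exist would break the foreign key.
orphan_employee_ok = try_insert("INSERT INTO employee VALUES (3, 'Sara', 99)")
print(second_badge_ok, orphan_employee_ok)
```

A many-to-many relationship would typically need a third, linking table with one foreign key to each side.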
4.3.1. The three primary data model types are relational, dimensional, and entity-relationship (E-R). There are also several others that are not in general use, including hierarchical, network, object-oriented, and multi-value. The model type defines the logical structure – how the data is stored, logically – and therefore how it is stored, organized, and retrieved.
5.1.1. https://www.guru99.com/star-snowflake-data-warehousing.html
5.2. Detailed answer
5.2.1.1. Definition
5.2.1.1.1. A star schema is a data warehouse schema in which the center of the star holds one fact table, surrounded by a number of associated dimension tables. It is known as a star schema because its structure resembles a star. The star schema is the simplest type of data warehouse schema. It is also known as the Star Join Schema and is optimized for querying large data sets.
5.2.1.2. Example
5.2.1.2.1. In the following Star Schema example, the fact table is at the center and contains keys to every dimension table, such as Dealer_ID, Model ID, Date_ID, Product_ID, and Branch_ID, as well as other attributes such as Units sold and revenue.
5.2.1.3.1. Every dimension in a star schema is represented by only one dimension table.
5.2.1.3.3. The dimension table is joined to the fact table using a foreign key
5.2.1.3.6. The Star schema is easy to understand and provides optimal disk usage.
5.2.1.3.7. The dimension tables are not normalized. For instance, in the above figure, Country_ID does not have Country lookup table as an OLTP design would have.
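The star layout described above can be sketched with Python's built-in `sqlite3` module. The key names Dealer_ID, Date_ID, and Country_ID follow the example in the notes; the table names (`dim_dealer`, `dim_date`, `fact_sales`), the extra columns, and every data value are assumptions made up for illustration, since the original figure is not available here.

```python
import sqlite3

star = sqlite3.connect(":memory:")
star.executescript("""
-- Dimension tables: denormalized, with no separate Country lookup table.
CREATE TABLE dim_dealer (Dealer_ID INTEGER PRIMARY KEY, Dealer_Name TEXT, Country_ID TEXT);
CREATE TABLE dim_date (Date_ID INTEGER PRIMARY KEY, Year INTEGER, Month INTEGER);
-- Fact table: a key to every dimension plus the measures.
CREATE TABLE fact_sales (
    Dealer_ID INTEGER REFERENCES dim_dealer(Dealer_ID),
    Date_ID INTEGER REFERENCES dim_date(Date_ID),
    Units_Sold INTEGER,
    Revenue REAL
);
INSERT INTO dim_dealer VALUES (1, 'North Motors', 'EG'), (2, 'Delta Cars', 'EG'), (3, 'Gulf Auto', 'AE');
INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
INSERT INTO fact_sales VALUES (1, 20240101, 3, 60000.0), (2, 20240101, 1, 20000.0),
                              (3, 20240201, 2, 50000.0), (1, 20240201, 4, 80000.0);
""")

# Each dimension joins directly to the fact table; dimensions never join each other.
units_by_country_month = star.execute("""
    SELECT d.Country_ID, t.Month, SUM(f.Units_Sold)
    FROM fact_sales AS f
    JOIN dim_dealer AS d ON d.Dealer_ID = f.Dealer_ID
    JOIN dim_date AS t ON t.Date_ID = f.Date_ID
    GROUP BY d.Country_ID, t.Month
    ORDER BY d.Country_ID, t.Month
""").fetchall()
print(units_by_country_month)
```

Every analysis question takes at most one join per dimension, which is what keeps star schema queries simple.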
5.2.2.1. Definition
5.2.2.1.1. Snowflake Schema in data warehouse is a logical arrangement of tables in a multidimensional database such that the ER diagram resembles a snowflake shape. A Snowflake Schema is an extension of a Star Schema, and it adds additional dimensions. The dimension tables are normalized which splits data into additional tables.
5.2.2.2. Example
5.2.2.2.1. In the following Snowflake Schema example, Country is further normalized into an individual table.
5.2.2.3.1. The main benefit of the snowflake schema is that it uses less disk space. It is easier to implement when a dimension is added to the schema. Because of the multiple tables, query performance is reduced. The primary challenge of the snowflake schema is the extra maintenance effort required by the additional lookup tables.
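The trade-off between disk space and extra joins can be sketched with Python's built-in `sqlite3` module. As in the notes' example, Country is normalized into its own table; the table names (`dim_country`, `dim_dealer`, `fact_sales`) and all data values are hypothetical.

```python
import sqlite3

snow = sqlite3.connect(":memory:")
snow.executescript("""
-- Country is normalized out of the dealer dimension into its own lookup table.
CREATE TABLE dim_country (Country_ID TEXT PRIMARY KEY, Country_Name TEXT);
CREATE TABLE dim_dealer (
    Dealer_ID INTEGER PRIMARY KEY,
    Dealer_Name TEXT,
    Country_ID TEXT REFERENCES dim_country(Country_ID)
);
CREATE TABLE fact_sales (
    Dealer_ID INTEGER REFERENCES dim_dealer(Dealer_ID),
    Units_Sold INTEGER
);
INSERT INTO dim_country VALUES ('EG', 'Egypt'), ('AE', 'United Arab Emirates');
INSERT INTO dim_dealer VALUES (1, 'North Motors', 'EG'), (2, 'Delta Cars', 'EG'), (3, 'Gulf Auto', 'AE');
INSERT INTO fact_sales VALUES (1, 3), (2, 1), (3, 2);
""")

# Reaching the country name now takes two joins instead of one.
units_by_country = snow.execute("""
    SELECT c.Country_Name, SUM(f.Units_Sold)
    FROM fact_sales AS f
    JOIN dim_dealer AS d ON d.Dealer_ID = f.Dealer_ID
    JOIN dim_country AS c ON c.Country_ID = d.Country_ID
    GROUP BY c.Country_Name
    ORDER BY c.Country_Name
""").fetchall()
print(units_by_country)
```

Each country name is stored once, however many dealers share it, at the cost of one more lookup table to join and maintain.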
5.3.1.1. Evolving the MODEL from a single table into separate, interrelated tables so that analysis becomes possible; in other words, building a DIMENSIONAL DATA MODEL.
6. What are anomalies?
6.1. A database anomaly is a flaw in a database that normally occurs because of poor planning and storing everything in a flat database. It is generally removed by the process of normalization, which is performed by splitting/joining of tables.
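An update anomaly, one typical kind of database anomaly, can be illustrated with plain Python structures. The supplier and order data below are invented for the example.

```python
# Hypothetical flat table: the supplier's phone is repeated on every order row.
flat_orders = [
    {"order_id": 1, "supplier": "Acme", "supplier_phone": "555-0100"},
    {"order_id": 2, "supplier": "Acme", "supplier_phone": "555-0100"},
]

# Update anomaly: changing the phone on one row leaves the copies inconsistent.
flat_orders[0]["supplier_phone"] = "555-0199"
phones_in_flat = {row["supplier_phone"] for row in flat_orders if row["supplier"] == "Acme"}

# Normalized: split into two tables so that each fact is stored exactly once.
suppliers = {"Acme": {"phone": "555-0100"}}
orders = [{"order_id": 1, "supplier": "Acme"}, {"order_id": 2, "supplier": "Acme"}]

suppliers["Acme"]["phone"] = "555-0199"  # one update, visible from every order
phones_in_normalized = {suppliers[o["supplier"]]["phone"] for o in orders}

print(len(phones_in_flat), len(phones_in_normalized))
```

The flat table ends up holding two different phone numbers for the same supplier, while the split tables can only ever hold one.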
7.1.1. https://docs.oracle.com/cd/E11882_01/server.112/e25555/tdpdw_refresh.htm#TDPDW00275
7.3.1. Yes. You must update your data warehouse on a regular basis to ensure that the information derived from it is current. The process of updating the data is called the refresh process.
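A refresh can be sketched as applying a batch of new and changed source rows to a warehouse table. This sketch uses Python's built-in `sqlite3` module with SQLite's `INSERT OR REPLACE`; it is not the Oracle refresh mechanism the reference describes, and the `customer` table, the `refresh` function, and the data are assumptions for illustration.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, city TEXT)")
warehouse.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Cairo"), (2, "Giza")])

def refresh(conn, changed_rows):
    """Apply a batch of new or changed source rows to the warehouse table."""
    # INSERT OR REPLACE overwrites a row whose primary key already exists.
    conn.executemany("INSERT OR REPLACE INTO customer VALUES (?, ?)", changed_rows)
    conn.commit()

# Hypothetical batch from the source system: one changed row and one new row.
refresh(warehouse, [(2, "Alexandria"), (3, "Luxor")])
customers = warehouse.execute("SELECT * FROM customer ORDER BY customer_id").fetchall()
print(customers)
```

Running the same batch again leaves the table unchanged, which makes a scheduled refresh safe to repeat.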
8. § What is the difference between the database and the data warehouse?
8.1. Reference
9.1.1. binary
9.1.2. ternary
9.1.3. recursive
9.2. cardinality
12.3. Reference
15.2.1.1. Subject-oriented.
15.2.1.1.1. They can analyze data about a particular subject or functional area (such as sales).
15.2.1.2. Integrated.
15.2.1.2.1. Data warehouses create consistency among different data types from disparate sources.
15.2.1.3. Nonvolatile.
15.2.1.3.1. Once data is in a data warehouse, it’s stable and doesn’t change.
15.2.1.4. Time-variant.
16. § What is the difference between a data warehouse and big data?
16.1. Reference
16.2.1. Big data is any kind of data source that has at least one of four shared characteristics, called the four Vs:
17. § What are the processes that can be done in the data warehouse?
17.1. Reference
17.2. Detailed answer
18.1.1.1. queries
18.1.1.2. analysis
18.3. The data within a data warehouse is usually derived from a wide range of sources
18.3.1. such as
18.4. Over time, it builds a historical record that can be invaluable to data scientists and business analysts. Because of these capabilities, a data warehouse can be considered an organization’s “single source of truth.”
18.5.1. centralizes
18.5.2. consolidates
18.5.3. allow organizations to derive valuable business insights from their data to improve decision-making.
18.6.2. An extraction, loading, and transformation (ELT) solution for preparing the data for analysis
18.6.4. Client analysis tools for visualizing and presenting data to business users
18.6.5. Other, more sophisticated analytical applications that generate actionable information by applying data science and artificial intelligence (AI) algorithms, or graph and spatial features that enable more kinds of analysis of data at scale
18.8.1. Subject-oriented.
18.8.1.1. They can analyze data about a particular subject or functional area (such as sales).
18.8.2. Integrated.
18.8.2.1. Data warehouses create consistency among different data types from disparate sources.
18.8.3. Nonvolatile.
18.8.3.1. Once data is in a data warehouse, it’s stable and doesn’t change.
18.8.4. Time-variant.
18.9.1. Simple.
18.9.1.1. All data warehouses share a basic design in which metadata, summary data, and raw data are stored within the central repository of the warehouse. The repository is fed by data sources on one end and accessed by end users for analysis, reporting, and mining on the other end.
18.9.2.1. Operational data must be cleaned and processed before being put in the warehouse. Although this can be done programmatically, many data warehouses add a staging area for data before it enters the warehouse, to simplify data preparation.
18.9.3.1. Adding data marts between the central repository and end users allows an organization to customize its data warehouse to serve various lines of business. When the data is ready for use, it is moved to the appropriate data mart.
18.9.4. Sandboxes.
18.9.4.1. Sandboxes are private, secure, safe areas that allow companies to quickly and informally explore new datasets or ways of analyzing data without having to conform to or comply with the formal rules and protocol of the data warehouse.
18.10. Reference
18.10.1. https://www.guru99.com/data-warehousing.html
19. GLOSSARY
19.1. access path
20.1.1. https://www.geeksforgeeks.org/data-cube-or-olap-approach-in-data-mining/
20.1.2. https://www.javatpoint.com/data-warehouse-what-is-data-cube
20.2.2.1. A data cube is created from a subset of attributes in the database. Specific attributes are chosen to be measure attributes, i.e., the attributes whose values are of interest.
20.2.3. wanas
20.2.3.1.1. Data cubes could be sparse in many cases because not every cell in each dimension may have corresponding data in the database.
20.3.1. Grouping of data in a multidimensional matrix is called a data cube. In data warehousing, we generally deal with various multidimensional data models, as the data will be represented by multiple dimensions and multiple attributes. This multidimensional data is represented in the data cube, as the cube represents a high-dimensional space. The data cube pictorially shows how different attributes of data are arranged in the data model. Below is the diagram of a general data cube.
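A small data cube, with its measure attribute, its dimensions, and its sparsity, can be sketched in plain Python. The dimensions (product, region, quarter), the function names `build_cube` and `roll_up`, and the sample rows are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions (product, region, quarter) and one measure.
rows = [
    ("Laptop", "North", "Q1", 10),
    ("Laptop", "South", "Q1", 4),
    ("Mouse", "North", "Q2", 7),
    ("Laptop", "North", "Q2", 6),
]

def build_cube(rows):
    """Group the measure into cells keyed by (product, region, quarter)."""
    cube = defaultdict(int)
    for product, region, quarter, units in rows:
        cube[(product, region, quarter)] += units
    return dict(cube)

def roll_up(cube, keep):
    """Aggregate the cube down to the dimensions whose positions are in `keep`."""
    totals = defaultdict(int)
    for cell, units in cube.items():
        totals[tuple(cell[i] for i in keep)] += units
    return dict(totals)

cube = build_cube(rows)
by_product = roll_up(cube, keep=(0,))           # collapse region and quarter
by_region_quarter = roll_up(cube, keep=(1, 2))  # collapse product

# Sparsity: 2 products x 2 regions x 2 quarters = 8 possible cells, only 4 hold data.
possible_cells = 2 * 2 * 2
print(by_product, len(cube), possible_cells)
```

Storing only the populated cells in a dictionary is one simple way to handle a sparse cube.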
21.1.1. https://www.oracle.com/eg/autonomous-database/what-is-data-mart/
21.2.1. A data mart is a simple form of data warehouse focused on a single subject or line of business. With a data mart, teams can access data and gain insights faster, because they don’t have to spend time searching within a more complex data warehouse or manually aggregating data from different sources.
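One lightweight way to picture a data mart is as a subject-specific slice of the warehouse. The sketch below models it as a view using Python's built-in `sqlite3` module. Real data marts are often separate stores, and the names `warehouse_facts` and `sales_mart` and the sample rows are hypothetical.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.executescript("""
CREATE TABLE warehouse_facts (business_line TEXT, metric TEXT, value REAL);
INSERT INTO warehouse_facts VALUES
    ('sales', 'revenue', 1200.0), ('sales', 'units', 30.0),
    ('hr', 'headcount', 45.0), ('finance', 'costs', 800.0);
-- The data mart here is a view restricted to one line of business.
CREATE VIEW sales_mart AS
    SELECT metric, value FROM warehouse_facts WHERE business_line = 'sales';
""")

# The sales team queries its mart without searching the whole warehouse.
sales_rows = dw.execute("SELECT metric, value FROM sales_mart ORDER BY metric").fetchall()
print(sales_rows)
```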