
DATA WAREHOUSING

GROUP 2
COLLINS KIPNGETICH – 659400
MERCY IMALI – 637025
MARGARET MUKUHI – 661931


DATA INTEGRATION?

• Data integration is the process of combining data from different sources and making it
available for analysis in a single, unified view. In the context of a data warehouse, data
integration is a critical step in creating a complete, consistent, and accurate view of an
organization's data.
• Without data integration, a data warehouse may contain incomplete or inconsistent data,
which can lead to inaccurate analysis and decision-making.
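As a minimal sketch of the idea above, the Python snippet below combines two hypothetical sources (a CRM system and a billing system that use different field names, such as `cust_id` and `customer`) into one record per customer. Along the way it standardizes email case and converts cents to currency units. The sample data and the function name `unified_customer_view` are illustrative assumptions, not part of any specific tool.

```python
# Hypothetical source records: a CRM system and a billing system describe
# the same customers with different field names and formats.
crm_records = [
    {"cust_id": "C001", "full_name": "Jane Doe", "email": "JANE@EXAMPLE.COM"},
    {"cust_id": "C002", "full_name": "John Smith", "email": "john@example.com"},
]
billing_records = [
    {"customer": "C001", "balance_cents": 12550},
    {"customer": "C003", "balance_cents": 990},
]


def unified_customer_view(crm, billing):
    """Combine both sources into a single record per customer ID."""
    view = {}
    for rec in crm:
        entry = view.setdefault(rec["cust_id"], {})
        entry["name"] = rec["full_name"]
        entry["email"] = rec["email"].lower()  # standardize email case
    for rec in billing:
        entry = view.setdefault(rec["customer"], {})
        entry["balance"] = rec["balance_cents"] / 100  # cents -> currency units
    return view


view = unified_customer_view(crm_records, billing_records)
```

Customer C003 exists only in the billing source, so its unified record has a balance but no name or email. Gaps like this are the kind of incomplete data the slide warns about.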
DATA INTEGRATION PROCESS – ETL PROCESS

• Data Extraction: The first step in the data integration process is to extract data from various sources such as
transactional databases, external data sources, and other data warehouses or data marts. For example, an organization
may extract sales data, customer data, or financial data from a transactional database. It may also extract
data from external sources, such as social media feeds, weather data, or industry reports.
• Data Transformation: Once the data has been extracted, it needs to be transformed into a consistent format. This involves
cleaning and standardizing the data, as well as resolving any inconsistencies or errors. For example, the sales data may
need to be standardized to ensure that it is in the same format across all regions or countries. Customer data may need to be
cleaned to remove any duplicate records, while financial data may need to be reconciled to ensure that it is accurate.
• Data Loading: Once the data has been transformed, it can be loaded into the data warehouse or data mart. This involves
mapping the data to the appropriate tables and fields in the warehouse, and ensuring that it is stored in a way that
makes it easily accessible for analysis. For example, sales data may be loaded into a sales fact table, while customer data
may be loaded into a customer dimension table.
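The three ETL steps above can be sketched end to end with Python's standard library, using `sqlite3` as a stand-in for the warehouse. The two regional sales sources, their date formats, the fixed EUR-to-USD rate of 1.10, and the helper names `transform_sale` and `run_etl` are all assumptions for illustration; the table names follow the slide's example of a sales fact table and a customer dimension table.

```python
import sqlite3
from datetime import datetime

# Extract: hypothetical rows from two regional sales systems that use
# different date formats and currencies: (customer_id, date, amount, currency).
us_sales = [("C001", "03/15/2024", 100.0, "USD"), ("C002", "03/16/2024", 50.0, "USD")]
eu_sales = [("C001", "15.03.2024", 80.0, "EUR")]
# Hypothetical customer extract that contains a duplicate record.
customers = [("C001", "Jane Doe"), ("C002", "John Smith"), ("C001", "Jane Doe")]

# Assumed fixed exchange rates, for illustration only.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.10}


def transform_sale(row, date_format):
    """Standardize one sale: ISO 8601 date and amount converted to USD."""
    cust_id, date_str, amount, currency = row
    iso_date = datetime.strptime(date_str, date_format).date().isoformat()
    return cust_id, iso_date, round(amount * RATES_TO_USD[currency], 2)


def run_etl(conn):
    # Transform: each source is parsed with its own date format.
    sales = [transform_sale(r, "%m/%d/%Y") for r in us_sales]
    sales += [transform_sale(r, "%d.%m.%Y") for r in eu_sales]
    unique_customers = sorted(set(customers))  # drop exact duplicate records

    # Load: a customer dimension table and a sales fact table.
    conn.execute("CREATE TABLE customer_dim (customer_id TEXT PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE sales_fact (customer_id TEXT REFERENCES customer_dim(customer_id),"
        " sale_date TEXT, amount_usd REAL)"
    )
    conn.executemany("INSERT INTO customer_dim VALUES (?, ?)", unique_customers)
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", sales)
    conn.commit()


conn = sqlite3.connect(":memory:")
run_etl(conn)

# Analysis: total sales per customer, joining the fact and dimension tables.
totals = conn.execute(
    "SELECT d.name, SUM(f.amount_usd) FROM sales_fact f "
    "JOIN customer_dim d USING (customer_id) GROUP BY d.name ORDER BY d.name"
).fetchall()
```

Because both regions now share one date format and one currency, the European and US sales for the same customer can be summed directly in the final query.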
REAL-TIME DATA WAREHOUSING

• Real-time data warehousing is an approach to data warehousing that enables organizations to
capture, process, and analyze data in real time, allowing for up-to-the-minute insights into
their operations. The following example shows real-time data warehousing in action:
• Fraud detection: Real-time data warehousing can be used in fraud detection applications to
identify fraudulent transactions as they occur. For example, a credit card company may
use real-time data warehousing to monitor credit card transactions as they happen, using
advanced analytics algorithms to identify suspicious transactions that deviate from a
customer's normal spending patterns. By identifying and stopping fraudulent transactions in
real time, the credit card company can minimize losses and improve customer satisfaction.
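One simple way to flag transactions that deviate from a customer's normal spending pattern is a running z-score per card, updated as each transaction arrives. The sketch below keeps each card's mean and standard deviation current with Welford's online algorithm and flags amounts more than three standard deviations above the mean. `MIN_HISTORY`, `THRESHOLD`, and the function `process` are assumed, illustrative choices; real fraud-detection systems generally combine many more signals than amount alone.

```python
import math
from collections import defaultdict


class SpendingProfile:
    """Running mean and variance of one card's amounts (Welford's online algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, amount):
        self.n += 1
        delta = amount - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (amount - self.mean)

    def std(self):
        # Sample standard deviation; 0.0 until there are two observations.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


MIN_HISTORY = 5  # assumed: transactions needed before a card is scored
THRESHOLD = 3.0  # assumed: z-score above which a transaction is flagged

profiles = defaultdict(SpendingProfile)


def process(card_id, amount):
    """Score one transaction as it arrives; return True if it looks fraudulent."""
    p = profiles[card_id]
    std = p.std()
    suspicious = p.n >= MIN_HISTORY and std > 0 and (amount - p.mean) / std > THRESHOLD
    if not suspicious:
        p.update(amount)  # learn only from transactions that look normal
    return suspicious
```

Feeding five ordinary purchases of 20 to 30 for one card and then a purchase of 900 flags only the 900 one. The flagged amount is kept out of the profile so that it does not shift the customer's baseline.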
DATA WAREHOUSE ADMINISTRATION & SECURITY?
• Data warehouse administration: Involves the management and maintenance of the
data warehouse, including activities such as data loading, backup and recovery,
performance tuning, and user management.
• Data warehouse security: Involves protecting the data warehouse from unauthorized
access, ensuring data privacy, and maintaining data integrity.
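Protecting the warehouse from unauthorized access usually starts with access control. The sketch below shows a role-based check in Python: users map to roles, and roles map to the tables they may read. The user names, roles, table names, and the `authorize` function are hypothetical. In practice, most SQL databases enforce the same idea with roles and `GRANT`/`REVOKE` statements rather than application code.

```python
# Hypothetical roles and the warehouse tables each role may read.
ROLE_PERMISSIONS = {
    "analyst": {"sales_fact", "customer_dim"},
    "finance": {"sales_fact", "customer_dim", "financial_ledger"},
}
# Hypothetical user-to-role assignments (part of user management).
USER_ROLES = {"alice": "finance", "bob": "analyst"}


class AccessDenied(Exception):
    """Raised when a user requests a table their role does not cover."""


def authorize(user, table):
    """Allow the read only if the user's role grants access to the table."""
    role = USER_ROLES.get(user)
    if role is None or table not in ROLE_PERMISSIONS.get(role, set()):
        raise AccessDenied(f"{user!r} may not read {table!r}")
    return True
```

Granting permissions to roles rather than to individual users keeps administration manageable: when someone changes jobs, only their role assignment needs to change.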
DATA WAREHOUSE SECURITY ISSUES

• Data Quality: Data quality issues can arise when data is sourced from multiple systems and may not
adhere to consistent standards. For example, a data warehouse that integrates data from various
healthcare systems may need to ensure that the patient data is accurate and consistent across all
systems. Data quality can be addressed through data cleansing and validation techniques, such as
data profiling and data matching.
• Data Access Control: Access control issues can arise when users are granted access to data they
shouldn't have access to or when data is accessed without proper authorization. For example, a data
warehouse that contains financial data may need to ensure that only authorized users can access
sensitive financial information. Data access control can be addressed through user authentication,
authorization, and encryption techniques.
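The data profiling and data matching techniques mentioned for data quality can be sketched with Python's standard library. Here `profile` counts missing and distinct values in one field, and `match` pairs patient records from two systems that share a birth date and have similar names, scored with `difflib.SequenceMatcher`. The records, field names, and the 0.9 similarity cutoff are hypothetical; dedicated record-linkage tools use more robust rules.

```python
from difflib import SequenceMatcher

# Hypothetical patient records from two healthcare systems.
system_a = [
    {"id": "A1", "name": "Maria Gonzalez", "dob": "1980-04-12"},
    {"id": "A2", "name": "Peter Otieno", "dob": None},
]
system_b = [
    {"id": "B7", "name": "Maria Gonzales", "dob": "1980-04-12"},
    {"id": "B9", "name": "Grace Wanjiru", "dob": "1975-09-30"},
]


def profile(records, field):
    """Data profiling: count missing and distinct values in one field."""
    values = [r[field] for r in records]
    present = [v for v in values if v is not None]
    return {"missing": len(values) - len(present), "distinct": len(set(present))}


def match(records_a, records_b, min_similarity=0.9):
    """Data matching: pair records with equal birth dates and similar names."""
    pairs = []
    for a in records_a:
        for b in records_b:
            if a["dob"] is not None and a["dob"] == b["dob"]:
                score = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
                if score >= min_similarity:
                    pairs.append((a["id"], b["id"], round(score, 2)))
    return pairs
```

With this data, "Maria Gonzalez" and "Maria Gonzales" are paired as the same patient despite the spelling difference, while profiling shows that one record in the first system has no birth date.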
DATA WAREHOUSE SECURITY ISSUES

• Data Backup and Recovery: Data backup and recovery issues can arise when data is not backed up frequently
enough or when backups are not properly tested or stored. For example, a data warehouse that contains critical
customer data may need to ensure that backups are performed regularly and that backup data is stored in a secure,
offsite location. These issues can be addressed through data backup and recovery procedures and techniques, such as
incremental backups and disaster recovery planning.
• Performance Tuning: Ensuring that the data warehouse performs well and meets user expectations is another
critical aspect of data warehouse administration. Performance tuning issues can arise when the data warehouse is
not optimized for the specific needs of the users or when the underlying hardware and software components are
not properly configured. For example, a data warehouse that supports complex queries may need to ensure that
the underlying database engine is optimized for performance. Performance tuning can be addressed through
database optimization techniques, such as index optimization and query optimization.
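Index optimization can be illustrated with SQLite from Python's standard library. The sketch below loads a hypothetical `sales_fact` table, asks SQLite for its query plan with `EXPLAIN QUERY PLAN`, adds an index on the filtered column, and asks again. The table, the generated data, and the index name `idx_sales_customer` are assumptions, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (customer_id TEXT, sale_date TEXT, amount REAL)")
# Hypothetical data: 10,000 sales spread over 1,000 customers.
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?)",
    [(f"C{i % 1000:04d}", "2024-03-15", float(i)) for i in range(10_000)],
)

QUERY = "SELECT SUM(amount) FROM sales_fact WHERE customer_id = ?"


def plan(sql):
    """Return the 'detail' column of SQLite's query plan for a statement."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, ("C0001",))]


before = plan(QUERY)  # no index yet: SQLite must scan the whole table
conn.execute("CREATE INDEX idx_sales_customer ON sales_fact (customer_id)")
after = plan(QUERY)   # with the index: SQLite searches via idx_sales_customer
```

Before the index, SQLite reports a full scan of `sales_fact`; afterwards it reports a search using `idx_sales_customer`. Indexes speed up selective lookups like this one but add cost to every data load, which is one reason performance tuning depends on the warehouse's actual query workload.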
