You are on page 1of 23

A data warehouse is a large, centralized repository of data that is specifically designed to support

business intelligence (BI) activities, such as reporting, analytics, and data mining. It typically
contains a wide variety of data from different sources, such as transactional systems, operational
databases, and external data sources. The data in a warehouse is usually organized in a way that
makes it easy to access, manage, and analyze, and it is typically stored separately from the
operational systems that are used to run an organization's day-to-day operations.
The data warehouse extracts or obtains its data from operational databases as well as from
external sources, providing a more comprehensive data pool.
Ways to analyze and present decision support data were developed.
• Online analytical processing (OLAP) provides advanced data analysis and presentation
tools (including multidimensional data analysis).
• Data mining employs advanced statistical tools to analyze the wealth of data now
available through data warehouses and other sources and to identify possible
relationships and anomalies.
Business Drivers for DW and Bl
• Single Version of the Truth
• Current and Accurate Information
• Rapidly Changing Information Needs
• Customer Service Excellence
• New Service Delivery Channels

Single Version of the Truth

Fragmented, inconsistent and outdated data in multiple databases does not permit good strategic
and tactical decision making.

Companies require that business intelligence be consolidated and presented in a suitable format
for decision making.  Inconsistent information from disparate information systems is no longer
acceptable.

Data Warehouses help companies to achieve a single version of the truth by consolidating the
most accurate and current data from the most reliable systems.

Current and Accurate Information

In a highly competitive market place, businesses need to quickly identify problems and
opportunities in order to respond to events expeditiously and appropriately.
Up-to-date information on sales, profits, inventories and customers can help identify problems
early and leverage opportunities that could otherwise be missed.

Most application systems are too narrowly scoped and operate on cycles that don't support real-
time or near real-time information access.  A data warehouse, however, can be designed to
deliver up-to-date accurate information to decision makers.

Rapidly Changing Information Needs

It is very difficult for businesses to anticipate future information needs. Application systems


often seem rigid and unable to adapt to evolving management information needs.

Businesses need the flexibility to slice and dice data in many ways in order to identify and
analyze changes in the market place or in the business itself.

Data Warehouses are designed for online, analytical purposes and provide great flexibility.

Customer Service Excellence

It is often said that 10% of a business's customers account for 90% of the business's profits. 
Identifying the good customers and providing them with excellent service helps retain good
customers.

Data Warehousing can help identify a company's best customers using a any number or criterion.

New Service Delivery Channels

It is no longer sufficient to provide customers with just 9:00 AM to 5:00 PM in-store service. 
Customers want to do business 7 days a week, 24 hours per day using alternate service delivery
channels such as via the  Internet or telephone.

By examining all customer transactions, regardless of the channel used, businesses can better
understand their customers and serve them better.

Data Warehousing is critical for profiling customers and their transactions, regardless of the
channel used.

Technical Drivers for DW and Bl

Multiple Internal Databases

Most medium and large businesses operate dozens, if not hundreds of un-integrated application
systems.  Individual departments in companies often focus on their own narrow system and
information needs and don’t see the corporate value of integrating data.
When silos of un-integrated data exist, data soon gets out of synch.  Companies have a need for
database that reflects a "single version of truth".  Data Warehouses can help do that.

Purchased Packages

“Out of the Box” purchased applications sometimes use underlying concepts and definitions that
differ from those used by the business in existing custom built applications.

For example, a “customer” in one system could encompass all current and past customers plus
potential future customers.  In another system, a customer might be defined more narrowly as
someone who has purchased a product and service during the past 12 months.

Such inconsistencies create problems from an analytical perspective.  A count of customers done
in the first database differ from a count done in the second.

Companies have a need to align concepts and terminology.  Data Warehouses help do this
alignment.

Increasing Complexity of Systems

The underlying data structures of application systems are often very complicated.  To create
what would intuitively might appear to be a simple query often requires complex programming
logic that involves navigating multiple database tables and or applications systems.

Writing reports or queries can consequently take time and money.

Companies have a need for a reporting environment that allows reports and queries to be
generated quickly, inexpensively and without expensive IT skills.  Data Warehouses can simplify
the reporting environment.

Application System Evolution

Businesses are highly dynamic and applications systems are constantly needing to be enhanced
to support new business requirements.

When systems are changed, reports and queries that access any changed tables must also be
updated.  This maintenance work can be very costly.

Businesses have a need to trim their application support costs.  Data Warehouses can help shelter
reports and queries from system changes that occur in "front end" operational systems.

Computer Networks and External Databases

The rapid growth of computer networks has allowed companies to exchange data with their
suppliers, consumers, government bodies and other groups.
Businesses often have a need to integrate data from internal and external databases.  Data
Warehouse can be designed to to integrate corporate data with external data for reporting
purposes.

Business Intelligence (BI) refers to a set of methods and techniques that are used by
organizations for tactical and strategic decision making. It leverages technologies that focus on
counts, statistics and business objectives to improve business performance.

A Data Warehouse (DW) is simply a consolidation of data from a variety of sources that is
designed to support strategic and tactical decision making.  Its main purpose is to provide a
coherent picture of the business at a point in time.

What Is Business Intelligence?


• Business Intelligence (Bl) is
• the new technology for understanding the past & predicting the future ...
• a broad category of technologies that allows for
• gathering, storing, accessing & analyzing data to help business users make better
decisions
• analyzing business performance through data-driven insight
• A broad category of applications, which include the activities of
• decision support systems
• query and reporting
• online analytical processing (OLAP)
• statistical analysis, forecasting, and data mining.
Bl applications can be:
• mission-critical and integral to an enterprise's operations or occasional to meet a
special requirement
• enterprise-wide or local to one division, department, or project
• centrally initiated or driven by user demand

One simple Bl example


A Business Intelligence (BI) framework is a set of processes, technologies, and tools that
organizations use to collect, store, access, and analyze data to support informed decision-making.
A BI framework typically includes the following components:
 Data Warehousing: A system for collecting, storing, and managing large amounts of data
from various sources, such as transactional systems, external sources, and other data
warehouses.
 Data Integration: The process of combining data from different sources into a single,
consistent view. This can involve cleaning, transforming, and normalizing data to ensure
compatibility and consistency across different systems.
 Data Analysis: The use of various techniques, such as SQL queries, OLAP, and data
mining, to extract insights and knowledge from the data.
 Reporting and Visualization: The creation and presentation of reports, dashboards, and
other visual aids that allow users to quickly understand and communicate key findings
and insights.
 Performance Management: The use of BI data and analytics to track and monitor
organizational performance and to identify areas for improvement.
 Collaboration and Sharing: The ability for multiple users to access, analyze, and share
data and insights, and to collaborate on analysis and decision-making.
 Data Governance: The set of policies, procedures, and standards for managing data
quality, security, privacy, and compliance.
Sample of Business Intelligence Tools
• Spreadsheets - These interactive computer applications manage information in a visual
format, organized by rows and columns.
• Decision Support Systems - arrangement of computerized tools used to assist
managerial decision making within a business.
• Dashboards and business activity monitoring - dashboards use Web-based
technologies to present key business performance indicators or information in a single
integrated view, generally using graphics in a clear, concise, and easy to understand
manner.
• Portals - provide a unified, single point of entry for information distribution. Portals are a
Web-based technology that uses a Web browser to integrate data from multiple sources
into a single Web page. Many different types of Bl functionality can be accessed through
a portal.
• Data analysis and reporting tools - Advanced tools used to query multiple diverse data
sources to create single integrated reports.
• Data-mining tools - Tools that provide advanced statistical analysis to uncover problems
and opportunities hidden within business data.
• Data warehouses (DW) - the foundation on which a Bl infrastructure is built. Data is
captured from the OLTP system and placed in the DW on near-real-time basis. Bl
provides company-wide integration of data and the capability to respond to business
issues in a timely manner
• OLAP tools - Online analytical processing provides multidimensional data analysis.
• Data visualization - Tools that provide advanced visual analysis and techniques to
enhance understanding of business data.
H
Operational Data Vs. Decision Support Data
• Operational data storage is optimized to support transactions that represent daily
operations
• Decision Support Data is optimized to support decision making.
POINT Operational Data Decision Support
Data

Time span Operational data cover a • Decision support


short time frame. data tend to cover a
longer time frame

Granularity (level of • Decision support


aggregation) data must be
presented at different
levels of aggregation,
from highly
summarized to near-
atomic.

Dimensionality. Operational data focus on •
representing individual
transactions rather than
on the effects of the
transactions over time.

Operational Data Vs. Decision Support Data GIVE 10 POINTS


 Operational data is data that is used to support the day-to-day operations of an
organization, such as managing transactions and maintaining records. Decision support
data, on the other hand, is data that is used to inform strategic decision-making and
planning.
 Operational data is typically generated and stored in real-time, whereas decision support
data is often stored in a data warehouse or other form of data repository for later use.
 Operational data is typically focused on the details of individual transactions and events,
while decision support data is more focused on trends and patterns that can inform
business decisions.
 Operational data is used to support specific business processes, such as customer service
or inventory management, while decision support data is used to support broader strategic
decision-making across the organization.
 Operational data is more voluminous and come from many sources with different format
and structure, while decision support data is more structured and come from fewer
sources.
 Operational data is more transactional by nature, while decision support data is more
analytical by nature
 Operational data is typically used by front-line employees, such as customer service
representatives, while decision support data is used by managers and executives.
 Operational data is important for the real-time management of an organization, while
decision support data is important for long-term planning and strategy development
 Operational data is often stored in operational databases that are optimized for
transactional processing, while decision support data is often stored in data warehouses or
data marts that are optimized for analytical processing.
 Operational data can be used to validate the results obtained from decision support data,
while decision support data is used as a base for taking important decisions.

OPERATIONAL DECISION SUPPORT


CHARACTERISTIC DATA DATA

Data currency Current operations Historic data


Real-time data
Snapshot of company data
Time component
(week/month/year)

Granularity Atomic-detailed Summarized data


data

Summarization level Low; some High; many aggregation


aggregate yields levels

Data model Highly normalized Non-normalized Complex


Mostly relational structures
DBMS
Some relational, but
mostly multidimensional
DBMS

Transaction type Mostly updates Mostly query

Transaction volumes High update Periodic loads and


volumes summary calculations

Transaction speed Updates are critical Retrievals are critical

Query activity Low-to-medium High

Query scope Narrow range Broad range

Query complexity Simple-to-medium Very complex

Data volumes Hundreds of Terabytes to petabytes


gigabytes

Basic Bl Architectural Components


 ETL tools -Data extraction, transformation, and loading (ETL) tools collect, filter,
integrate, and aggregate operational data to be saved into a data store optimized for
decision support.
 Data Store - The data store is optimized for decision support and is generally
represented by a data warehouse or a data mart.
The data store contains two main types of data:
• Business data - are extracted from the operational database and from external data
sources and is stored in structures that are optimized for data analysis and query
speed. The external data sources provide data that cannot be found within the company
but that are relevant to the business, such as stock prices, market indicators, marketing
information (such as demographics), and competitors' data.
• Business model data - are generated by special algorithms that model the business to
identify and enhance the understanding of business situations and problems.
 Data query and analysis tools - performs data retrieval, data analysis, and data-
mining tasks using the data in the data store. This component is used by the data
analyst to create the queries that access the database. Depending on the
implementation, the query tool accesses either the operational database, or more
commonly, the data store. This component is generally represented in the form of an
OLAP tool.
 Data presentation and visualization tools - is in charge of presenting the data to the
end user in a variety of ways.
This component is used by the data analyst to organize and present the data. This tool helps the
end user select the most appropriate presentation format, such as summary report, map, pie or
bar graph, or mixed graphs. The query tool and the presentation tool are the front end to the Bl
environment.

A decision support database is a specialized DBMS tailored to provide fast answers to complex
queries
There are four main requirements for a decision support database:
• Database schema, data extraction and loading, end-user analytical interface, and
database size.
 Database schema
• must support complex, non-normalized data representations
• data must be aggregated and summarized
• queries must be able to extract multidimensional time slices
 Data extraction and loading
• database is created largely by extracting data from the operational database and by
importing additional data from external sources.
• data extraction capabilities should allow batch and scheduled data extraction.
• data extraction capabilities should also support different data sources: flat files and
hierarchical, network, and relational databases, as well as multiple vendors
• data-filtering capabilities must include the ability to check for inconsistent data or data
validation rules and resolve them
• to filter and integrate the operational data into the decision support database, the DBMS
must support advanced data integration, aggregation, and classification.
 End-user analytical interface
• should permit the user to navigate through the data to simplify and accelerate the
decision-making process.
• the decision support DBMS must generate the necessary queries to retrieve the
appropriate data from the decision support database.
• because queries yield crucial information for decision makers, the queries must be
optimized for speedy processing.
 Database size should support
• very large databases (VLDBs)
• Advanced storage technologies
• multiple-processor technologies
Business Intelligence Vs Artificial Intelligence
• Al systems make decisions for the users.
• Bl systems help the users make the right decisions, based on available data.
J
Differences between Business Intelligence Vs Artificial Intelligence. give 13 points
 Business Intelligence (BI) is primarily focused on the analysis and reporting of historical
data, while Artificial Intelligence (AI) is focused on the development of systems that can
perform tasks that would typically require human intelligence, such as decision-making
and problem-solving.
 BI tools are commonly used for data visualization, data mining, and querying, while AI
methods include machine learning, natural language processing, and image and speech
recognition.
 BI is mainly used to support and inform business operations and strategic decision-
making, while AI is used to automate tasks, improve efficiency and gain new insights.
 Business Intelligence mainly focuses on structured data, such as financial reports and
sales data, while AI deals with both structured and unstructured data, such as images,
videos, and natural language text.
 BI is mostly used for historical data analysis and predictions while AI is used to analyze
real-time data and make predictions in near real-time.
 Business Intelligence systems are mainly used by business leaders and management
teams, while AI systems can be used by different departments and functions throughout
an organization.
 BI systems are designed to work with a wide range of data sources, while AI systems
typically require more specific, high-quality data sets to train and improve their
performance.
 Business Intelligence mainly focus on providing insights while AI also use them but
focus on automating the decision-making process.

 BI focus on large data-sets while AI can work with small data sets too.
 BI systems are mainly focused on making data easy to understand and actionable, while
AI systems are focused on creating intelligent systems that can improve over time.
 BI solutions typically have a long implementation time frame, AI solutions can be
implemented and trained relatively quickly depending on the use case.
 BI solutions require a lesser level of technical expertise to use and maintain, compared to
AI solutions, which often require specialized knowledge in areas such as machine
learning and data science.
 Business Intelligence systems are considered as more established and mature technology,
compared to AI which still in developing stage, but actively researched and developed.
Business Intelligence and the Web
The Web makes Bl even more useful
• Customers do not appear “physically” in a store; their behaviors cannot be observed by
traditional methods
• A website log is used to capture the behavior of each customer, e.g., sequence of pages
seen by a customer, the products viewed
• Idea: understand your customers using data and Bl!
• Utilize website logs, analyze customer behavior in more detail than before (e.g., what
was not bought?)
• Combine web data with traditional customer data
H
• Data warehousing: The process of constructing and using data warehouses
• Main Definition
• Data warehouse is an integrated, subject-oriented, time-variant, nonvolatile
collection of data that provides support for decision making.
Definition of components
Integrated
• is a centralized, consolidated database that integrates data derived from the entire
organization and from multiple sources with diverse formats.
• data integration implies that all business entities, data elements, data characteristics, and
business metrics are described in the same way throughout the enterprise.
Subject-oriented
• data warehouse data are organized and summarized by topic, such as sales, marketing,
finance, distribution, and transportation.
• for each topic, the data warehouse contains specific subjects of interest-products,
customers, departments, regions, promotions, and so on.
Time-variant
• warehouse data represent the flow of data through time.
• the data warehouse contains a time ID that is used to generate summaries and
aggregations by week, month, quarter, year, and so on.
Nonvolatile
• once data enter the data warehouse, they are never removed.
• because data are never deleted and new data are continually added, the data warehouse is
always growing.
• that's why the DBMS must be able to support multigigabyte and even multiterabyte or
greater databases, operating on multiprocessor hardware.
G
Data warehouses Vs Operational DBs
Online transaction processing (OLTP)
• Is the major task of traditional relational DBMS.
• Covers day-to-day operations: purchasing, inventory, banking, manufacturing, payroll,
registration, accounting, etc.
• Aims at reliable and efficient processing of a large number of transactions and ensuring
data consistency.
On-line analytical processing (OLAP)
• Major task of data warehouse system
• Deals with data analysis and decision making
• Aims at efficient multidimensional processing of large data volumes, i.e., Fast, interactive
answers to large aggregate queries

F
Distinguishing features between OLTP and OLAP are summarized as follows:
Users and system orientation:
• OLTP system is customer-oriented and is used for transaction and query processing by
clerks, clients, and information technology professionals.
• OLAP system is market-oriented and is used for data analysis by knowledge workers,
including managers, executives, and analysts.
Data contents:
• OLTP system manages current data that, typically, are too detailed to be easily used for
decision making.
• OLAP system manages large amounts of historical data, provides facilities for
summarization and aggregation, and stores and manages information at different levels of
granularity. These features make the data easier to use in informed decision making.
Database design:
• OLTP system usually adopts an entity-relationship (ER) data model and an application-
oriented database design.
• OLAP system typically adopts either a star or snowflake model and a subject-oriented
database design.
View:
• OLTP system focuses mainly on the current data within an enterprise or department,
without referring to historical data or data in different organizations.
• OLAP system often spans multiple versions of a database schema, due to the
evolutionary process of an organization. OLAP systems also deal with information that
originates from different organizations, integrating information from many data stores.
Because of their huge volume, OLAP data are stored on multiple storage media.
Access patterns:
• The access patterns of an OLTP system consist mainly of short, atomic transactions. Such
a system requires concurrency control and recovery mechanisms.
• However, accesses to OLAP systems are mostly read-only operations (because most data
warehouses store historical rather than up-to-date information), although many could be
complex queries.
Comparison between OLTP and OLAP systems

Table 4.1: Comparison between OLTP and


OLAP systems.

Feature OLTP OLAP


Characteristic operational processing informational processing

Orientation transaction analysis

User clerk, DBA, database knowledge worker (e.g.,


professional manager, executive,
analyst)

Function day-to-day operations long-term informational


requirements, decision
support

ER based, application- star/snowflake, subject-


DB design oriented oriented

Data current; guaranteed up- historical; accuracy


to-date maintained over time

Summarization primitive, highly detailed summarized, consolidated

View detailed, flat relational summarized,


multidimensional

Unit of work short, simple transaction complex query

Access read/write mostly read

Focus data in information out

index/hash on primary
Operations key lots of scans

Number of records

accessed tens millions

Number of
users thousands hundreds

DB size GB to high-order GB > TB

high performance, high high flexibility, end-user


Priority availability autonomy

Metric transaction throughput query throughput,


response time
Differences between Data warehouses Vs Operational DBs, give 15 points
 Data warehouses are designed for reporting and analysis, while operational databases are
designed for transaction processing.
 Data warehouses typically store historical data, while operational databases store current
data.
 Data warehouses use a schema-on-write model, while operational databases use a
schema-on-read model.
 Data warehouses are optimized for reading, while operational databases are optimized for
writing.
 Data warehousing involves a process of extracting, cleaning, and transforming data from
various sources, while operational databases do not typically require this level of data
preparation.
 Data warehouses are often built using columnar storage, while operational databases use
row-based storage.
 Data warehouses use aggregate tables to support faster query performance, while
operational databases do not.
 Data warehouses are designed to support complex queries, while operational databases
are designed to support simple, fast queries.
 Data warehouses are typically updated in batches, while operational databases are
updated in real-time.
 Data warehouse systems usually use a different DBMS than operational databases.
 Data Warehouses have a different security model and auditing requirements than
operational databases
 Operational databases are optimized for data concurrency and consistency, data
warehouses are optimized for data historical analysis.
 Data warehouse typically store data in a denormalized format, while operational
databases store data in a normalized format.
 Data warehouse is a centralized repository, while Operational DB may distributed in
nature.
 Data warehouses are often used in a business intelligence context, while operational
databases are used in a transactional context.
Data Warehouse Models
Three data warehouse models:
• the enterprise warehouse
• the data mart
• the virtual warehouse
Enterprise warehouse
• An enterprise warehouse collects all the information about subjects (customers, products,
sales, assets, personnel) spanning an entire organization
• It provides us enterprise-wide data integration.
• The data is integrated from operational systems and external information providers.
• This information can vary from a few gigabytes to hundreds of gigabytes, terabytes or
beyond.
• It requires extensive business modeling and may take years to design and build.
Data mart
• Data mart contains a subset of organization-wide data. This subset of data is valuable to
specific groups of an organization.
• In other words, we can claim that data marts contain data specific to a particular group.
For example, the marketing data mart may contain data related to items, customers, and
sales. Data marts are confined to subjects.
Data marts are usually implemented on low-cost departmental servers that are Unix/Linux- or
Windows-based. The implementation cycle of a data mart is more likely to be measured in weeks
rather than months or years. However, it may involve complex integration in the long run if its
design and planning were not enterprise-wide
Virtual warehouse
• A virtual data warehouse is a set of separate databases, which can be queried together, so
a user can effectively access all the data as if it was stored in one data warehouse.
• A virtual warehouse is a set of views over operational databases.
• A virtual warehouse is easy to build but requires excess capacity on operational database
servers.
F
Metadata Repository
Meta data is the data defining warehouse objects. It has the following kinds
• Description of the structure of the warehouse
• schema, view, dimensions, hierarchies, derived data definition, data mart
locations and contents
• Operational meta-data
• data lineage (history of migrated data and transformation path), currency of data
(active, archived, or deleted), monitoring information (warehouse usage statistics,
error reports, audit trails)
• The algorithms used for summarization
• The mapping from operational environment to the data warehouse
• Data related to system performance - e.g., warehouse schema, view and derived data
definitions
• Business data, e.g., business terms and definitions, ownership of data, charging policies
Role of Metadata
Metadata are used as a directory to help the decision support system analyst locate the contents
of the data warehouse
• as a guide to the mapping of data when the data are transformed from the operational
environment to the data warehouse environment
• as a guide to the algorithms used for summarization between the current detailed data and
the lightly summarized data, and between the lightly summarized data and the highly
summarized data.
Metadata should be stored and managed persistently (i.e., on disk).
The ETL process
• process of extracting, transforming, and loading the
aggregated data into the data warehouse is known as ETL.
• ETL technology can migrate data from different types of data structures (e.g. databases
and at files which may include the following:
• Customer Relationship Management (CRM)
• Enterprise Resource Planning (ERP)
• Legacy Systems a E-Commerce
• Supply Chain Management (SCM) and across different platforms (e.g. mainframe,
server).
• the extracted data is converted from its previous form into a general form it needs to be in
so that it can be loaded into the data warehouse.
Why Have a Separate Data Warehouse? give 15 points
 Data warehousing allows for the efficient storage and retrieval of large amounts of
historical data, which can be useful for long-term analytics and reporting.
 A separate data warehouse allows for a clear separation of operational and analytical
systems, which can improve performance and reduce the risk of conflicting changes.
 Data warehousing allows for the integration of data from multiple sources, which can
provide a more comprehensive view of the organization's data.
 Data warehousing allows for the use of specialized tools and technologies that are
optimized for analytical processing, such as OLAP and data mining.
 Data warehousing allows for the denormalization of data, which can improve query
performance by reducing the number of joins required.
 Data warehousing allows for the use of star and snowflake schemas, which can simplify
querying and improve performance by preaggregating data.
 Data warehousing allows for the implementation of data governance and security
controls, which can be more difficult in operational systems.
 Data warehousing allows for the creation of data marts, which can provide more targeted
and specific views of the data for different departments or business units.
 Data warehousing allows for the use of data quality, cleaning, and transformation tools to
improve the accuracy and consistency of the data.
 Data warehousing allows for the use of data lineage and provenance tools to trace the
origin and history of the data.
 Data warehousing allows for the use of data archiving and purging tools to manage the
retention and disposal of historical data.
 Data warehousing allows for the use of incremental loading and change data capture
(CDC) techniques to improve the efficiency of data loading.
 Data warehousing allows for more advanced analytics such as trend analysis, forecasting,
and predictive modeling
 Data Warehousing can provide Business-ready data to the end-users for quick insights
and improve decision-making process
 Data Warehousing enables the use of big data and advanced analytics technologies such
as Spark, Hadoop, or machine learning to extract insights from large datasets
Explain web portals and dashboards
A web portal is a website or service that brings information from diverse sources, such as emails,
databases, and applications, in a unified way. It serves as a point of access for an organization's
internal or external web-based applications and services. A portal can provide a personalized
view of information and applications based on an individual user's roles and permissions. Some
examples of web portals are Intranet portals, Extranet portals, and Internet portals.

A dashboard is a user interface that presents information in a visual and easy-to-understand


format. It is often used to display data from multiple sources in a single view. Dashboards can be
used in a wide range of applications, such as business intelligence, performance management,
and monitoring systems. They typically provide real-time updates, interactive visualizations, and
the ability to drill down into the underlying data. Examples of dashboards include business
intelligence dashboards, web analytics dashboards, and monitoring dashboards.
Role of web portals in business intelligence. give 12 points
 Centralized access to data: Web portals provide a centralized location for accessing and
analyzing data from multiple sources, such as databases, spreadsheets, and cloud-based
services.
 Personalization: Web portals can be customized to provide a personalized view of
information and applications based on an individual user's roles and permissions.
 Real-time updates: Dashboards within web portals can be configured to provide real-time
updates on key performance indicators (KPIs) and metrics.
 Interactive visualizations: Web portals can provide interactive charts, graphs, and other
visualizations to help users understand and analyze data.
 Self-service analytics: Web portals can enable users to perform their own data analysis
and create custom reports, without relying on IT or other technical staff.
 Mobile access: Web portals can be accessed on mobile devices, which allows users to
view and analyze data on-the-go.
 Collaboration: Web portals can provide tools for collaboration, such as shared
workspaces, to allow users to work on data and reports together.
 Alerts and notifications: Web portals can be configured to send alerts and notifications
when certain conditions or thresholds are met, so users can stay informed of important
changes in the data.
 Data governance: Web portals can provide features such as data security, access control,
and auditing to ensure data governance and compliance with regulations.
 Data integration: Web portals can provide a unified view of data from multiple sources,
by integrating data from different systems and applications.
 Predictive analytics: Web portals can also provide tools for predictive analytics, such as
machine learning models, that help users make predictions about future trends or patterns
in the data.
 Data governance: Web portals can also be integrated with data governance tools such as
data lineage , data cataloging, and metadata management to ensure that data is consistent,
accurate, and well-understood.
Role of dashboards in business intelligence. give 12 points
 Data visualization: Dashboards provide a visual representation of data, making it easy for
users to understand and analyze large amounts of information.
 Real-time updates: Dashboards can be configured to provide real-time updates on key
performance indicators (KPIs) and metrics, so users can stay informed of important
changes in the data.
 Interactive: Dashboards typically provide interactive features such as drill-down,
filtering, and zooming, which allows users to explore data in more detail.
 Personalization: Dashboards can be customized to display data that is relevant to the
individual user, based on their roles and responsibilities.
 Self-service analytics: Dashboards can empower users to perform their own data analysis,
without relying on IT or other technical staff.
 Mobile access: Dashboards can be accessed on mobile devices, which allows users to
view and analyze data on-the-go.
 Collaboration: Dashboards can be shared with other users, which allows for collaboration
and discussion of the data.
 Alerts and notifications: Dashboards can be configured to send alerts and notifications
when certain conditions or thresholds are met, so users can stay informed of important
changes in the data.
 Data integration: Dashboards can provide a unified view of data from multiple sources,
by integrating data from different systems and applications.
 Predictive analytics: Dashboards can include visualizations of predictive analytics
models, such as machine learning, that help users make predictions about future trends or
patterns in the data.
 Data governance: Dashboards can also be integrated with data governance tools such as
data lineage, data cataloging, and metadata management to ensure that data is consistent,
accurate, and well-understood.
 Dashboards can also be used to enable data-driven decision making by providing insights
and identifying areas of opportunity, risk and performance enabling business to take
actions accordingly.
Describe the ETL process in data warehouse
ETL stands for Extract, Transform, and Load, and it is a process used to move data from one or
more sources into a data warehouse for later analysis and reporting. The process can be broken
down into the following three steps:
 Extract: In this step, data is extracted from various sources such as databases, flat files, or
cloud-based services. The extracted data can be raw or transformed in some way,
depending on the source.
 Transform: In this step, the extracted data is transformed to make it consistent and
compliant with the data warehouse schema. This step includes processes such as data
cleaning, data validation, and data transformation. The transformed data is then loaded
into a staging area.
 Load: In this step, the transformed data is loaded into the data warehouse. This step also
involves indexing the data, creating and maintaining the data integrity, and other
technical details that ensures data is reliable and accessible for reporting and analysis
purposes.

Role of metadata in data warehouse and BI give 10 points


 Metadata is used to describe the structure and organization of data in a data warehouse,
including table and column names, data types, and relationships between tables.
 It helps users understand the meaning and context of the data they are working with.
 It helps to organize data into logical groups and hierarchies, making it easier to navigate
and query the data.
 Metadata can be used to define relationships between different data elements, such as
between a sales fact table and a product dimension table.
 It allows for better data governance by providing information on data lineage, data
quality, and data ownership.
 Metadata can be used to manage security and access controls, allowing only authorized
users to access certain data.
 it helps to improve data discovery, making it easier to identify relevant data sources for
business intelligence and analytics.
 It can be used to improve data integration and mapping, by providing information on how
data from different sources is related and how it should be combined.
 Metadata can be used for automated documentation of data sources, reducing the need for
manual documentation and increasing efficiency.
 it plays important role in maintaining data integrity, consistency, and accuracy across the
organization.

What is metadata
Metadata is data that describes other data. It provides information about a specific item, such as
the title, author, and creation date of a document, the duration of a video or audio file, or the
dimensions of an image. Metadata can also include information about how a file is structured,
such as the number of pages in a document or the number of tracks on an album. This
information is used by computer systems to organize, search, and retrieve files and by humans to
understand the content and context of a file.
Difference between OLTP and OLAP, give 15 points
 OLTP (Online Transaction Processing) is designed for recording transactions and
management of day-to-day operations, whereas OLAP (Online Analytical Processing) is
designed for complex queries and analysis of data.
 OLTP uses normalized data model and relational database management systems
(RDBMS) for data storage, whereas OLAP uses denormalized data model and
multidimensional data structures such as cubes for data storage.
 OLTP databases are usually optimized for fast data retrieval and data modification,
whereas OLAP databases are optimized for fast data analysis and reporting.

 OLTP systems are typically used for real-time data entry and retrieval, whereas OLAP
systems are used for decision support and data analysis.
 In OLTP, data is broken down into small and simple transactions, whereas in OLAP, data
is organized into complex multidimensional structures.
 OLTP systems typically handle a large number of short and simple transactions, whereas
OLAP systems handle a smaller number of complex and lengthy queries.
 OLTP is mostly used in small and medium-sized businesses, whereas OLAP is used in
large enterprises.
 OLTP is used to maintain data integrity and consistency, whereas OLAP is used for data
aggregation and data mining.
 Queries in OLTP are simple, and retrieval is based on specific record, whereas in OLAP
data is retrieved based on complex calculations and multidimensional analysis.
 In OLTP data is frequently updated, whereas in OLAP data is periodically updated in
batch process.
 OLTP systems are used to maintain accurate and current records, whereas OLAP systems
are used to identify trends and patterns.
 In OLTP, data is stored in a single table, whereas in OLAP data is stored in multiple
tables which are linked to each other through a defined schema.
 OLTP is used for operational reporting, whereas OLAP is used for strategic reporting.
 OLTP performs a high number of reads and writes, whereas OLAP performs a high
number of complex read operations.
 Data in OLTP is atomic and granular, whereas data in OLAP is consolidated and
aggregate.

You might also like