2-Marks
a) Business requirements
b) Business value
c) Program Management
d) Development
3) Implementation layer
Ans: A data warehouse is a type of data management system that is designed to enable and
support business intelligence (BI) activities, especially analytics.
1) casual users
2) power users
1) Casual users:
They are the consumers of information who use the pre-existing reports created by power
users and make decisions / take actions.
2) Power users:
They are the producers of information. They use powerful analytical and authoring tools to
access data from data warehouses/ data marts and other sources from inside and outside the
organization.
e)Data mining
b) Marketplace analysis
c) Performance analysis
d) Behavior analysis
f) Productivity analysis
Ans: A data mart holds data and aggregations about a single subject area/domain, which
can be used for analysis, reporting or decision support.
9. Define ODS.
Ans: An ODS (operational data store) processes the operational data that is fed into the data
warehouse and provides a homogeneous, unified view which can be used for analysis and
reporting.
10. Define ETL.
Ans: ETL, which stands for extract, transform and load, is a data integration process that
combines data from multiple data sources into a single, consistent data store that is loaded
into a data warehouse or other target system.
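
The three ETL stages can be sketched as below. This is only a minimal illustration under stated assumptions: the two sources, their field names and the `orders` target table are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export from one application.
CSV_SOURCE = "order_id,amount,country\n1,10.50,in\n2,4.25,us\n"

# Hypothetical source 2: records from another application with different field names.
APP_SOURCE = [{"id": 3, "total": "7.00", "country_code": "IN"}]


def extract():
    """Extract: read raw rows from both sources without changing them."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    return csv_rows, APP_SOURCE


def transform(csv_rows, app_rows):
    """Transform: map both sources onto one (order_id, amount, country) shape."""
    unified = []
    for row in csv_rows:
        unified.append((int(row["order_id"]), float(row["amount"]), row["country"].upper()))
    for row in app_rows:
        unified.append((int(row["id"]), float(row["total"]), row["country_code"].upper()))
    return unified


def load(rows, conn):
    """Load: write the unified rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
csv_rows, app_rows = extract()
load(transform(csv_rows, app_rows), conn)
```

After the run, the `orders` table holds rows from both sources in one consistent format, which is the "single, consistent data store" the definition refers to.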
Ans: It is the degree to which the attributes of data associated with a certain entity
accurately describe the occurrence of that entity.
• Ans: It helps decision makers quickly access and query information based on key
variables to gain meaningful insights.
• It allows better monitoring of key variables like trending patterns and customer behavior across
geographies, which leads to reduced R&D costs.
Ans: Data sources (transactional or operational, external) are the sources from which we
extract data.
Ans: It helps decision makers quickly access and query information based on key variables
to gain meaningful insights.
It allows better monitoring of key variables like trending patterns and customer behavior across
geographies, which leads to reduced R&D costs.
• Identify entities
• Review the ER diagram with business users and get their sign off.
5 or 10-Marks Questions
1. Explain briefly Business requirements.
Ans: Business requirements are the result of a three-step process involving business drivers,
business goals and business strategies.
Business drivers are factors that initiate the need to act. Ex: changing labour
laws, changing economy, workforce, technology etc.
Business goals are the targets to be achieved in response to business drivers. Ex:
increased productivity, improved market share, profits, customer satisfaction, cost
reduction etc.
Business strategies are the plan of action to achieve the set goals. Ex:
outsourcing, global delivery model, customer and employee retention programs
etc.
Ans:
4. Explain with neat diagram Implementation layer of BI component framework.
Implementation layer:
• It consists of technical components required to capture, transform, clean and convert data
into meaningful information and deliver it to meet business goals and bring value to
business.
b) information services
• It is the process that prepares basic repository / data store from which data is extracted.
Refer to fig. 5.6 for an example of a data warehouse for the “AllGoods” store.
b) Information services:
1) casual users
2) power users
1) Casual users:
• They are the consumers of information who use the pre-existing reports created by power
users and make decisions / take actions.
2) Power users:
• They are the producers of information
• They use powerful analytical and authoring tools to access data from data warehouses/
data marts and other sources from inside and outside the organization.
a) DSS (decision support systems): help in decision making at operational and tactical
levels.
• OLAP tools allow slicing and dicing of data from various perspectives.
– The source of data is multiple heterogeneous internal and external data sources
iii) Front-end tools: they support the user/client by providing reporting, analysis and
data mining tools.
• It captures data about customer behavior and predicts customer buying pattern.
• Helps make decisions on direct marketing, CRM, customer loyalty and satisfaction.
b) Marketplace analysis:
c) Performance analysis:
d) Behavior analysis:
f) Productivity analysis:
• It includes collecting data, performing aggregations and comparing the actual result
against the estimated /planned.
• It helps to decide the best channel for getting the products/services to the
customers/consumers.
• Price decides when to increase or decrease the price.
• Placing decides how to reach the customers (physical stores or online store).
• Promotion decides on personal selling and advertising.
• It also decides which sales channel should be discontinued.
• The program team prepares the strategy for how the BI project will be executed.
a) BI program manager:
• Plans and budgets the projects and follows up the progress of each project
b) BI data architect:
• Optimizes current data usage and takes care of future data needs (design and content)
c) BI ETL architect:
• Determines the best way to obtain data from different operational sources/platforms.
• Trains the ETL specialists on data acquisition, transformation and loading.
d) BI technical architect:
• Assesses current technical architecture and system capacity for long term processing
needs.
• Defines strategy for data back up and recovery and disaster recovery.
e) Metadata manager
• Who accessed the application metadata, when, and what is the frequency of access?
f) BI Administrator:
a) Business Manager:
b) BI business specialists:
• Ensures that the information is identified correctly at all levels and accessed at all modes.
c) BI project Manager:
• He leads the project and ensures delivery of all project needs and assesses risks.
• Documents requirements
• Designs training infrastructure & material and trains BI users and educates users on
warehousing capabilities
• Plans and executes acceptance tests and helps users find right information
f) BI Designer:
• He interprets the requirements and designs the data structure for optimal access,
performance and integration
g) ETL specialist:
h) Database administrator:
• Keeps check of the physical data appended to BI environment in current project cycle
• Creates, optimizes and administers physical tables, triggers and partitions
9. Explain the need for Data Warehouse with ETL process diagram.
• Ans: Data from several heterogeneous sources (like spreadsheets, Access databases,
.CSV files, etc.) can be extracted and archived in a data warehouse. (Refer fig. 6.1)
• According to Ralph Kimball, “A data warehouse is made up of all the data marts in an
enterprise.”
• It is a bottom up approach.
• The single version of truth might be compromised since several independent data marts
are likely to have multiple versions of same entity/data.
1) Information accessibility:
2) Information credibility:
• Data warehouse must adapt to changes like business situations, user requirements,
technology, access tools etc.
• Data should be relevant for more precise decision making and easily accessible to
business users
• Security mechanisms must be enabled so that confidential data is
accessible only to authorized users.
6) Information consistency:
• The information provided to the users should maintain single/consistent version of truth
1. Schema integration :
• Multiple data sources provide data on same entity type. Hence schema integration allows
applications a transparent view and ability to query the data as if it is from one uniform
data source.
• Consider a retail outlet that has two branches namely Branch A that stores transaction
data with following schema
• Branch B stores transaction data with following schema
• The schema from both branches is integrated by mapping the respective columns by
looking up the metadata information of schemas like column names, type, length,
constraints, domain of values, NULL, zero and blank values etc.
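
The column mapping described above can be sketched as follows. The two branch schemas are not reproduced in these notes, so every column name here (`TxnID`, `TransactionNo`, and so on) is an assumption made purely for illustration.

```python
# Hypothetical metadata: how each branch's columns map onto one unified schema.
BRANCH_A_TO_UNIFIED = {
    "TxnID": "transaction_id",
    "CustName": "customer_name",
    "Amt": "amount",
}
BRANCH_B_TO_UNIFIED = {
    "TransactionNo": "transaction_id",
    "Customer": "customer_name",
    "BillAmount": "amount",
}


def to_unified(row, mapping):
    """Rename a source row's columns to the unified schema; blanks become None."""
    unified = {}
    for source_col, target_col in mapping.items():
        value = row.get(source_col)
        unified[target_col] = None if value in ("", None) else value
    return unified


def integrate(branch_a_rows, branch_b_rows):
    """Return rows from both branches as if they came from one uniform source."""
    rows = [to_unified(r, BRANCH_A_TO_UNIFIED) for r in branch_a_rows]
    rows += [to_unified(r, BRANCH_B_TO_UNIFIED) for r in branch_b_rows]
    return rows


# Made-up sample rows, one per branch.
branch_a = [{"TxnID": 101, "CustName": "Asha", "Amt": 250.0}]
branch_b = [{"TransactionNo": 9001, "Customer": "", "BillAmount": 80.0}]
combined = integrate(branch_a, branch_b)
```

Applications can then query `combined` without knowing which branch a row came from. Treating blank strings as `None` is one possible policy for the NULL, zero and blank values mentioned above, not the only one.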
2) Instance integration:
• Here the information is directly derived from the data to get accurate semantic
information on data content.
• It identifies and integrates all instances of the data items that represent the real world
entity.
• Ex: Consider a corporate house with 10,000 employees. They do not have an ERP. Instead
they have various applications like “projectAllocate”, “employeeLeave”,
“employeeAttendance” and “employeePayroll”.
Now the company wants to consolidate all the details of every employee.
There is an employee named “Fred Aleck”. All the applications have stored his name in
a different way.
ProjectAllocate
EmployeeLeave
EmployeeAttendance
EmployeePayroll
• One solution to consolidate the data in the above tables is to look up all the records using
the primary key (ex: employeeNo or SSN) and then replace the differing
data values of the same attribute with one consistent value (ex: Fred Aleck).
• ProjectAllocate
EmployeeLeave
EmployeeAttendance
EmployeePayroll
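
A minimal sketch of this consolidation step is given below. The employee number and the name variants are invented for illustration, and the master list of agreed names is assumed to come from a trusted source.

```python
# Hypothetical records from three of the applications; the name spellings differ.
project_allocate = [{"employeeNo": 1001, "name": "Fred A."}]
employee_leave = [{"employeeNo": 1001, "name": "Aleck, Fred"}]
employee_payroll = [{"employeeNo": 1001, "name": "F. Aleck"}]

# Agreed consistent value per primary key (assumed master list).
MASTER_NAMES = {1001: "Fred Aleck"}


def consolidate(tables, master_names):
    """Replace each record's name with the master value for its employeeNo.

    Records whose key is not in the master list keep their original name.
    The input tables are not modified; new lists are returned.
    """
    cleaned = []
    for table in tables:
        cleaned.append([
            {**row, "name": master_names.get(row["employeeNo"], row["name"])}
            for row in table
        ])
    return cleaned


tables = [project_allocate, employee_leave, employee_payroll]
consolidated = consolidate(tables, MASTER_NAMES)
```

Every record for employee 1001 in `consolidated` now carries the same name, while the source tables stay unchanged for auditing.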
2) Object brokering
3) Modeling techniques
a) ER modeling
b) Dimensional modeling
1) Data interchange:
• It is a middleware which allows program calls to be made from one computer to another
via a computer network, providing location transparency through remote procedure calls.
• It handles transformation of in-process data structure to and from the byte sequence.
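
The conversion of an in-process data structure to and from a byte sequence can be illustrated as below. JSON over UTF-8 is chosen here only as an assumption for the example; the notes do not name a specific wire format, and the record itself is made up.

```python
import json


def to_bytes(record):
    """Serialize an in-memory dictionary into a byte sequence."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


def from_bytes(payload):
    """Rebuild the dictionary from the byte sequence."""
    return json.loads(payload.decode("utf-8"))


# Hypothetical record exchanged between two programs.
employee = {"employeeNo": 1001, "name": "Fred Aleck", "on_leave": False}
payload = to_bytes(employee)    # what would travel over the network
restored = from_bytes(payload)  # what the receiving program sees
```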
Modeling techniques:
a) ER modeling:
• It is a logical design technique whose main goal is to reduce data redundancy and
hence solve the problems in insert, delete and update.
• It is used for transaction capture and helps in initial stages of data warehouse construction
• Identify entities
• Draw ER diagram
• Review the ER diagram with business users and get their sign off.
b) Dimensional modeling:
• It consists of one large table called as fact table and a number of relatively smaller tables
called dimensional tables.
• Each fact table has a multi-part primary key and each dimension table has a single-part primary key.
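
The fact/dimension layout can be sketched with an in-memory SQLite database. The table names, column names and sample rows below are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: small, each with a single-part primary key.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE dim_date    (date_id    INTEGER PRIMARY KEY, calendar_date TEXT);

-- Fact table: large, with a multi-part primary key built from the dimension keys.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    units_sold INTEGER,
    revenue    REAL,
    PRIMARY KEY (product_id, store_id, date_id)
);

INSERT INTO dim_product VALUES (1, 'Notebook'), (2, 'Pen');
INSERT INTO dim_store   VALUES (10, 'Pune'), (20, 'Mysuru');
INSERT INTO dim_date    VALUES (20240101, '2024-01-01');
INSERT INTO fact_sales  VALUES (1, 10, 20240101, 5, 250.0),
                               (2, 10, 20240101, 12, 120.0),
                               (1, 20, 20240101, 3, 150.0);
""")

# A typical analytical query: join the fact table to a dimension and aggregate.
revenue_by_city = conn.execute("""
    SELECT s.city, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_store AS s ON s.store_id = f.store_id
    GROUP BY s.city
    ORDER BY s.city
""").fetchall()
```

Because the fact table's key is the combination of dimension keys, a second row for the same product, store and date is rejected by the database.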
1) Correctness / accuracy:
• It is the degree to which the captured data correctly reflects/describes the real-world
entity/object/event.
examples:
• The bank balance in customer’s account is the real value customer deserves from the
bank
2) Consistency :
• The data throughout the enterprise must be in sync with each other
• Ex. of consistent data: an employee has left a company, and so his company email id is
made inactive.
• Ex. of inconsistent data: a customer has cancelled and surrendered his credit card, but
his billing status still reads due.
3) Completeness :
a customer provides his address details at a restaurant but those details may be
incorrect.
4) Timeliness :
• It is important to provide right data at right time to the right people in business
• ex of timely data:
5) Metadata :
Ans:
Ans: Data profiling involves statistical analysis of source data and metadata.
Ex: a column containing a phone number must be numeric. Hence remove any non-numeric
characters from the field.
3) Candidate keys: to select a candidate key, analysis of the extent to which certain
columns are distinct is done.
4) Primary key selection: check that the candidate key does not violate the NOT NULL and
UNIQUE constraints.
5) Empty string values: check a column for null values or empty strings, since they create
problems during cube creation.
6) String length: analyzing the largest, average and shortest string length helps decide what
data type is appropriate for that column.
7) Numeric length and type: assessing the max and min possible values for a numeric
column helps decide what datatype is suitable for that column.
8) Identification of cardinality:
• The cardinality relationships are important for inner and outer joins with respect to several BI tools.
9) Data format:
• Ex: marital status from “M” & “S” to “married” and “single”
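
Several of the profiling checks listed above (empty values, distinctness for candidate keys, string length, and format standardization) can be sketched as below. The helper names and the sample columns are hypothetical; real profiling tools do considerably more.

```python
def profile_column(values):
    """Return simple profiling statistics for one column of raw string values."""
    non_empty = [v for v in values if v not in (None, "")]
    lengths = [len(v) for v in non_empty]
    return {
        "rows": len(values),
        "empty": len(values) - len(non_empty),   # nulls or empty strings
        "distinct": len(set(non_empty)),
        "min_len": min(lengths) if lengths else 0,
        "max_len": max(lengths) if lengths else 0,
        # A candidate key must be fully populated (NOT NULL) and UNIQUE.
        "candidate_key": len(non_empty) == len(values)
        and len(set(values)) == len(values),
    }


MARITAL_STATUS = {"M": "married", "S": "single"}


def standardize_marital_status(code):
    """Expand a one-letter code; unknown codes are returned unchanged."""
    return MARITAL_STATUS.get(code.strip().upper(), code)


def digits_only(phone):
    """Keep only the numeric characters of a phone number field."""
    return "".join(ch for ch in phone if ch.isdigit())


# Made-up sample columns.
employee_no = ["1001", "1002", "1003"]
city = ["Pune", "", "Pune"]

employee_no_profile = profile_column(employee_no)
city_profile = profile_column(city)
```

Here `employee_no` qualifies as a candidate key because it is fully populated and unique, while `city` does not because it has an empty value and repeated entries.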