We should devote 80% of the task effort to immediate requirements and spend the
remaining 20% trying to understand future requirements.
The source systems (accounting, ERP, sales, inventory…) that will provide
information to the data warehouse need to be identified.
The business rules that need to be applied to the data must be well understood.
If medium-term requirements are not well understood, they could limit the ability of the
candidate architecture to evolve and satisfy longer-term requirements.
Produce the logical model for the data warehouse and an initial query profile.
Expect both to change on an ongoing basis.
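An initial query profile can be as simple as a list of expected query classes with rough frequencies and scan volumes. The sketch below is illustrative only: the fields `runs_per_day` and `rows_scanned` and all example numbers are hypothetical placeholders, not measured data. It totals the expected daily scan volume and flags the heaviest query class as a first candidate for aggregates or indexing.

```python
from dataclasses import dataclass


@dataclass
class QueryClass:
    """One class of expected query in an initial query profile (hypothetical fields)."""
    name: str
    runs_per_day: int
    rows_scanned: int  # rough estimate of fact rows touched per run


def daily_rows_scanned(profile: list[QueryClass]) -> int:
    """Total fact rows the profile expects to scan per day."""
    return sum(q.runs_per_day * q.rows_scanned for q in profile)


def heaviest(profile: list[QueryClass]) -> QueryClass:
    """Query class with the largest daily scan volume."""
    return max(profile, key=lambda q: q.runs_per_day * q.rows_scanned)


# Invented example figures, for illustration only.
profile = [
    QueryClass("daily sales by store", runs_per_day=200, rows_scanned=50_000),
    QueryClass("monthly margin by product", runs_per_day=10, rows_scanned=5_000_000),
    QueryClass("ad hoc customer analysis", runs_per_day=5, rows_scanned=20_000_000),
]

print(daily_rows_scanned(profile))  # 160000000
print(heaviest(profile).name)       # ad hoc customer analysis
```

Because the profile is expected to change, keeping it as data like this makes it cheap to revise and re-run as requirements become clearer.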
Identify initial sizing estimates for the database.
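A first sizing estimate is usually simple arithmetic: rows loaded per day times average row width times retention period, scaled by an allowance for indexes, aggregates and staging space. The function below is a minimal sketch of that calculation; the parameter names and the default `overhead_factor` of 3.0 are assumptions for illustration, not a recommended ratio.

```python
def estimate_db_size_gb(rows_per_day: int, bytes_per_row: int, retention_days: int,
                        overhead_factor: float = 3.0) -> float:
    """Rough initial database size in GB (1 GB = 1024**3 bytes).

    raw bytes = rows/day * bytes/row * retention days.
    overhead_factor is an ASSUMED multiplier covering indexes, aggregates
    and staging space; replace it with a figure suited to the project.
    """
    raw_bytes = rows_per_day * bytes_per_row * retention_days
    return raw_bytes * overhead_factor / 1024**3


# Hypothetical inputs: 1 million fact rows/day, 200 bytes each, kept for one year.
print(round(estimate_db_size_gb(1_000_000, 200, 365)))  # roughly 204 GB
```

Since the estimate is only as good as its inputs, it is worth recomputing whenever the logical model or the query profile changes.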
Understand the existing IT infrastructure and identify hardware preferences.
Refine and rework the initial conclusions, and always document the current position.
Risks:
Because data warehousing requirements are never fully understood, beware of the
tendency to extend this task. A common comment is “we still don’t fully understand the
requirements.” A time box is required, because such requirements are impossible to define
completely.
Focus on designing flexibility into the data warehouse and into business-rule execution.
Failure to do so may lead to substantial cost penalties in the future.
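One common way to build flexibility into business-rule execution is to hold the rules as data rather than hard-coding them in load programs, so a rule can be added or changed without rewriting code. The sketch below assumes a hypothetical rule format (`field`, `op`, `value`, `set_field`, `set_value`) and a small operator table; in a real system the rules would more likely live in a metadata table or configuration file.

```python
from typing import Any, Callable

# Hypothetical rule format: "if record[field] <op> value, then set record[set_field] = set_value".
Rule = dict[str, Any]

OPERATORS: dict[str, Callable[[Any, Any], bool]] = {
    "eq": lambda a, b: a == b,
    "lt": lambda a, b: a < b,
    "gt": lambda a, b: a > b,
}


def apply_rules(record: dict[str, Any], rules: list[Rule]) -> dict[str, Any]:
    """Return a copy of the record with each matching rule's assignment applied, in order."""
    out = dict(record)
    for rule in rules:
        if rule["field"] not in out:
            continue  # rule does not apply to records lacking the field
        if OPERATORS[rule["op"]](out[rule["field"]], rule["value"]):
            out[rule["set_field"]] = rule["set_value"]
    return out


# Example rules: these would be maintained as data, not as code.
rules: list[Rule] = [
    {"field": "region", "op": "eq", "value": "EMEA", "set_field": "currency", "set_value": "EUR"},
    {"field": "amount", "op": "lt", "value": 0, "set_field": "txn_type", "set_value": "refund"},
]

print(apply_rules({"region": "EMEA", "amount": -20}, rules))
```

Changing a business rule then means editing a row in the rule table rather than changing and redeploying the load code.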
Since a logical model of the enterprise may not exist, spend some time brainstorming
what one might look like.
If midpoint feedback from the right people does not take place, the initial decisions may
be suspect or risky, and further work based on them may be inappropriate.
The following inputs are required to start the architecture and blueprint of a
data warehousing system:
o Identification of the mechanism for the data transfer and load.
o Database sizing and query performance expectations.
o Access control, backup and recovery guidelines.
o Overall scope definition (not just the first build definition).
o IT infrastructure.
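The text does not say which data transfer and load mechanism to use. One widely used option is an incremental load driven by a high-water mark: extract only rows changed since the last successful load, then upsert them into the target. The sketch below assumes hypothetical `id` and `updated_at` columns and uses in-memory lists and dicts in place of real source and warehouse tables.

```python
from datetime import datetime
from typing import Any

Row = dict[str, Any]


def incremental_extract(source_rows: list[Row], last_loaded: datetime) -> tuple[list[Row], datetime]:
    """Select rows changed since the previous load and compute the new high-water mark."""
    changed = [r for r in source_rows if r["updated_at"] > last_loaded]
    new_mark = max((r["updated_at"] for r in changed), default=last_loaded)
    return changed, new_mark


def load(target: dict[Any, Row], rows: list[Row], key: str = "id") -> None:
    """Upsert: insert rows with new keys, overwrite rows whose key already exists."""
    for r in rows:
        target[r[key]] = r


source = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 3)},
    {"id": 3, "updated_at": datetime(2024, 1, 5)},
]
warehouse: dict[Any, Row] = {}
changed, mark = incremental_extract(source, last_loaded=datetime(2024, 1, 2))
load(warehouse, changed)
print(sorted(warehouse), mark)  # [2, 3] 2024-01-05 00:00:00
```

The new mark would be persisted only after the load succeeds, so a failed run simply re-extracts the same changes next time.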
Risks:
MPP: An acronym for massively parallel processing. MPP systems are large multi-node machines
with a large number of CPUs; each node typically has its own memory and disk.
SMP: An acronym for symmetric multiprocessing. An SMP machine consists of multiple CPUs
that share memory and disk.
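To illustrate how an MPP system spreads work, the sketch below hash-distributes rows across nodes on a distribution key, lets each node aggregate only its own rows, and then combines the partial results. It is a simulation under stated assumptions: threads stand in for nodes, `zlib.crc32` is an arbitrary choice of deterministic hash, and the column names are hypothetical. Real MPP databases use their own distribution and query-execution schemes.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Any

Row = dict[str, Any]


def node_for(key: str, num_nodes: int) -> int:
    """Map a distribution-key value to its owning node with a deterministic hash."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def distribute(rows: list[Row], key_field: str, num_nodes: int) -> list[list[Row]]:
    """Partition rows so that each simulated node owns a disjoint subset."""
    partitions: list[list[Row]] = [[] for _ in range(num_nodes)]
    for row in rows:
        partitions[node_for(str(row[key_field]), num_nodes)].append(row)
    return partitions


def partial_sum(partition: list[Row], group_field: str, value_field: str) -> dict[Any, float]:
    """Each node aggregates only its local rows."""
    totals: dict[Any, float] = defaultdict(float)
    for row in partition:
        totals[row[group_field]] += row[value_field]
    return totals


def mpp_sum(rows: list[Row], key_field: str, group_field: str, value_field: str,
            num_nodes: int = 4) -> dict[Any, float]:
    """Distribute, aggregate per node in parallel, then merge the partial results."""
    partitions = distribute(rows, key_field, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(lambda p: partial_sum(p, group_field, value_field), partitions))
    result: dict[Any, float] = defaultdict(float)
    for part in partials:
        for group, value in part.items():
            result[group] += value
    return dict(result)


rows = [
    {"order_id": "o1", "store": "A", "amount": 10},
    {"order_id": "o2", "store": "A", "amount": 20},
    {"order_id": "o3", "store": "B", "amount": 5},
]
print(mpp_sum(rows, key_field="order_id", group_field="store", value_field="amount"))
```

Distributing on `order_id` but grouping on `store` means one store's rows can land on several nodes, which is why the final merge step is needed. An SMP machine would instead run all workers against the same shared memory and disk.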