Matthias Urech ETL Developer
January 28, 2005
About the author
Matthias Urech is an ETL developer and owner of interface-development.com, a website devoted since 2003 to providing solutions to the interface developer community. He writes primarily on data integration topics and issues, with specific information for Informatica PowerCenter developers. Matthias holds a master's degree in business information management, and his expertise is shaped by years of experience and practical application of Informatica PowerCenter and databases.
This article provides a questionnaire that can be useful when you are involved in the design process of an interface.
Do not design a bridge by counting the number of people who swim across the river today. That's also true for ETL projects. Depending on your role in the ETL project, your work sometimes starts when the data flow needs to be built. Fine, someone will tell you what you have to do. But sometimes you will be involved earlier in the project to design the interface. You are then faced with the challenge of gathering requirements. That's where the ETL Design Questionnaire comes into play. Asking the right questions is not only essential, it also puts you in a position to control the design process. As a side effect, you will be recognized as a professional ETL developer who has a plan.
Sounds like the questionnaire is an exciting and useful tool to work with. But how is this different from other methodologies or frameworks (e.g. Informatica Velocity)? Each of the provided questions is supported by tables or graphical elements (let's call them diagrams). It is neither about creating comprehensive documentation nor about strictly answering all questions. Rather, consider this set of questions as a presentation of multiple views that help you understand the interface. The main goal is to gather as much information as possible with a simple approach. The key word here is "tailoring". Decide for yourself what you want to use!
Just as everybody has a scheme for getting rich that will not work, the questions proposed in this article will not spare you from going through the design process, nor is the list of questions complete. At the end of the day, your goal should be to provide the requirements for developing a working interface, to respond to changes, and to maintain good customer communication and collaboration. I hope nonetheless that the provided ETL Design Questionnaire will be useful to you and to your challenges.
ETL Design Questionnaire
In this section we will go through the following set of questions:
Question 1: Who is involved?
Question 2: Which event triggers the interface?
Question 3: How is the target layout?
Question 4: How is the logical mapping?
Question 5: Where are data quality issues addressed?
Question 6: How is the data life cycle?
Question 7: How is the interface data flow?
Question 8: What are the operational tasks?
Question 9: What level of documentation should be provided?
Question 1: Who is involved?
Basically, the ETL team is responsible for extracting, transforming and loading data. In practice, there are more tasks to think about, both within and outside the ETL team. In detail:
Perform data analysis
Define the data quality strategy
Gather business rules
Develop the interface
Establish test plans
Perform reconciliation
Execute tests and provide sign-off
Implement the system in production
Communicate and enable change management
Document the interface and support cases
These are just some of the tasks, and the list is far from complete. However, all those tasks have to be done by someone. The objective of the role/task diagram is to define the people involved and their responsibilities. In short, you can simply ask "who does what?". For example: the subject matter expert (who) provides business rules (what).
Figure 1: Role/Task Diagram
Question 2: Which event triggers the interface?
Consider the interface as a black box for the moment. First, we want to understand the overall context before building the interface. In detail, you want to know which event provides input for the interface and what the output is. For example: after month end, time and expense data is posted (event); the interface reads the daily charged hours for each employee (input) and delivers aggregated hours for the financial period (output).
Figure 2: Context Diagram
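The time and expense example can be sketched in a few lines of Python. This is only an illustration of the event/input/output view, not a proposed implementation: the row layout, the sample values, and the assumption that a financial period equals a calendar month are all hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical input: daily charged hours as (employee_id, work_date, hours).
daily_hours = [
    ("E001", date(2005, 1, 3), 8.0),
    ("E001", date(2005, 1, 4), 7.5),
    ("E002", date(2005, 1, 3), 6.0),
    ("E001", date(2005, 2, 1), 8.0),
]


def financial_period(day: date) -> str:
    """Assumption: one financial period per calendar month, labeled YYYY-MM."""
    return f"{day.year:04d}-{day.month:02d}"


def aggregate_hours(rows):
    """Output of the black box: total hours per (employee, financial period)."""
    totals = defaultdict(float)
    for employee_id, work_date, hours in rows:
        totals[(employee_id, financial_period(work_date))] += hours
    return dict(totals)


# The "event" (month-end posting) would trigger this call.
period_totals = aggregate_hours(daily_hours)
```

Writing the example down this way forces the same questions the context diagram raises: what exactly triggers the run, what one input row looks like, and at which grain the output is delivered.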
Question 3: How is the target layout?
All that matters is the result of the solution, that is, the target. Therefore, the earlier you know what you have to deliver, the earlier you can begin development.
Table 1: Target Definition Table
Question 4: How is the logical mapping?
We presume that source and target are known. The logical mapping table helps you define how source and target fields are linked and document the business rules. The logical mapping is like water: it is easier to build something on it once the links and business rules are frozen.
Table 2: Logical Mapping Table
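A logical mapping table can also be written down as data, which makes it easy to review and to test once it is frozen. The following is a minimal sketch under stated assumptions: the `MappingRule` structure, the field names, and the three business rules are hypothetical and only show one possible shape for such a table.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class MappingRule:
    target_field: str
    source_field: Optional[str]      # None when the value is derived or constant
    rule: Callable[[dict], object]   # business rule applied to one source row
    description: str                 # the rule in plain words, for reviewers


# Hypothetical mapping: one entry per target field.
LOGICAL_MAPPING = [
    MappingRule("EMPLOYEE_ID", "emp_no",
                lambda r: r["emp_no"].strip().upper(),
                "Trim and upper-case the employee number"),
    MappingRule("FULL_NAME", None,
                lambda r: f'{r["last_name"]}, {r["first_name"]}',
                "Concatenate last name and first name"),
    MappingRule("SOURCE_SYSTEM", None,
                lambda r: "HR",
                "Constant value identifying the source"),
]


def apply_mapping(source_row, mapping=LOGICAL_MAPPING):
    """Build one target row by applying every rule to the source row."""
    return {m.target_field: m.rule(source_row) for m in mapping}


target_row = apply_mapping(
    {"emp_no": " e001 ", "first_name": "Ada", "last_name": "Lovelace"}
)
```

Keeping the plain-language description next to each rule mirrors the table itself: the business side signs off on the description, the developer on the rule.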
Question 5: Where are data quality issues addressed?
This question is about deciding where data quality issues should be handled. Address as many data quality issues as possible at the source, since otherwise every future interface development initiative would have to deal with them again. However, some issues, such as incomplete data, might be best addressed in the interface.
Table 3: Data Quality Assignment Table
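The split between "fix at the source" and "handle in the interface" could look like the sketch below. It is an illustration only: the field names, the default value, and the choice of negative hours as the issue reported back to the source are assumptions made for the example.

```python
# Hypothetical default applied in the interface when optional data is missing.
DEFAULTS = {"cost_center": "UNKNOWN"}


def triage(rows):
    """Split rows into those the interface loads and those reported to the source."""
    accepted, rejected = [], []
    for row in rows:
        # Invalid values are not guessed at: they go back to the source owner.
        if row["hours"] < 0:
            rejected.append({"row": row, "reason": "negative hours", "fix_at": "source"})
            continue
        # Incomplete optional data is completed in the interface with a default.
        present = {k: v for k, v in row.items() if v not in (None, "")}
        accepted.append({**DEFAULTS, **present})
    return accepted, rejected


rows = [
    {"employee_id": "E001", "hours": 8.0, "cost_center": ""},
    {"employee_id": "E002", "hours": -2.0, "cost_center": "CC10"},
]
accepted, rejected = triage(rows)
```

The `fix_at` marker plays the role of the assignment column in the data quality table: every known issue gets an owner before development starts.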
Question 6: How is the data life cycle?
This is the most important question of all. We are discussing not only the data flow during the life cycle but also the relationship between the systems. Let's have a quick look at the different types of relationships before explaining the data life cycle diagram. Figure 3 illustrates the three types of system relationships:
Figure 3: System Relationships
Master / Slave
This is the most common relationship. Data will be maintained in system 1 and provided to system 2.
Master / Master (one direction)
In this relationship, data will be maintained in both systems, but only system 1 is able to update data in system 2. Therefore, additional effort (either manual or automatic) is required to prevent data inconsistency and loss of data quality.
Master / Master (both directions)
As in the previous relationship, data will be maintained in both systems. Here, however, both systems are able to update data in each other.
Knowing the types of relationships, you are now able to draw the data flow in the data life cycle diagram. For example, the data flow arrow will point from system 1 to system 2 in a Master/Slave relationship. What's left is to move the data flow arrow horizontally to define at which point in time an action (e.g. create) in system 1 will cause a certain action in system 2. Please note that the given actions in system 1 are just examples. Some systems only allow data to be flagged inactive instead of deleted. And the road doesn't end here, since some systems are connected to more than one other system. Therefore, you could also add further systems to the diagram. In such situations it is worth giving some thought to the order of the data flows and to whether the data food chain makes sense at all.
Figure 4: Data Life Cycle Diagram
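What the data life cycle diagram captures can be sketched as a small lookup from an action in system 1 to the resulting action in system 2. This sketch assumes a Master/Slave relationship and a system 2 that flags records inactive rather than deleting them; the action names, the event layout, and the `propagate` helper are hypothetical.

```python
# Assumption: action in system 1 -> resulting action in system 2.
# System 2 cannot delete, so a delete in system 1 becomes a deactivation.
ACTION_MAP = {"create": "create", "update": "update", "delete": "deactivate"}


def propagate(event, target):
    """Apply one system 1 event to system 2 (a dict keyed by record key)."""
    action = ACTION_MAP[event["action"]]
    key = event["key"]
    if action == "create":
        target[key] = {**event.get("data", {}), "active": True}
    elif action == "update":
        target[key].update(event.get("data", {}))
    else:  # deactivate: keep the record but flag it inactive
        target[key]["active"] = False
    return action


system_2 = {}
events = [
    {"action": "create", "key": "E001", "data": {"name": "Ada"}},
    {"action": "update", "key": "E001", "data": {"name": "Ada L."}},
    {"action": "delete", "key": "E001"},
]
applied = [propagate(event, system_2) for event in events]
```

For a Master/Master relationship a single one-way map like this is no longer enough, which is exactly the extra effort the relationship types point to.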
Question 7: How is the interface data flow?
The interface data flow diagram is mostly used to outline the extract, transform and load (ETL) process. The goal is to have a common understanding about the data flow and the involved applications and actions to deliver data between the systems.
Figure 5: Interface Data Flow Diagram
Question 8: What are the operational tasks?
Some operational tasks are overlooked during development. As a result, you have to rework the interface later. Thinking about operational steps from the beginning will help you identify hidden requirements and produce accurate effort estimates.
Table 4: Operational Task Table
Question 9: What level of documentation should be provided?
Everything is built and your job is done. Everything? Not quite: documentation should also be provided. People often understand the scope of documentation differently. Table 5 helps you define the documentation scope.
Table 5: Documentation Decision Table