
Kimball and Inmon Approaches:

Bill Inmon: a top-down design approach to data warehousing.

-- A normalized data model is designed first, and then the dimensional data marts, which contain the data required for specific business processes or specific departments, are created from the data warehouse.
-- Time consuming to build the data warehouse
-- Easy maintenance
-- High initial cost; subsequent project development costs are much lower
-- Specialist team required
-- Suits enterprise-wide data integration requirements
Ralph Kimball: a bottom-up approach in which the data marts facilitating reports and analysis are created first and then combined together to create a broad data warehouse.
-- Takes less time to build the data warehouse
-- Difficult maintenance; often redundant and subject to revisions
-- Shorter time for initial setup
-- Generalist team
-- Suits individual business areas
How to decide:
The choice depends on the business objectives of the organization, the nature of the business, the time and cost involved, and the level of dependencies between the various functions.
Dimensions and their types:
Dimension: A dimension consists of the attributes about the facts. It stores the textual descriptions of the business. Without dimensions we cannot measure the facts.
Conformed Dimension: A conformed dimension means exactly the same thing with every possible fact table to which it is joined. Example: the date dimension table connected to the sales facts is identical to the date dimension connected to the inventory facts.
Junk Dimension: A junk dimension is a single table with a combination of different and unrelated attributes, used to avoid having a large number of foreign keys in the fact table. Example: flags and indicators that do not fit in the base dimension tables.
Example: a gender dimension and a marital status dimension. In the fact table we would have to maintain two keys referring to these dimensions. Instead, we can create a junk dimension which has all the combinations of gender and marital status (i.e. their cross join) and then maintain only one key in the fact table.
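A minimal sketch of building such a junk dimension in Python as a cross join of the low-cardinality attributes (the column names and value lists below are illustrative, not from the source):

from itertools import product

genders = ["M", "F", "Unknown"]
marital_statuses = ["Single", "Married", "Divorced", "Unknown"]

# Cross join the unrelated low-cardinality attributes and assign one
# surrogate key per combination; the fact table then stores only junk_key.
junk_dimension = [
    {"junk_key": i, "gender": g, "marital_status": m}
    for i, (g, m) in enumerate(product(genders, marital_statuses), start=1)
]

for row in junk_dimension:
    print(row)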

Degenerate Dimension: It is derived from the fact table and doesn't have its own dimension table; the dimension attribute is stored as part of the fact table and not in a separate dimension table.
Example: a transaction code in a fact table.
Role-Playing Dimension: A dimension that is used for multiple purposes within the same database. Example: a date dimension can be used for date of sale, date of delivery or date of hire.
Multi-valued Dimension: There are a number of situations in which a dimension is legitimately multivalued. Example: a patient receiving healthcare treatment may have multiple simultaneous diagnoses. In this case the multivalued dimension must be attached to the fact table through a group dimension key pointing to a bridge table with one row for each simultaneous diagnosis in the group.
Snowflake dimensions can also be used to model hierarchical structures.
Mini Dimensions: These are required for rapidly changing large dimensions and are typically used for managing high-frequency, low-cardinality change in a dimension.
Example: suppose we have a customer dimension with millions of records and we need to use mini dimensions to track customer attribute changes, because SCD Type 2 would not be effective due to the large number of additional rows required to support all the changes. The mini-dimension technique uses a separate dimension for the attributes that change frequently.
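A minimal sketch of the mini-dimension idea, assuming illustrative attribute names and bands (not from the source): the stable attributes stay in the base customer dimension, the frequently changing attributes are banded into a small separate dimension, and the fact table carries both keys.

# Base customer dimension: stable attributes only.
customer_dim = {
    101: {"customer_key": 101, "name": "Marston", "birth_city": "Springfield"},
}

# Mini dimension: one row per combination of the frequently changing,
# banded attributes.
demographics_mini_dim = [
    {"demo_key": 1, "age_band": "18-25", "income_band": "Low"},
    {"demo_key": 2, "age_band": "18-25", "income_band": "High"},
    {"demo_key": 3, "age_band": "26-40", "income_band": "Low"},
    {"demo_key": 4, "age_band": "26-40", "income_band": "High"},
]

# A fact row points at both dimensions, so a change in income band only
# changes the demo_key on new fact rows instead of adding customer rows.
fact_row = {"customer_key": 101, "demo_key": 2, "sales_amount": 250.0}
print(fact_row)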
Outrigger Dimensions: A dimension that contains a reference to another dimension table. For example, a bank account dimension can reference a separate dimension representing the date the account was opened. These secondary dimension references are outriggers. They are permissible but should be used sparingly.

Slowly Changing Dimensions: Attributes of a dimension that undergo change over time are called slowly changing attributes, and such a dimension is a slowly changing dimension.
SCD Type 0: The passive method, in which no special action is performed upon dimension changes. Some attributes remain as they were when first inserted; others may be overwritten.

SCD Type 1: No history of dimension changes is kept in the database. The old dimension value is overwritten by the new one.
Before change: 1 10843830 Priyanka Payal
After change: 1 10843830 Priyanka Sinha

SCD Type 2: All history of dimension changes is kept in the database. We capture attribute changes by adding a new row with a new surrogate key to the dimension table. The history can be stored in three different ways: versioning, flagging, or effective dates.

Versioning

surrogate_key  customer_id  customer_name  Location  Version
1              1            Marston        Illinois  1
2              1            Marston        Seattle   2

Flagging

surrogate_key  customer_id  customer_name  Location  Current_Flag
1              1            Marston        Illinois  N
2              1            Marston        Seattle   Y

Effective Date

surrogate_key  customer_id  customer_name  Location  Start_date   End_date
1              1            Marston        Illinois  01-Mar-2010  20-Feb-2011
2              1            Marston        Seattle   21-Feb-2011  NULL
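A minimal Python sketch of the effective-date flavor of Type 2, assuming a simple in-memory list of dimension rows (the function and column names follow the table above and are illustrative):

from datetime import date, timedelta

def apply_scd2_change(dim_rows, customer_id, new_location, change_date):
    # Find the open (current) row for this customer.
    current = next(
        r for r in dim_rows
        if r["customer_id"] == customer_id and r["end_date"] is None
    )
    # Expire the old row the day before the change takes effect.
    current["end_date"] = change_date - timedelta(days=1)
    # Add a new row with a new surrogate key and an open end date.
    dim_rows.append({
        "surrogate_key": max(r["surrogate_key"] for r in dim_rows) + 1,
        "customer_id": customer_id,
        "customer_name": current["customer_name"],
        "location": new_location,
        "start_date": change_date,
        "end_date": None,
    })

dim = [{"surrogate_key": 1, "customer_id": 1, "customer_name": "Marston",
        "location": "Illinois", "start_date": date(2010, 3, 1), "end_date": None}]
apply_scd2_change(dim, customer_id=1, new_location="Seattle",
                  change_date=date(2011, 2, 21))
for row in dim:
    print(row)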

SCD Type 3: Only the current status and the previous status of the row are maintained in the table. To track this we keep two separate columns in the table.

surrogate_key  customer_id  customer_name  Current_Location  Previous_Location
1              1            Marston        Illinois          NULL

After the first change:
surrogate_key  customer_id  customer_name  Current_Location  Previous_Location
1              1            Marston        Seattle           Illinois

After the second change:
surrogate_key  customer_id  customer_name  Current_Location  Previous_Location
1              1            Marston        New York          Seattle
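A minimal Type 3 sketch in Python, assuming the same single-row layout as above (illustrative only): the current value is shifted into the previous column before being overwritten.

customer = {"surrogate_key": 1, "customer_id": 1, "customer_name": "Marston",
            "current_location": "Illinois", "previous_location": None}

def apply_scd3_change(row, new_location):
    # Shift the current value into the "previous" column, then overwrite it.
    row["previous_location"] = row["current_location"]
    row["current_location"] = new_location

apply_scd3_change(customer, "Seattle")
apply_scd3_change(customer, "New York")
print(customer)   # previous_location is now "Seattle"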

SCD Type 4: This uses a historical table. In this method, a separate historical table is used to track all the historical changes of the dimension attributes for each dimension.

Current Table:
Customer_ID  Customer_Name  Customer_Type
Cust_1       -              Corporate

Historical Table:
Customer_ID  Customer_Name  Customer_Type  Start_Date  End_Date
Cust_1       -              Retail         01-01-2010  21-07-2010
Cust_1       -              Other          22-07-2010  17-05-2012
Cust_1       -              Corporate      18-05-2012  31-12-9999

Facts and Fact Tables: A fact table is one which consists of the measurements, metrics or facts of a business process. These measurable facts are used to know the business value and to forecast future business.
Additive Facts: Facts that can be summed up across all the dimensions in the fact table.
Example: Sales Amount can be summed up along any of the dimensions present in the fact table.
Fact table columns: Date, Store, Product, Sales Amount

Semi-Additive Facts: Facts that can be summed for some of the dimensions in the fact table, but not others.
Example: Daily Balance can be summed up across customers but not across the time dimension.
(Bank fact table columns: Date, Account, Current Balance, Profit Margin)
The purpose of this table is to record the current balance for each account at the end of each day, as well as the profit margin for each account for each day.
Here Current Balance is semi-additive: it makes sense to add it up across all the accounts (as in "what is the total current balance for all the accounts in the bank"), but it doesn't make sense to add it up over time (adding up all the current balances for a given account for each day of the month doesn't give any relevant information).
Non-Additive Facts: Facts that cannot be summed up across any dimension. Profit Margin is a non-additive fact, as it does not make sense to add it up at the account level or the day level.
Example: facts that are percentages or ratios.
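A minimal pandas sketch contrasting these behaviors; the sample rows are made up for illustration. Sales Amount (additive) can be summed along any dimension, Daily Balance (semi-additive) can be summed across accounts but not over time, and a ratio such as Profit Margin should not be summed at all.

import pandas as pd

fact = pd.DataFrame({
    "date":    ["2023-01-01", "2023-01-01", "2023-01-02", "2023-01-02"],
    "account": ["A1", "A2", "A1", "A2"],
    "daily_balance": [100.0, 200.0, 110.0, 190.0],   # semi-additive
    "sales_amount":  [10.0, 20.0, 5.0, 15.0],        # additive
})

# Additive: summing sales across both dates and accounts is meaningful.
print(fact["sales_amount"].sum())

# Semi-additive: summing balances across accounts for each day is meaningful...
print(fact.groupby("date")["daily_balance"].sum())

# ...but summing a balance over time is not; take the latest value instead.
print(fact.groupby("account")["daily_balance"].last())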
Transaction Fact Table: These fact tables represent an event that occurred at an instantaneous point in time. A row exists in the fact table for a given customer or product only if a transaction has occurred.
Factless Fact Table: A fact table that contains no measures or facts is called a factless fact table.
For example, a fact table which has only a product key and a date key is a factless fact table. There are no measures in the table, but we can still get the number of products sold over a period of time.
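A minimal sketch of answering a question from a factless fact table purely by counting rows (the keys are illustrative):

from collections import Counter

factless_fact = [
    {"date_key": 20230101, "product_key": 1},
    {"date_key": 20230101, "product_key": 2},
    {"date_key": 20230102, "product_key": 1},
]

# No measures are stored; counting rows per product still answers
# "how many times was each product sold?"
sold_per_product = Counter(row["product_key"] for row in factless_fact)
print(sold_per_product)   # Counter({1: 2, 2: 1})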
Snapshot Fact Table: It describes the state of things at a particular instant in time and usually includes more semi-additive and non-additive facts. Example: daily balances.
Periodic Snapshot Fact Table: These are needed to see the cumulative performance of the business at regular, periodic time intervals. Unlike the transaction fact table, where we load a row for each event occurrence, with the periodic snapshot we take a picture of the activity at the end of the day, week or month, then the picture at the end of the next period, and so on.
Example: the performance summary of a salesman over the previous month.
Accumulating Snapshot Fact Table: This is used to show the activity of a process that has a well-defined beginning and end. Example: processing an order. An order moves through a number of steps until it is fully processed.
As steps towards fulfilling the order are completed, the associated row in the fact table is updated.
Accumulating snapshots almost always have multiple date stamps representing the predictable major events or phases that take place during the course of the lifetime, plus an additional date column which indicates when the snapshot was last updated.
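A minimal sketch of an accumulating snapshot row being updated in place as order milestones complete; the milestone column names are assumptions for illustration.

from datetime import date

# One row per order; milestone date columns start empty and are filled in
# as the order progresses, and last_updated moves forward each time.
order_row = {"order_id": 5001, "order_date": date(2023, 1, 5),
             "ship_date": None, "delivery_date": None,
             "last_updated": date(2023, 1, 5)}

def complete_milestone(row, milestone, when):
    row[milestone] = when
    row["last_updated"] = when

complete_milestone(order_row, "ship_date", date(2023, 1, 7))
complete_milestone(order_row, "delivery_date", date(2023, 1, 10))
print(order_row)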

Different Types of Hierarchies:

Balanced/Standard Hierarchies: Here all the branches of the hierarchy descend to the same level, with each member's parent being at the level immediately above the member.
Example: a balanced hierarchy is one which represents time, where the depth of each level (year, quarter and month) is consistent.
Unbalanced/Ragged Hierarchies: A hierarchy where the leaves do not all have the same depth.
For example, an organization chart which shows reporting relationships among employees in an organization.
Level 1       Level 2        Level 3
USA           CA             San Francisco
USA           CA             Los Angeles
USA           Washington DC  <NULL>
Vatican City  Vatican City   <NULL>

INFORMATICA B2B:
Informatica PowerCenter Designer provides the capability to read from structured data sources like relational databases, XML or flat files (delimited or fixed width). When the source is unstructured or semi-structured, like PDF files, Word documents or JSON files, the Informatica B2B Data Transformation tool helps to read the desired data from such files and convert it into a structured format like XML, so that it can be processed by PowerCenter.
Informatica Parallelism:
We can achieve this using session partitioning. It lets us split a large data set into smaller subsets which can be processed in parallel for better performance.
The following points need to be considered while partitioning a session.
Partition: A subset of data that executes in a single thread.
Number of partitions: We can divide the data set into smaller chunks by increasing the number of partitions. Whenever we add partitions, we increase the number of processing threads, which increases performance.
Stage: A portion of the pipeline implemented at run time as a thread.
Partition Point: A boundary between two stages; partition points divide the pipeline into stages. A partition point is always associated with a transformation.

Partition Type: An algorithm for distributing data among partitions.


Types of Session Partitions:
Pass-through Partitioning:
Workflow concurrency:
Informatica Logging Mechanism:

Informatica's inbuilt error logging feature can be leveraged to implement row error logging in a central location. When a row error occurs, the Integration Service logs error information which can be used to determine the cause and source of the error.
These errors can be logged either in a relational table or in a flat file. When error logging is enabled, the Integration Service creates the error table or error log file the first time it runs the session. If the error table or error log file already exists, the error data is appended to it.
The following activities need to be performed to implement Informatica row error logging:
1. In the Config Object tab, under the Error Handling option, set the Error Log Type attribute to Relational Database or Flat File. By default, error logging is disabled.
2. Set Stop On Errors = 1.
3. If the Error Log Type is set to Relational, specify the database connection and the Table Name Prefix.
Following are the tables which will be created by the Integration Service and populated as and when errors occur.
PMERR_DATA
Stores data and metadata about a transformation row error and its corresponding source row.
PMERR_MSG
Stores metadata about an error and the error message.
PMERR_SESS
Stores metadata about the session.
PMERR_TRANS
Stores metadata about the source and transformation ports, such as name and datatype, when a
transformation error occurs.

4. If the Error Log Type is set to Flat File, specify the error log file directory and the error log file name.
Database error messages and the error messages that the Integration Service writes to the bad file/reject file can also be captured and stored in the error log tables / flat files.
Following are a few database error messages which may be logged in the error log tables / flat files:

Error Messages:
Cannot insert the value NULL into column <<Column name>>, table <<Table name>>
Violation of PRIMARY KEY constraint <<Primary key constraint name>>
Violation of UNIQUE KEY constraint <<Unique key constraint name>>
Cannot insert duplicate key in object <<Table name>>
Row Error Logging Implementation
Advantages: Since the Informatica inbuilt feature is leveraged, the error log information is very accurate with very minimal development effort.
Pitfall: Enabling error logging has an impact on performance, since the Integration Service processes one row at a time instead of a block of rows.

Command line related commands:

pmcmd: a command line utility to perform the following tasks. With pmcmd we can directly interact with the Integration Service (a concrete invocation example follows the command list below):
--- Kick off or abort PowerCenter jobs.
--- Get the status of PowerCenter jobs.
1. Schedule workflow:
pmcmd scheduleworkflow -sv <Integration Service name>
-d <Domain name>
-u <User>
-p <Password>
-f <Folder name>
-w <Workflow name>
2. Start workflow:
pmcmd startworkflow -sv <Integration Service name>
-d <Domain name>
-u <User>
-p <Password>
-f <Folder name>
-w <Workflow name>
3. Stop workflow:
pmcmd stopworkflow -sv <Integration Service name>
-d <Domain name>
-u <User>
-p <Password>
-f <Folder name>
-w <Workflow name>
4. Start workflow from a task:
pmcmd starttask -sv <Integration Service name>
-d <Domain name>
-u <User>
-p <Password>
-f <Folder name>
-w <Workflow name>
-startfrom <Task name>
5. Stop a task:
pmcmd stoptask -sv <Integration Service name>
-d <Domain name>
-u <User>
-p <Password>
-f <Folder name>
-w <Workflow name>
<Task name>
6. Abort workflow:
pmcmd abortworkflow -sv <Integration Service name>
-d <Domain name>
-u <User>
-p <Password>
-f <Folder name>
-w <Workflow name>
7. Abort task:
pmcmd aborttask -sv <Integration Service name>
-d <Domain name>
-u <User>
-p <Password>
-f <Folder name>
-w <Workflow name>
<Task name>
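For example, a start-workflow call with all options on a single line might look like the following sketch. The service, domain, folder and workflow names here are made up, and the exact option syntax should be verified against the pmcmd command reference for your PowerCenter version:

pmcmd startworkflow -sv IS_DEV -d Domain_DEV -u admin -p admin_pwd -f FLD_SALES -w wf_load_sales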
PMREP Command
With pmrep we can directly interact with the PowerCenter repository and perform the operations below (a sample invocation is sketched after this list).
--- Insert/delete/update Informatica metadata objects
--- List Informatica metadata objects
--- Perform deployments/migrations
--- Update sequence generators
--- Change properties of PowerCenter objects
--- Install plug-ins (SFDC, SAP)
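For example, connecting to a repository and then listing its folders might look like the following sketch. The repository, domain and user names are made up, and the options should be checked against the pmrep command reference for your version:

pmrep connect -r REP_DEV -d Domain_DEV -n admin -x admin_pwd
pmrep listobjects -o folder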
PMSTACK or PSTACK Command: Whenever the Integration Service crashes, a core file is generated in the bin directory. It contains all the information and errors related to the service crash, but not in a human-readable format; rather it is in binary format. Using pmstack/pstack we can convert it into a human-readable ASCII format.
PMSERVER Command:

Pushdown Optimization:
Master Data Management:
- Refers to the process of creating and managing the data that an organization maintains as a single master copy, called master data.
Master data usually includes customers, vendors, employees, products etc., but differs across industries and even across companies within the same industry.
MDM is important as it offers the enterprise a single version of the truth; otherwise there is a risk of having multiple copies of data that are inconsistent with one another.
Its main purpose is to maintain a single source of truth for a particular dimension within the organization. It requires solving the root cause of the inconsistent metadata, because master data needs to be propagated back to the source systems (at the data source level).
MDM is only applied to entities and not to transactional data; it only affects data that exists in dimension tables.

Here the reporting needs are different: the emphasis is on reports for data governance, data quality and compliance rather than reports for analytical needs.
Ultra Messaging:
A messaging product from Informatica.
We can send an SMS to any phone from an application in real time, similar to the way OTPs are sent in banking applications.
Messages are delivered irrespective of traffic congestion.
Data Masking:
We generally use it to protect our production data. During SIT, ST and UAT we use a copy of the production data to test against realistic results, so data masking is an Informatica feature which replaces the production data set with similar-looking data.
So, to preserve privacy even in test environments, we use data masking.
For example, if your address is 'Gaya', with data masking it will be replaced with, say, 'Patna'. The data will look real but will not be the real production data, so there is no data theft issue.
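A minimal, generic Python sketch of the idea (not Informatica's actual masking engine): sensitive values are replaced with consistent, similar-looking substitutes so that test data stays realistic without exposing production values. The city list and hashing scheme are assumptions for illustration.

import hashlib

FAKE_CITIES = ["Patna", "Pune", "Indore", "Nagpur", "Surat"]

def mask_city(real_city: str) -> str:
    # Deterministic substitution: the same input always maps to the same
    # fake city, so referential consistency across tables is preserved.
    digest = hashlib.sha256(real_city.encode("utf-8")).hexdigest()
    return FAKE_CITIES[int(digest, 16) % len(FAKE_CITIES)]

print(mask_city("Gaya"))   # same masked value on every run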

Various types of Informatica connections to DBs:


Informatica Admin Roles and Responsibilities:
Communicating with the Business:
The Informatica admin acts as the liaison between the business and the various project teams, including the application support, network, database, storage and Unix teams.
Infrastructure Support:
Responsible for managing the Development, QA and Production environments.
Maintaining Standards:
Making sure that code is in sync across all three environments.
Outage Handling:
1. Password changes
2. Patch installation (hot fixes / software)
3. Bouncing activities or restarting the services
4. Network changes

Monitoring:

We can distinguish between environment monitoring and load monitoring rather than consolidating them; these are major activities in Informatica administration.
Environment Monitoring:
1. Constantly keeping an eye on whether the infrastructure is readily available, i.e. monitoring the readiness of the environment.
2. System-level parameter checks: CPU spikes, memory utilization, number of parallel loads (sessions) running on each node.
The second point is the most intricate one and needs intensive checks.
Load Monitoring:
1. Monitoring the loads (workflows) to verify that they run at their scheduled timings.
2. Recovery or restart of loads in case of any failures.
3. Identifying tardiness in the loading process when any sessions are in a hung state.
We have automated most of the above monitoring activities, but manual re-checks are still important:
1. Alerts monitoring
2. Load monitoring
3. Server monitoring

Informatica PowerCenter administration involves various activities, which include the following:

Create folders and user accounts.
View and manage folder permissions and privileges.
Ping the domain, repository services and integration services.
Start and stop services.
View service status and logs.

Normally, performing these activities across multiple environments involves logging into a separate Administration Console for each environment, which can be time consuming.
An integrated admin console for Informatica can drastically cut down the manual steps involved in the above activities. It also automates the workflow of request, authorization and creation of users, folders and permissions in a PowerCenter environment.
