Degenerate dimension: A dimension derived from the fact table that does not have its
own dimension table; the dimension attribute is stored as part of the fact table
rather than in a separate dimension table.
Example: A transaction code in a fact table.
Role-playing dimension: A dimension that is used for multiple purposes within the
same database. Example: a Date dimension can be used for date of sale, date of
delivery, or date of hire.
Multi-valued dimension: There are a number of situations in which a dimension is
legitimately multivalued. Example: a patient receiving a healthcare treatment may
have multiple simultaneous diagnoses. In this case, the multivalued dimension must
be attached to the fact table through a group dimension key to a bridge table with
one row for each simultaneous diagnosis in the group.
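The bridge-table idea above can be sketched in a few lines. This is a minimal illustration, not a real schema; all table contents and column names (diag_group_key, weight, etc.) are invented for the example.

```python
# A fact row carries a group key instead of a single diagnosis key.
fact = {"patient_key": 1, "diag_group_key": 10, "charge": 500.0}

# Bridge table: one row per diagnosis in the group. A weighting factor
# lets weighted facts still sum back to the original charge.
bridge = [
    {"diag_group_key": 10, "diagnosis_key": 101, "weight": 0.5},
    {"diag_group_key": 10, "diagnosis_key": 102, "weight": 0.5},
]

def explode_fact(fact_row):
    """Join one fact row to every diagnosis in its group."""
    return [
        {**fact_row, "diagnosis_key": b["diagnosis_key"],
         "weighted_charge": fact_row["charge"] * b["weight"]}
        for b in bridge
        if b["diag_group_key"] == fact_row["diag_group_key"]
    ]

rows = explode_fact(fact)
```

The weights are optional, but without them a query that joins through the bridge would double-count the charge.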
Snowflake dimensions can also be used to model hierarchical structures.
Mini dimensions: Required for rapidly changing large dimensions; typically used for
managing high-frequency, low-cardinality change in a dimension.
Example: Suppose we have a customer dimension with millions of records. SCD Type 2
would not be effective here because of the large number of additional rows required
to support all the changes, so we use a mini dimension to track customer attribute
changes. The mini-dimension technique moves the frequently changing attributes into
a separate dimension.
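A sketch of carving out a mini dimension: the volatile attributes get one row per distinct combination, and the fact table carries that key alongside the stable customer key. The attribute names (age_band, income_band) are invented for illustration.

```python
# Customer rows with two fast-changing, low-cardinality attributes.
customers = [
    {"customer_id": 1, "name": "A", "age_band": "30-39", "income_band": "Low"},
    {"customer_id": 2, "name": "B", "age_band": "30-39", "income_band": "Low"},
    {"customer_id": 3, "name": "C", "age_band": "40-49", "income_band": "High"},
]

# Mini dimension: one surrogate key per distinct combination of the
# volatile attributes, however many customers share it.
mini_dim = {}
for c in customers:
    combo = (c["age_band"], c["income_band"])
    mini_dim.setdefault(combo, len(mini_dim) + 1)

def mini_key(customer):
    """Key the fact table would carry next to the customer key."""
    return mini_dim[(customer["age_band"], customer["income_band"])]
```

When an attribute changes, only the mini-dimension key on new fact rows changes; the large customer dimension itself stays untouched.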
Outrigger dimensions: A dimension that contains a reference to another dimension
table. For example, a bank account dimension can reference a separate dimension
representing the date the account was opened. These secondary dimension references
are outriggers. They are permissible but should be used sparingly.
SCD Type 1: No history of dimension changes is kept in the database; the old
dimension value is overwritten by the new one.
Before change: 1 | 10843830 | Priyanka Payal
After change:  1 | 10843830 | Priyanka Sinha
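The overwrite above is the whole of Type 1; as a minimal sketch (dictionary standing in for the dimension table):

```python
# SCD Type 1 is a plain in-place overwrite; no history survives.
dim = {10843830: {"surrogate_key": 1, "name": "Priyanka Payal"}}

def scd1_update(customer_id, new_name):
    dim[customer_id]["name"] = new_name  # the old value is simply lost

scd1_update(10843830, "Priyanka Sinha")
```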
SCD Type 2: All history of dimension changes is kept in the database. We capture
each attribute change by adding a new row, with a new surrogate key, to the
dimension table. The history can be stored in three different ways: versioning,
flagging, or effective dates.
Effective-date example:
Surrogate key | Customer name | Location | Effective date | End date
1             | Marston       | Illions  | 01-Mar-2010    | 20-Feb-2011
2             | Marston       | Seattle  | 21-Feb-2011    | NULL
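The effective-date variant can be sketched as: close the open row, then insert a new row with a fresh surrogate key. Data mirrors the example table; the column names are illustrative.

```python
from datetime import date

# Dimension table as a list of dicts; end_date=None marks the open row.
dim = [
    {"sk": 1, "name": "Marston", "location": "Illions",
     "eff_date": date(2010, 3, 1), "end_date": None},
]

def scd2_change(name, new_location, change_date):
    """SCD Type 2: end-date the current version, add a new one."""
    current = next(r for r in dim
                   if r["name"] == name and r["end_date"] is None)
    current["end_date"] = change_date              # close the old version
    dim.append({"sk": max(r["sk"] for r in dim) + 1,  # new surrogate key
                "name": name, "location": new_location,
                "eff_date": change_date, "end_date": None})

scd2_change("Marston", "Seattle", date(2011, 2, 21))
```

Note that fact rows keep pointing at the surrogate key that was current when they were loaded, which is what preserves history.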
SCD Type 3: Only the current status and the previous status of the row are
maintained in the table. To track this, we keep two separate columns in the table:
Customer name | Current location | Previous location
Marston       | Illions          | NULL
Marston       | Seattle          | Illions
Marston       | New York         | Seattle
(Each line shows the same row after a successive location change.)
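Type 3 keeps one row per member and shifts the current value into the "previous" column on each change, so only one prior value ever survives. A minimal sketch:

```python
# One row per customer; exactly one level of history in a dedicated column.
row = {"name": "Marston", "current_location": "Illions",
       "previous_location": None}

def scd3_change(r, new_location):
    """SCD Type 3: shift current into previous, then overwrite current."""
    r["previous_location"] = r["current_location"]
    r["current_location"] = new_location

scd3_change(row, "Seattle")    # previous is now Illions
scd3_change(row, "New York")   # previous is now Seattle; Illions is gone
```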
SCD Type 4: Uses a historical table. In this method, a separate historical table is
used to track all historical changes to the dimension attributes for each dimension.
Current table:
Customer ID | Customer Name | Customer Type
Cust_1      |               | Corporate
Historical table:
Customer ID | Customer Name | Customer Type | Start Date | End Date
Cust_1      |               | Retail        | 01-01-2010 | 21-07-2010
Cust_1      |               | Other         | 22-07-2010 | 17-05-2012
Cust_1      |               | Corporate     | 18-05-2012 | 31-12-9999
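A sketch of the Type 4 update, assuming the convention from the table above (an open history row carries the 31-12-9999 high date): close the open history row, append the new version, and overwrite the current table.

```python
from datetime import date, timedelta

HIGH_DATE = date(9999, 12, 31)  # sentinel for "still current"

current = {"Cust_1": {"customer_type": "Other"}}
history = [
    {"customer_id": "Cust_1", "customer_type": "Other",
     "start": date(2010, 7, 22), "end": HIGH_DATE},
]

def scd4_change(customer_id, new_type, change_date):
    """SCD Type 4: maintain current table plus full history table."""
    open_row = next(h for h in history
                    if h["customer_id"] == customer_id
                    and h["end"] == HIGH_DATE)
    open_row["end"] = change_date - timedelta(days=1)  # close old version
    history.append({"customer_id": customer_id,
                    "customer_type": new_type,
                    "start": change_date, "end": HIGH_DATE})
    current[customer_id]["customer_type"] = new_type   # overwrite current

scd4_change("Cust_1", "Corporate", date(2012, 5, 18))
```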
Facts and fact tables: A fact table is one that consists of the measurements,
metrics, or facts of a business process. These measurable facts are used to assess
business value and forecast future business.
Additive facts: Facts that can be summed across all the dimensions in the fact
table.
Example: Sales Amount can be summed along any of the dimensions present in the fact
table.
(Fact table columns: Date, Store, Product, Sales Amount)
Semi-additive facts: Facts that can be summed for some of the dimensions in the
fact table, but not for others.
Example: Daily balance can be summed across customers but not across the time
dimension.
(Bank with the following fact table: Date, Account, Current Balance, Profit Margin)
The purpose of this table is to record the current balance for each account at the
end of each day, as well as the profit margin for each account for each day.
Here, Current Balance is semi-additive: it makes sense to add balances across all
accounts (what is the total current balance for all accounts in the bank?), but it
does not make sense to add them across time (adding up the current balances of a
given account for each day of the month gives no relevant information).
Non-additive facts: Facts that cannot be summed across any dimension. Profit Margin
is a non-additive fact, as it does not make sense to add it up at the account level
or the day level.
Example: facts that are percentages or ratios.
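The additive/semi-additive distinction can be checked with a few lines of arithmetic. The numbers below are made up; the point is which sums are meaningful.

```python
# Bank fact table rows: one Current Balance per account per day.
facts = [
    {"date": "2024-01-01", "account": "A", "balance": 100.0},
    {"date": "2024-01-01", "account": "B", "balance": 200.0},
    {"date": "2024-01-02", "account": "A", "balance": 150.0},
    {"date": "2024-01-02", "account": "B", "balance": 250.0},
]

# Summing across accounts for ONE day is meaningful:
# total balance held by the bank on 01 Jan.
bank_total_jan1 = sum(f["balance"] for f in facts
                      if f["date"] == "2024-01-01")

# Summing one account across days is NOT a meaningful balance:
# 100 + 150 is not anything the business can use.
nonsense = sum(f["balance"] for f in facts if f["account"] == "A")
```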
Transaction Fact Table: These fact tables represent an event that occurred at an
instantaneous point in time. A row exists in the fact table for a given customer or
product only if a transaction has occurred.
Factless fact table: A fact table that contains no measures or facts is called a
factless fact table.
For example, a fact table that has only a product key and a date key is a factless
fact table. There are no measures in the table, but we can still get the number of
products sold over a period of time.
Snapshot fact table: Describes the state of things at a particular instant of time
and usually includes more semi-additive and non-additive facts. Example: daily
balances.
Periodic fact table: Needed to see the cumulative performance of the business at
regular, periodic time intervals. Unlike a transaction fact table, where we load a
row for each event occurrence, with the periodic snapshot we take a picture of the
activity at the end of the day, week, or month, then a picture at the end of the
next period, and so on.
Example: performance summary of a salesman over the previous month.
Accumulating fact table: Used to show the activity of a process that has a
well-defined beginning and end. Example: processing an order; an order moves
through a number of steps until it is fully processed.
As steps towards fulfilling the order are completed, the associated row in the fact
table is updated.
Accumulating snapshots almost always have multiple date stamps representing the
predictable major events or phases that take place during the lifetime of the
process. There is an additional date column which indicates when the snapshot was
last updated.
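The update-in-place behaviour of an accumulating snapshot can be sketched as below; the milestone column names are illustrative.

```python
from datetime import date

# One row per order; milestone date columns start out empty (None)
# and are filled in as the order progresses through its lifecycle.
order_row = {
    "order_key": 1,
    "order_date": date(2024, 1, 5),
    "ship_date": None,
    "delivery_date": None,
    "last_updated": date(2024, 1, 5),  # when the snapshot last changed
}

def complete_milestone(row, milestone, when):
    """Update the existing fact row rather than inserting a new one."""
    row[milestone] = when
    row["last_updated"] = when

complete_milestone(order_row, "ship_date", date(2024, 1, 8))
```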
Example of a ragged hierarchy (some members have no lower level):
Level 2       | Level 3
CA            | San Francisco
CA            | Los Angeles
Washington DC | <NULL>
Vatican City  | <NULL>
INFORMATICA B2B:
The Informatica PowerCenter Designer provides the capability to read from
structured data sources such as relational databases, XML, or flat files (delimited
or fixed-width). When the source is unstructured or semi-structured, such as PDF
files, Word documents, or JSON files, the Informatica B2B Data Transformation tool
helps read the desired data from such files and puts it into a structured format
such as XML so that it can be processed by PowerCenter.
Informatica parallelism:
We can achieve parallelism by partitioning sessions. Partitioning lets us split a
large data set into smaller subsets that can be processed in parallel for enhanced
performance. The following points need to be considered when partitioning a
session:
Partition: A subset of the data that executes in a single thread.
Number of partitions: We can divide the data set into smaller chunks by increasing
the number of partitions. Whenever we add partitions, we increase the number of
processing threads, which increases performance.
Stage: A portion of the pipeline implemented at run time as a thread.
Partition point: The boundary between two stages; partition points divide the
pipeline into stages. A partition point is always associated with a transformation.
Informatica's inbuilt error-logging feature can be leveraged to implement row error
logging in a central location. When a row error occurs, the Integration Service
logs error information which can be used to determine the cause and source of the
error.
These errors can be logged either in a relational table or in a flat file. When
error logging is enabled, the Integration Service creates the error table or error
log file the first time it runs the session. If the error table or error log file
already exists, the error data is appended.
Following are the activities that need to be performed to implement the Informatica Row Error
Logging:
1. In the Config Object tab, under the Error Handling options, set the Error Log
Type attribute to Relational Database or Flat File. By default, error logging is
disabled.
2. Set Stop On Errors = 1
3. If the Error Log Type is set to Relational, specify the Database connection & Table Name
Prefix
Following are the tables which will be created by the Integration Service and
populated as and when errors occur.
PMERR_DATA
Stores data and metadata about a transformation row error and its corresponding source row.
PMERR_MSG
Stores metadata about an error and the error message.
PMERR_SESS
Stores metadata about the session.
PMERR_TRANS
Stores metadata about the source and transformation ports, such as name and datatype, when a
transformation error occurs.
4. If the Error Log Type is set to Flat File, specify the error log file directory
and error log file name.
Database Error Messages and the Error messages that Integration service writes to Bad File/
Reject file can also be captured and stored in the Error log tables / Flat files.
Following are the few database error messages which will be logged in the Error Log Tables /
Flat files.
Error Messages
Cannot Insert the value NULL into column <<Column name>>, table
<<Table_name>>
Violation of PRIMARY KEY constraint <<Primary key constraint
name>>
Violation of UNIQUE KEY constraint <<Unique Key Constraint>>
Cannot Insert Duplicate key in object <<Table_name>>
Row Error Logging Implementation
Advantages
Since the Informatica Inbuilt feature is leveraged, the Error log information would be very
accurate with very minimal development effort.
Pitfall
Enabling error logging has an impact on performance, since the Integration Service
processes one row at a time instead of a block of rows.
-p <Password>
-f <Folder name>
-w <workflow name>
2. Start workflow
pmcmd startworkflow -sv <IntegrationService name>
-d <DomainName>
-u <User>
-p <Password>
-f <Folder name>
-w <workflow name>
3. Stop workflow
pmcmd stopworkflow -sv <IntegrationService name>
-d <DomainName>
-u <User>
-p <Password>
-f <Folder name>
-w <workflow name>
4. Start workflow from task
pmcmd startworkflow -sv <IntegrationService name>
-d <DomainName>
-u <User>
-p <Password>
-f <Folder name>
-w <workflow name>
-startfrom <task name>
5. Stop a task
pmcmd stoptask -sv <IntegrationService name>
-d <DomainName>
-u <User>
-p <Password>
-f <Folder name>
-w <workflow name>
<task name>
6. Abort workflow
pmcmd abortworkflow -sv <IntegrationService name>
-d <DomainName>
-u <User>
-p <Password>
-f <Folder name>
-w <workflow name>
7. Abort task
pmcmd aborttask -sv <IntegrationService name>
-d <DomainName>
-u <User>
-p <Password>
-f <Folder name>
-w <workflow name>
<task name>
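A small helper can assemble these pmcmd command lines consistently. This is a sketch: all values (service, domain, etc.) are placeholders, and the flag layout follows the notes above; check your pmcmd version's command reference for the exact syntax.

```python
def build_pmcmd(command, service, domain, user, password, folder, workflow):
    """Assemble a pmcmd argument list from the common connection options."""
    return ["pmcmd", command,
            "-sv", service, "-d", domain,
            "-u", user, "-p", password,
            "-f", folder, "-w", workflow]

# Hypothetical values for illustration only.
args = build_pmcmd("startworkflow", "IS_DEV", "Domain_Dev",
                   "etl_user", "secret", "DW_FOLDER", "wf_daily_load")

# subprocess.run(args) would then execute it (not run here).
```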
Pushdown Optimization: Pushes transformation logic to the source or target
database. The Integration Service translates the transformation logic into SQL and
sends it to the database, reducing the amount of data moved through the ETL engine.
Master Data Management:
- Refers to the process of creating and managing the data that an organization must
have as a single master copy, called the master data.
Master data usually includes customers, vendors, employees, products, etc., but
differs across industries and even across companies within the same industry.
MDM is important because it offers the enterprise a single version of the truth;
without it there is a risk of having multiple copies of data that are inconsistent
with one another.
Its main purpose is to maintain a single source of truth for a particular dimension
within the organization. It requires solving the root cause of the inconsistent
metadata, because master data needs to be propagated back to the source systems (at
the data-source level).
MDM is applied only to entities, not to transactional data; it affects only data
that exists in dimension tables.
Monitoring:
1. Alerts monitoring
2. Load monitoring
3. Server monitoring
Here the 2nd option is the most intricate one, and intensive checks are needed.
Load monitoring:
1. Monitoring the loads (workflows) to verify that they run at their scheduled
timings.
2. Identifying tardiness in the loading process in case any sessions are in a hung
status.
We have automated most of the above monitoring activities, but manual re-checks are
still important.
Users, folders, and permissions in a PowerCenter environment.