You are on page 1of 5

http://blog.greysparling.com/2005/08/inside-peoplesoft-trees-or.

html

Friday, August 12, 2005

Inside PeopleSoft trees - or - Photosynthesis at PeopleSoft

I've talked with many PeopleSoft customer who have needed to use PeopleSoft
trees outside of nVision, Query, or Cube Manager. One of the services I had pushed
for was to add this functionality into PeopleCode (especially app engine).

Doesn't Tree Manager already have an API?


Sort of. Although there is an API available in PeopleCode for trees (called the tree
classes), these APIs are focused on describing and maintaining trees (essentially
tree manager functions). However, as any good BI person will tell you, it's not the
act of maintaining a dimension that's important... it's what you can do with the
dimension when you join it to other stuff.

So, what sort of things would you want to do?


The things you would want to do is to filter data with a tree or subtotal data using a
tree as the grouping mechanism. The former is available to both nVision and
PS/Query as either Filtering criteria or "in tree" criteria. The latter is only available in
nVision through nPlosion. However, there are lots of things outside of nVision or
Query that you would want to do.

• Aggregate data using the tree. Those familiar with Summary ledgers in GL understand the
concept. You can aggregate data using a tree for other purpose, such as improving performance for
downstream processes. For example, if you want to display a count of reported problems by product
family for all product families, you're doing this type of aggregation at the product family level of the
tree.
• Filter data using the tree. This is generalizing the "in tree" criteria in PS/Query. A couple of good
examples of this are the batch programs used in GL for selecting open receivables to restate, or for
selecting what data to use in Allocations. Another example of where this would be valuable would be in
search records, where you could find all cases for eastern region customers, that are reported against
any product in the Finanicals product line.

Today, PeopleSoft applications and customers use the technique I'm describing below to
accomplish this.

First, a little more on Trees

The first thing to understand is how trees work. A tree is not a standard structure (such as a strict
self-referencing table, as described in Ralf Kimball's definitive book on data warehousing), or a
standard denormalized structure. It has a unique structure that allows it to do the following
things:

Things that are not supported well by denormalized tables.

1. It supports unbalanced (or ragged) hierarchies.


2. It supports dynamic leveling.
3. It supports joining at any level of the hierarchy.

Things that are not supported well by a self-referencing table.

1. It supports finding all children at any level in a single SQL statement regardless of the number of
intermediate levels.

So, how does it do it?

The key to it is the data model, and the fields added to it (that are maintained by tree manager).
Here are the tables in the model:

• PSTREEDEFN - This identifies the tree, and the attributes of it (such as where to get application
attributes of the nodes, leafs, and levels).
• PSTREENODE - This is the heart of the tree. It is a self-referencing table that also has additional
attributes to facilitate non-recursive access to children of a node.
• PSTREELEAF - This maps a node to the sets of leafs or detail values that can be used with it. This
can be thought of as the "leaf" level in the BI world, but that the leafs can be attached to any level of
the tree, even to nodes that have children nodes.
• User tables - These are tables that store additional attributes to the node, leaf, or level, such as
description, size, manager, etc. This allows a tree to be defined against any data element in the
application and also opens up a whole host of opportunities for using trees in a given customer
enviornment.

It's also important to understand that the tree definition is the starting place for everything, and
that all the tools owned tables (i.e. not the user tables) strictly follow the key structure of the tree
definition. Because trees are effective dated and utilize setid indirection, this is an important
thing, because effective dating and setid indirection is only evaluated for the PSTREEDEFN
table and the user tables. This logic is not used when joining between PSTREEDEFN and
PSTREENODE. This means that from the perspective of effective dating, any nodes that have
not changed between to tree instances are duplicated in the PSTREENODE table.

Once you've identified the tree you want to use (setid, tree name, and effdt are the unique keys),
you also know the high order keys to use in the PSTREENODE table to get the nodes for that
tree. Now, let's look at the structure of the PSTREENODE table:

• Setid
• Tree Name
• Effdt
• TREE_NODE
• TREE_LEVEL_NUM
• TREE_NODE_NUM
• TREE_NODE_NUM_END
• PARENT_NODE_NUM
As you can see, this table is self-referencing (the PARENT_NODE_NUM tells you the
NODE_NUM of a tree node's parent). However, tree manager also maintains the
TREE_NODE_NUM_END field, which is what prevents the need to do a recursive set of selects
to find all the descendants of a node. Tree nodes are strictly numbered. A tree nodes' descendents
will always have a node number that is between the value of its TREE_NODE_NUM and
TREE_NODE_NUM_END.

Therefore, if I want to select all the descendents of the node "Assets", here is the SQL I would
issue:

Select B.TREE_NODE, B.TREE_LEVEL_NUM


from PSTREENODE A, PSTREENODE B
where A.SETID = B.SETID and A.TREE_NAME = B.TREE_NAME and A.EFFDT = B.EFFDT
and A.TREE_NODE = "Assets"
and B.TREE_NODE_NUM between A.TREE_NODE_NUM and A.TREE_NODE_NUM_END

You could also filter by level number to get the descendents at a given level.

Now, to add in the leafs. For winter trees, this step is not necessary (winter trees are trees where
the nodes themselves represent data points of interest, such as positions, whereas with summer
trees, the nodes are the hierarchy that categorizes something else, such as most account trees or
customer trees).

Because leafs are essentially mapping tables, you either join them directly to another table (either
the user table to get attributes, or a fact table to get rows). However, for performance purposes,
most PeopleSoft processes will stage the data values before joining it to a fact table.

Getting back to the subject at hand, here's the table structure of the PSTREELEAF
table:

• Setid
• Tree Name
• EFFDT
• TREE_NODE_NUM
• RANGE_FROM
• RANGE_TO
• DYNAMIC_DTL_FLG

As you can see, the high level keys are again the same. Also, you can see that the mapping to the
tree node is by node ID, and the mapping to the target table is by either individual key values or
by range.

So, let's say we want to find out all account numbers and account types for all accounts under the
"Asset" node of the tree. Now, there are three things that need to be looked up in order to
determine what to put in the SQL:

1. The table and field where the Accounts are stored. This is stored in a tree table I haven't discussed,
which is the PSTREESTRUCT table (which has a reference in the tree defninition).
2. The setid to use for the accounts. As mentioned, setid indirection occurs when going after the user
tables. Depending on where your getting your business unit or setid, you will have to do the lookup
separately (which is a more complex topic, but is often hard-coded at a customer anyway).
3. The effective date to use for the accounts. Again, this occurs when going after the user table. This
means that you have to perform the standard correlated subquery for effective date logic. I won't go
into detail on this in this posting, but you can see the SQL by building a query against the table in
PS/Query.

So, your SQL will look like the following when after the list of accounts under a tree node:

Select C.ACCOUNT, C.ACCOUNT_TYPE


from PSTREENODE A, PSTREELEAF B, GL_ACCOUNT_TBL
where A.SETID = B.SETID and A.TREE_NAME = B.TREE_NAME and A.EFFDT = B.EFFDT
and A.TREE_NODE = "Assets"
and B.TREE_NODE_NUM between A.TREE_NODE_NUM and A.TREE_NODE_NUM_END
and C.ACCOUNT between B.RANGE_FROM and B.RANGE_TO

As you can see, this SQL statement didn't have to look at any children nodes at all! Because the
leafs are attached using node numbers, and the range on the tree node record identifies all the
node numbers of all its children, you could join the leaf directly to the node.

Now, it's important to note that this SQL will not perform in DB2, because indexes are not used
when joining two fields of different size (which is why nVision has 30 different tables for
staging the join between the tree leaf and the data table).

Staging the join

As mentioned previously, when joining to a fact or data table, one generally stages the join into
an intermediate table. nVision and all other PeopleSoft delivered processes use selector tables (in
nVision it's PSTREESELECT##, where the ## represents the field size of the field joined to. In
nVision, there's a whole subsystem dedicated to managing what goes into the PSTREESELECT
tables. For the purposes of this discussion, you can merely create your own selector tables that
identify what you are putting in there, and the set of values that it represents. Let's assume, for
example, you have a table called "PS_ACCOUNT_TREE_FLAT" to represent the flattened
account tree. The SQL to load the data related to the "Assets" node might look as follows:

Insert into PS_ACCOUNT_TREE_FLAT as


Select "Assets", B.RANGE_FROM, B.RANGE_TO
from PSTREENODE A, PSTREELEAF B
where A.SETID = B.SETID and A.TREE_NAME = B.TREE_NAME and A.EFFDT = B.EFFDT
and A.TREE_NODE = "Assets"
and B.TREE_NODE_NUM between A.TREE_NODE_NUM and A.TREE_NODE_NUM_END

You could then join to the ledger table as follows:

Select Sum(B.POSTED_TOTAL_AMT)
from PS_ACCOUNT_TREE_FLAT A, PS_LEDGER B
where A.FILTERED_NODE = "Assets"
and B.ACCOUNT between A.RANGE_FROM and A.RANGE_TO
Hopefully, this gives a starting point to determining how to approach this. I suggest playing
around with a SQL editor and PS/Query to make sure you understand (I did the same thing with
nVision when I was first learning). Keep in mind, however, that the SQL that gets displayed in
either an nVision trace or the SQL tab in Query does not include the SQL for populating that tree
select table. However, it does show the filtering conditions used for joining the select table to the
data table. You can select directly against the tree select table to see what values were put in it,
and then look at the PSTREE% tables to compare the two.

Follow-on Info

For those who use tree manager extensively, we recently added a product that makes it easier to
maintain multiple trees together. Additional information can be found here. A demo of this
product can be found here.