Hierarchical Data
without CONNECT-BY
-- A Path Code Approach
Charles Yu
Database Architect
Elance Inc.
cyu@elance.com
charles.yu@acm.org
2005-08
Background
Node-Uniform Hierarchical (NUH for short) data can be
visualized as a tree or forest graph where every node has
the same set of attributes.
A variant: use a separate table to store the hierarchical relationship,
consisting essentially of two columns, xid (or child_xid) and parent_xid,
and use a foreign key to link that table to the main data table.
Basic Recursive Table Query
Mechanisms
Oracle-native CONNECT BY
K-way self outer join (for hierarchies up to k levels deep)
Other??
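The k-way self outer join can be sketched as follows. This is a minimal illustration, not part of the original slides: it uses Python's sqlite3 module in place of Oracle, a hypothetical three-column table T, and made-up rows. It shows k = 3, so it resolves chains only down to entry_level 2; every additional level needs another join.

```python
import sqlite3

# Hypothetical sample table: xid, parent_xid, entry_code (rows are made up).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE T (xid INTEGER PRIMARY KEY, parent_xid INTEGER, entry_code TEXT)"
)
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [(1, None, "a1"), (2, 1, "b1"), (3, 1, "b2"), (4, 3, "c1")],
)

# 3-way self outer join: one output row per root-to-node chain, at most
# three levels deep. Deeper nodes are silently missed, which is the
# limitation the path code approach avoids.
rows = conn.execute(
    """
    SELECT t0.entry_code, t1.entry_code, t2.entry_code
    FROM T t0
    LEFT JOIN T t1 ON t1.parent_xid = t0.xid
    LEFT JOIN T t2 ON t2.parent_xid = t1.xid
    WHERE t0.parent_xid IS NULL
    ORDER BY t1.entry_code, t2.entry_code
    """
).fetchall()

for row in rows:
    print(row)
```

The query text grows with the depth bound k, so the depth of the hierarchy has to be known in advance.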
Basic Idea of Path Code Approach
A node of a tree is fully determined
by the path from the root to itself.
A path code, as a full representation of
the path, can be very compact: for a
reasonably balanced tree its length is on the
order of the logarithm of the total size of the tree.
Path codes can feasibly be maintained
dynamically as the tree changes.
Path codes permit direct indexing.
Path-code enhanced recursive
table design
Basic Columns
xid
parent_xid
path_code -- code of the path to the node (details later)
entry_level -- level in the tree of the record the entry belongs to
sibling_no -- sequence number of the entry among the children of its parent
is_leaf -- 1 if the node is a leaf, 0 otherwise
entry_code -- unique content identifier of the entry
normal_stuff -- one or more ordinary data columns
Value Setting for H columns (I)
parent_xid is set as usual.
sibling_no can be set according to any ordering, e.g. by entry_code,
starting at 1 for each parent. The sibling_no of root entries is set
as if those roots were children of a super root.
entry_level can be set from the top down: entry_level = 0 for all root
entries, and X.entry_level = k+1 if X has parent Y and Y.entry_level = k.
is_leaf is 0 if the node has at least one child and 1 otherwise.
Value Setting for H columns (II)
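path_code
– for root entries X: X.path_code = to_char(X.sibling_no, 'FM00')
– for non-root entries X with X.parent_xid = Y.xid:
X.path_code = Y.path_code || to_char(X.sibling_no, 'FM00')
(The FM modifier suppresses the leading blank that to_char otherwise
reserves for the sign, so every section is exactly two characters.)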
Explanation
Example tree (nodes as read from the figure; each node shows entry_code,
entry_level, xid and path_code):
entry_code  entry_level  xid  path_code
b1          1            3    0101
b2          1            2    0102
c1          2            6    010201
c2          2            5    010202
c3          2            4    010203
• path_code is in a uniform format (two digits per section).
• path_code order is based on entry_code order, not on xid order. It could be otherwise.
• The path_code of a child is the path_code of its parent plus its own base section code.
• sibling_no is not shown but is assumed to follow entry_code order.
• entry_code and xid values can be set independently of each other.
• parent_xid, sibling_no, is_leaf and other fields are not shown.
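The value-setting rules can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the author's implementation: the function name assign_h_columns and the dictionary layout are hypothetical, siblings are ordered by entry_code, and a root a1 with xid 1 is assumed because the example above does not show its root.

```python
from collections import defaultdict

def assign_h_columns(nodes, width=2):
    """Fill in sibling_no, entry_level, path_code and is_leaf in place.

    nodes maps xid -> {"parent_xid": ..., "entry_code": ...}.
    width is the fixed section length; sibling_no must fit in it
    (at most 99 siblings per parent for width=2).
    """
    children = defaultdict(list)
    for xid, node in nodes.items():
        children[node["parent_xid"]].append(xid)

    def visit(parent_xid, parent_path, level):
        # Siblings are numbered from 1 in entry_code order.
        kids = sorted(children[parent_xid], key=lambda x: nodes[x]["entry_code"])
        for sibling_no, xid in enumerate(kids, start=1):
            node = nodes[xid]
            node["sibling_no"] = sibling_no
            node["entry_level"] = level
            node["path_code"] = parent_path + str(sibling_no).zfill(width)
            node["is_leaf"] = 0 if children.get(xid) else 1
            visit(xid, node["path_code"], level + 1)

    # Roots have parent_xid None and are treated as children of a super root.
    visit(None, "", 0)
    return nodes

nodes = {
    1: {"parent_xid": None, "entry_code": "a1"},  # assumed root, not in the figure
    3: {"parent_xid": 1, "entry_code": "b1"},
    2: {"parent_xid": 1, "entry_code": "b2"},
    6: {"parent_xid": 2, "entry_code": "c1"},
    5: {"parent_xid": 2, "entry_code": "c2"},
    4: {"parent_xid": 2, "entry_code": "c3"},
}
assign_h_columns(nodes)
for xid in sorted(nodes, key=lambda x: nodes[x]["path_code"]):
    print(xid, nodes[xid]["entry_code"], nodes[xid]["path_code"])
```

With these inputs the computed path codes agree with the ones in the example: 0101 and 0102 at level 1, and 010201 to 010203 below b2.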
Variants of path_code pattern
(advanced topic)
Node uniform: every section of every path code has the same length
(the simplest variant; it is used in the previous example).
Level uniform: all sections at the same level have the same length
across all path codes.
Parent uniform: all children of the same parent have path codes of
equal length.
Dot (or delimiter) uniform: the same delimiter character (e.g. a dot)
separates all sections of all path codes.
Min uniform: the base section of each path code is always kept at
minimum length.
String, binary or hex encoding: the choice affects how the codes are
expressed, interpreted and sorted.
Sparse uniform: each path code section allows more values than are
currently needed, to ease subsequent node insertions.
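Why sorting is relevant to the choice of pattern can be seen in a small sketch. It is not from the original slides: the sample codes and the helper section_key are made up for illustration. Fixed-width (node-uniform) codes sort correctly as plain strings, while dot-delimited codes with variable-length sections do not, unless they are compared section by section.

```python
def section_key(dotted_code):
    """Compare a dot-delimited path code section by section, numerically."""
    return [int(section) for section in dotted_code.split(".")]

# The same four nodes in two hypothetical encodings, listed in tree order.
fixed = ["01", "0101", "0102", "0110"]
dotted = ["1", "1.1", "1.2", "1.10"]

fixed_sorted = sorted(fixed)                      # plain string sort keeps tree order
dotted_plain = sorted(dotted)                     # "1.10" sorts before "1.2"
dotted_keyed = sorted(dotted, key=section_key)    # section-wise sort restores it

print(fixed_sorted)
print(dotted_plain)
print(dotted_keyed)
```

A related point for delimiter-based codes: a prefix match such as like '1.1%' would also match 1.10, so the delimiter presumably has to be part of the pattern (like '1.1.%').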
Query Patterns
Get all descendants of a node P (the result includes P itself)
select * from T where path_code like P.path_code||'%'
Get all ancestors of a node C (the result includes C itself)
select * from T where C.path_code like path_code||'%'
Get all siblings of a node N (the result includes N itself)
select * from T where parent_xid = N.parent_xid
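The three query patterns can be tried out with a small script. This is a sketch under assumptions: it uses Python's sqlite3 module rather than Oracle, the helper names descendants, ancestors and siblings are hypothetical, and the rows reuse the example tree with an assumed root a1. Unlike the bare patterns, each helper excludes the starting node itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE T (xid INTEGER PRIMARY KEY, parent_xid INTEGER, "
    "entry_code TEXT, path_code TEXT UNIQUE)"
)
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?, ?)",
    [
        (1, None, "a1", "01"),  # assumed root
        (3, 1, "b1", "0101"),
        (2, 1, "b2", "0102"),
        (6, 2, "c1", "010201"),
        (5, 2, "c2", "010202"),
        (4, 2, "c3", "010203"),
    ],
)

def descendants(conn, path_code):
    # Rows whose path_code starts with the given code, minus the node itself.
    sql = ("SELECT entry_code FROM T WHERE path_code LIKE ? || '%' "
           "AND path_code <> ? ORDER BY path_code")
    return [r[0] for r in conn.execute(sql, (path_code, path_code))]

def ancestors(conn, path_code):
    # Rows whose path_code is a prefix of the given code, minus the node itself.
    sql = ("SELECT entry_code FROM T WHERE ? LIKE path_code || '%' "
           "AND path_code <> ? ORDER BY path_code")
    return [r[0] for r in conn.execute(sql, (path_code, path_code))]

def siblings(conn, xid):
    # Rows sharing the node's parent; root nodes (NULL parent) yield nothing.
    sql = ("SELECT entry_code FROM T WHERE parent_xid = "
           "(SELECT parent_xid FROM T WHERE xid = ?) AND xid <> ? "
           "ORDER BY path_code")
    return [r[0] for r in conn.execute(sql, (xid, xid))]

print(descendants(conn, "0102"))
print(ancestors(conn, "010202"))
print(siblings(conn, 5))
```

The prefix match relies on the node-uniform format: because every section has the same width, one node's code is a prefix of another's only when the first node is an ancestor of the second.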
DML Patterns (insert at end)
(Insert the record with path_code and sibling_no as null)
insert into T(xid, parent_xid, entry_level, entry_code,
normal_stuff) values (c.xid, p.xid, p.entry_level + 1,
c.entry_code, c.normal_stuff);
(Update sibling_no; nvl covers the first child of p, for which max returns null)
update T set sibling_no = (select nvl(max(sibling_no), 0) + 1 from T
where parent_xid = p.xid) where xid = c.xid;
(Update path_code)
update T set path_code = p.path_code ||
to_char(sibling_no, 'FM00') where xid = c.xid;
(Update path_code for the siblings elder than c and all descendants of
those elder siblings; pcs_length stands for the path_code section length.
For a descendant row, entry_level and sibling_no in this statement must be
those of its renumbered ancestor at level p.entry_level + 1, not the
descendant's own values.)
update T set path_code = substr(path_code, 1, pcs_length*entry_level) ||
to_char(sibling_no, 'FM00') || substr(path_code,
pcs_length*(entry_level+1)+1) where path_code like p.path_code||'%'
and path_code > (select path_code from T where xid = c.xid);
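The insert-at-end pattern can be sketched as a single function. This is an illustration under assumptions, not the author's code: it runs on Python's sqlite3 module instead of Oracle, insert_last_child is a hypothetical name, the sample rows are made up, and the sibling number and path code are computed before the insert rather than patched in afterwards. Resetting the parent's is_leaf is an addition that follows from the column's definition.

```python
import sqlite3

def insert_last_child(conn, parent_xid, xid, entry_code, width=2):
    """Insert a new last child under parent_xid and return its path_code."""
    p_path, p_level = conn.execute(
        "SELECT path_code, entry_level FROM T WHERE xid = ?", (parent_xid,)
    ).fetchone()
    # COALESCE covers a parent that has no children yet.
    (next_no,) = conn.execute(
        "SELECT COALESCE(MAX(sibling_no), 0) + 1 FROM T WHERE parent_xid = ?",
        (parent_xid,),
    ).fetchone()
    path_code = p_path + str(next_no).zfill(width)
    with conn:  # commit both statements together, roll back on error
        conn.execute(
            "INSERT INTO T (xid, parent_xid, path_code, entry_level, "
            "sibling_no, is_leaf, entry_code) VALUES (?, ?, ?, ?, ?, 1, ?)",
            (xid, parent_xid, path_code, p_level + 1, next_no, entry_code),
        )
        # The parent now has at least one child.
        conn.execute("UPDATE T SET is_leaf = 0 WHERE xid = ?", (parent_xid,))
    return path_code

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE T (xid INTEGER PRIMARY KEY, parent_xid INTEGER, "
    "path_code TEXT UNIQUE, entry_level INTEGER, sibling_no INTEGER, "
    "is_leaf INTEGER, entry_code TEXT)"
)
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?, ?, ?, ?, ?)",
    [(1, None, "01", 0, 1, 0, "a1"), (2, 1, "0101", 1, 1, 1, "b1")],
)

code_b2 = insert_last_child(conn, 1, 3, "b2")  # second child of a1
code_c1 = insert_last_child(conn, 2, 4, "c1")  # first child of b1
print(code_b2, code_c1)
```

Because the new node takes the highest sibling_no under its parent, no existing path code has to change in this case.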
DML patterns (delete)
(Delete node C and all its descendants)
delete from T where path_code like C.path_code||'%';
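A runnable sketch of the delete pattern follows. The same assumptions apply: sqlite3 stands in for Oracle, delete_subtree is a hypothetical name, and the rows are made up. Resetting the parent's is_leaf flag afterwards is not in the original pattern; it is added here on the assumption that is_leaf has to stay consistent with the column's definition.

```python
import sqlite3

def delete_subtree(conn, xid):
    """Delete the node and all its descendants; return the number of rows removed."""
    path_code, parent_xid = conn.execute(
        "SELECT path_code, parent_xid FROM T WHERE xid = ?", (xid,)
    ).fetchone()
    with conn:  # commit both statements together, roll back on error
        cur = conn.execute(
            "DELETE FROM T WHERE path_code LIKE ? || '%'", (path_code,)
        )
        # If the parent has no children left, it becomes a leaf again.
        conn.execute(
            "UPDATE T SET is_leaf = 1 WHERE xid = ? AND NOT EXISTS "
            "(SELECT 1 FROM T AS c WHERE c.parent_xid = ?)",
            (parent_xid, parent_xid),
        )
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE T (xid INTEGER PRIMARY KEY, parent_xid INTEGER, "
    "path_code TEXT UNIQUE, is_leaf INTEGER, entry_code TEXT)"
)
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?, ?, ?)",
    [
        (1, None, "01", 0, "a1"),
        (2, 1, "0101", 1, "b1"),
        (3, 1, "0102", 0, "b2"),
        (4, 3, "010201", 1, "c1"),
        (5, 3, "010202", 1, "c2"),
    ],
)

removed_b2 = delete_subtree(conn, 3)  # removes b2, c1 and c2
removed_b1 = delete_subtree(conn, 2)  # removes b1, leaving a1 childless
print(removed_b2, removed_b1)
```

The deletion leaves a gap in the sibling numbering of the remaining children; whether to close such gaps is a separate design decision that this sketch does not address.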