Pitfalls of Object Oriented Programming
What is OO programming?
- Data abstraction
- Encapsulation
- Polymorphism
- Inheritance
Slide 3
Objects
If objects are self-contained then they can be:
- Reused.
- Maintained without side effects.
- Used without understanding their internal implementation/representation.
Slide 6
A timeline of C++, 1979 to 2009:
- 1979: development begins
- 1983: named C++
- 1985: first release
- 1989: release of v2.0
- 1998: standardised
- 2003: updated
- 2009: C++0x
CPU performance
[Figure: CPU performance over time]

CPU/Memory performance
[Figure: CPU performance vs. memory performance over time]
My Claim
The example uses three classes:
- Node: container class
- Modifier: updates transforms
- Drawable/Cube: renders objects
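A minimal C++ sketch of how such a hierarchy is commonly built (the class names come from the slide; every member and method here is an assumption for illustration):

```cpp
#include <vector>

struct Matrix { float m[16]; };

// Hypothetical sketch of the three classes named on the slide.
class Node {                           // container class
public:
    virtual ~Node() {}
    virtual void Update() {            // recurse into children
        for (Node* c : m_Children) c->Update();
    }
    void AddChild(Node* n) { m_Children.push_back(n); }
protected:
    std::vector<Node*> m_Children;
};

class Modifier : public Node {         // updates transforms
public:
    void Update() override {
        // ...would apply m_Transform to the subtree here...
        Node::Update();
    }
    Matrix m_Transform {};
};

class Cube : public Node {             // a Drawable: renders the object
public:
    static int s_RenderCount;          // test hook, for illustration only
    void Update() override { ++s_RenderCount; }  // "render" the cube
};
int Cube::s_RenderCount = 0;
```

Every Update() here is a virtual call through a pointer to a separately allocated object, which is exactly the access pattern the rest of the talk measures.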
Slide 21
[Figures: the class definitions and the resulting per-object memory layout of the nodes; each square in the diagrams is 4 bytes]
Leaf nodes (objects) return transformed bounding spheres. What's wrong with this code?
Slide 28
Leaf nodes (objects) return transformed bounding spheres. If m_Dirty == false then we get a branch misprediction, which costs 23 or 24 cycles.
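The pattern being criticised is the classic dirty-flag cache; a hedged sketch (only m_Dirty and GetWorldBoundingSphere appear in the deck, the other names are assumptions):

```cpp
struct Vec3   { float x, y, z; };
struct Sphere { Vec3 centre; float radius; };

class Leaf {
public:
    // Returns a cached world-space bounding sphere, recomputing it
    // only when the node has been marked dirty.
    const Sphere& GetWorldBoundingSphere() {
        if (!m_Dirty)              // when the branch predictor guesses
            return m_WorldSphere;  // wrong here, the mispredict costs
                                   // ~23-24 cycles on the target CPU
        m_WorldSphere.centre = { m_LocalSphere.centre.x + m_Position.x,
                                 m_LocalSphere.centre.y + m_Position.y,
                                 m_LocalSphere.centre.z + m_Position.z };
        m_WorldSphere.radius = m_LocalSphere.radius;
        m_Dirty = false;
        return m_WorldSphere;
    }
    void SetPosition(const Vec3& p) { m_Position = p; m_Dirty = true; }

    Sphere m_LocalSphere {{0.0f, 0.0f, 0.0f}, 1.0f};
    Sphere m_WorldSphere {{0.0f, 0.0f, 0.0f}, 1.0f};
    Vec3   m_Position    {0.0f, 0.0f, 0.0f};
    bool   m_Dirty = true;
};
```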
Cache usage
[Figure sequence: tracing each load between Main Memory and the L2 Cache during traversal; parentTransform is already in the cache (somewhere)]
Slide 46
The Test
- 11,111 nodes/objects in a tree 5 levels deep
- Every node being transformed
- Hierarchical culling of the tree
- Render method is empty
Slide 47
Performance
This is the time taken just to traverse the tree!
Slide 48
Why is it so slow?
[Figure: profiler capture; ~22ms]
Slide 49
Look at GetWorldBoundingSphere()
Slide 50
Slide 51
if(!m_Dirty) comparison
[Figures: instruction-level profile of the traversal]
Allocate contiguously:
- Nodes
- Matrices
- Bounding spheres
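One way to realise this, sketched under assumptions (none of these names are from the slides), is a structure-of-arrays allocation so that node i's transform and bounding sphere live at index i of contiguous arrays, and a linear traversal walks memory linearly too:

```cpp
#include <vector>
#include <cstddef>

struct Matrix { float m[16]; };
struct Sphere { float x, y, z, r; };

// Hypothetical contiguous layout replacing per-object heap allocation.
struct NodeArrays {
    std::vector<Matrix> worldTransforms;
    std::vector<Sphere> boundingSpheres;
    std::vector<int>    parentIndex;   // hierarchy as indices, not pointers

    explicit NodeArrays(std::size_t count)
        : worldTransforms(count),
          boundingSpheres(count),
          parentIndex(count, -1) {}    // -1 marks "no parent" (root)
};
```

std::vector guarantees contiguous storage, so consecutive nodes share cache lines instead of being scattered across the heap.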
Slide 61
Performance
19.6ms -> 12.9ms
35% faster just by moving things around in memory!
Slide 62
What next?
- Process data in order
- Use implicit structure for the hierarchy
- Minimise to-and-fro between nodes
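The "implicit structure" point can be illustrated with a sketch (hypothetical code, not the talk's): if nodes are stored breadth-first and each stores only a count of its children, the children's positions can be computed rather than stored as pointers.

```cpp
#include <vector>
#include <cstddef>

// Hypothetical breadth-first layout: node i stores only how many
// children it has; where those children sit is implied by the layout.
struct ImplicitTree {
    std::vector<int> childCount;   // one entry per node, breadth-first

    // Index of node i's first child: node 0 is the root at index 0,
    // and all children of earlier nodes precede node i's children.
    std::size_t FirstChild(std::size_t i) const {
        std::size_t idx = 1;
        for (std::size_t j = 0; j < i; ++j)
            idx += childCount[j];
        return idx;
    }
};
```

For the deck's example shape (a parent with 2 children, each with 4 children), only the counts {2, 4, 4, ...} need storing; no child pointers at all.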
Hierarchy
[Figures: building the tree; we start with a parent Node, then add child Nodes level by level]
A lot of this information can be inferred.
Slide 69
Hierarchy
[Figure: each node stores only a child count; the parent has 2 children (:2), and each of those children has 4 children (:4)]
Slide 71
Hierarchy
Ensure nodes and their data are contiguous in memory.
[Figure: the counted nodes laid out contiguously]
Slide 73
OO version
Update transforms top-down and expand world bounding spheres (WBS) bottom-up.
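With a linear, parents-before-children layout, that top-down pass becomes a flat loop; a hedged sketch (the parent-index representation and all names are assumptions, and a toy translation stands in for a full matrix concatenation):

```cpp
#include <vector>
#include <cstddef>

struct Xform { float tx, ty, tz; };   // toy "transform": a translation

// parent[i] < i for every non-root node, so one forward pass sees each
// parent before its children: a top-down update with no recursion.
// (A backward pass over the same arrays could expand bounding spheres
// bottom-up.)
void UpdateTransforms(const std::vector<int>& parent,
                      const std::vector<Xform>& local,
                      std::vector<Xform>& world) {
    for (std::size_t i = 0; i < parent.size(); ++i) {
        if (parent[i] < 0) { world[i] = local[i]; continue; }  // root
        const Xform& p = world[parent[i]];   // already updated
        world[i] = { p.tx + local[i].tx,
                     p.ty + local[i].ty,
                     p.tz + local[i].tz };
    }
}
```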
[Figure sequence: the transform update propagating through the tree, node by node]
Slide 83
Conversion to linear
[Figures: the pointer-based hierarchy converted into the counted, contiguous layout]
Slide 89
Unified L2 Cache
[Figures: parent data and children's data streaming through the cache; the children's node data is not needed]
Prefetching: because all data is linear, we can predict what memory will be needed in ~400 cycles and prefetch it.
Performance
19.6ms -> 12.9ms -> 4.8ms
Slide 98
Prefetching
- Data accesses are now predictable
- Can use prefetch (dcbt) to warm the cache
- Data streams can be tricky
- Many reasons for stream termination
- Easier to just use dcbt blindly (look ahead x number of iterations)
Slide 99
Prefetching example
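The slide's original code isn't preserved in this transcript. As a portable sketch of the blind look-ahead idea, using GCC/Clang's __builtin_prefetch where the PowerPC code would use dcbt (the look-ahead distance, data layout, and per-node work are assumptions):

```cpp
#include <vector>
#include <cstddef>

struct NodeData { float m[16]; float sphere[4]; };

float ProcessAll(const std::vector<NodeData>& nodes) {
    const std::size_t kLookAhead = 2;   // assumed tuning value: how many
                                        // iterations ahead to prefetch
    float sum = 0.0f;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        // Blindly prefetch a few iterations ahead (guarded so we never
        // form an out-of-bounds address near the end of the array).
        if (i + kLookAhead < nodes.size())
            __builtin_prefetch(&nodes[i + kLookAhead]);
        sum += nodes[i].sphere[3];      // toy per-node work
    }
    return sum;
}
```

Prefetching two iterations ahead here stands in for "look ahead x number of iterations"; the right distance depends on per-iteration cost versus memory latency.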
Slide 100
Performance
19.6ms -> 12.9ms -> 4.8ms -> 3.3ms
Slide 101
A Warning on Prefetching
[Figures: profile captures]
Up close: ~16.6ms
Slide 105
Slide 106
Performance counters
- Branch mispredictions: 2,867 (cf. 47,000)
- L2 cache misses: 16,064 (cf. 36,000)
Slide 107
In Summary
Slide 108
Simplify systems
KISS
Easier to optimise, easier to parallelise
Slide 110
Homogeneity
Remember:
- Better performance
- Better realisation of code optimisations
- Often simpler code
- More parallelisable code
Slide 113
The END
Questions?
Slide 114