Pragmatic Programmer on Meta Programming

Tip 29

Write Code that Writes Code.
- The Pragmatic Programmer, Hunt and Thomas [1]

P. Steinbach (MPI CBG) LightAbstractions Dec 12th, 2013

Lightweight Abstractions in C++
- An Introduction to CRTP and Expression Templates -

Peter Steinbach
Scientific Computing Facility Max Planck Institute of Molecular Cell Biology and Genetics

Dec 12th, 2013

Outline

1. Motivation
2. Performant Inheritance
3. Expressive And Fast Calculations
4. The Case of Eigen
5. Summary
6. Appendix
7. Literature


Motivation : Why Abstractions?


Motivation : The Following Slides ...
Abstractions in C++

Disclaimer



CRTP: Virtual Inheritance Example
ROOT TH1D (root.cern.ch [2])

- one-dimensional histogram with double precision data
- extensive use of virtual inheritance
- overloaded with responsibilities
- to be fair: late 1990s, early 2000s

List of virtual functions fills six A4 pages!


CRTP: Virtual Inheritance Recap
    class AbstrBase {
    public:
        virtual unsigned update(const unsigned& _in) const = 0;
    };

    class Derived : public AbstrBase {
    public:
        unsigned update(const unsigned& _in) const { return _in - 1; }
    };

- dynamic polymorphism
- function pointers to the available implementations are kept in a per-class table (vtable); each object stores a pointer to that table
- calls are resolved at runtime through the table
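A minimal sketch of what dynamic polymorphism means at the call site. The class `DerivedBy2` and the helpers `apply_once` and `check_dispatch` are hypothetical names added for illustration only, and the virtual destructor is an addition that is not on the slide.

```cpp
#include <cassert>

// Interface as on the slide; the virtual destructor is an addition.
class AbstrBase {
public:
    virtual ~AbstrBase() {}
    virtual unsigned update(const unsigned& _in) const = 0;
};

class Derived : public AbstrBase {
public:
    unsigned update(const unsigned& _in) const { return _in - 1; }
};

// Hypothetical second implementation, only here to show runtime selection.
class DerivedBy2 : public AbstrBase {
public:
    unsigned update(const unsigned& _in) const { return _in - 2; }
};

// The call site only knows the base class. Which update() runs is
// decided at runtime through the object's vtable pointer.
inline unsigned apply_once(const AbstrBase& _updater, unsigned _value) {
    return _updater.update(_value);
}

// Small self-check; call it from main() to exercise the sketch.
inline void check_dispatch() {
    Derived by_one;
    DerivedBy2 by_two;
    assert(apply_once(by_one, 10u) == 9u);
    assert(apply_once(by_two, 10u) == 8u);
}
```

The same `apply_once` body serves both classes, which is the runtime flexibility that the vtable indirection pays for.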

CRTP: Virtual Inheritance Experiment
    unsigned count_down(AbstrBase* _updater, const unsigned& _start_index) {
        unsigned i = _start_index;
        for (; i > 0;) {
            i = _updater->update(i);
        }
        return i;
    }

g++ -O3, counting down from 2^30 to 0:

t = 2.1 s
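The slide does not show how the time was measured. One way such a measurement could be set up is sketched below, assuming C++11 for `std::chrono`; the helper `time_count_down` is a hypothetical name, and the virtual destructor is an addition. Absolute timings depend on compiler, flags and hardware, so no expected value is given.

```cpp
#include <cassert>
#include <chrono>

class AbstrBase {
public:
    virtual ~AbstrBase() {}
    virtual unsigned update(const unsigned& _in) const = 0;
};

class Derived : public AbstrBase {
public:
    unsigned update(const unsigned& _in) const { return _in - 1; }
};

// Loop from the slide: one virtual call per iteration.
inline unsigned count_down(AbstrBase* _updater, const unsigned& _start_index) {
    unsigned i = _start_index;
    for (; i > 0;) {
        i = _updater->update(i);
    }
    return i;
}

// Hypothetical helper: wall-clock seconds spent in one count_down call.
inline double time_count_down(AbstrBase* _updater, unsigned _start_index) {
    const auto start = std::chrono::steady_clock::now();
    const unsigned result = count_down(_updater, _start_index);
    const auto stop = std::chrono::steady_clock::now();
    assert(result == 0u);
    (void)result;  // silence unused warnings when NDEBUG is defined
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling `time_count_down(&updater, 1u << 30)` on a `Derived updater` would correspond to the 2^30 iterations quoted on the slide.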

CRTP: Curiously Recurring Template Pattern
    template <typename Daughter>
    struct CRTPBase {
        int update(const int& _in) const {
            return static_cast<const Daughter*>(this)->do_update(_in);
        }
    };

    struct CRTPDerived : public CRTPBase<CRTPDerived> {
        int do_update(const int& _in) const { return _in - 1; }
    };

- first discussed in 1995 [3]
- sub-templated Design-by-Policy implementation [4]

CRTP: CRTP = strange name
    template <typename Daughter>
    struct CRTPBase {
        int update(const int& _in) const {
            return static_cast<const Daughter*>(this)->do_update(_in);
        }
    };

    struct CRTPDerived : public CRTPBase<CRTPDerived> {
        int do_update(const int& _in) const { return _in - 1; }
    };

Order of events during compilation:

1. CRTPDerived declared
2. CRTPBase<CRTPDerived> instantiated
3. CRTPBase::update declaration instantiated
4. CRTPDerived defined
5. CRTPBase::update definition instantiated if needed
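The last step, that the definition of `update` is only instantiated if needed, can be made visible with a small sketch. The class `Incomplete` and the function `check_crtp` are hypothetical and not part of the slides; the downcast via `static_cast` is the usual CRTP idiom and is assumed here.

```cpp
#include <cassert>

template <typename Daughter>
struct CRTPBase {
    int update(const int& _in) const {
        // Daughter is a complete type by the time this body is
        // instantiated, so the downcast and the call are valid.
        return static_cast<const Daughter*>(this)->do_update(_in);
    }
};

struct CRTPDerived : public CRTPBase<CRTPDerived> {
    int do_update(const int& _in) const { return _in - 1; }
};

// Hypothetical class without do_update(). It still compiles, because the
// body of CRTPBase<Incomplete>::update is never instantiated unless
// update() is actually called on an Incomplete object.
struct Incomplete : public CRTPBase<Incomplete> {};

// Small self-check; call it from main() to exercise the sketch.
inline void check_crtp() {
    CRTPDerived derived;
    assert(derived.update(3) == 2);

    Incomplete unused;  // fine as long as unused.update(...) is not called
    (void)unused;
}
```

Adding a call such as `unused.update(1)` would turn the missing `do_update` into a compile-time error, not a runtime failure.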

CRTP: CRTP = strange name
    template <typename T>
    unsigned count_down(const T& _updater, const unsigned& _start_index) {
        unsigned i = _start_index;
        for (; i > 0;)
            i = _updater.update(i);
        return i;
    }

g++ -O3, counting down from 2^30 to 0:

t = 0 s

CRTP: What’s going on?
- 0 s ... a bug?
- the compiler simply emitted no code for the CRTP count-down
- optimisation schemes dropped redundant operations
- impossible with dynamic polymorphism:
  - vtable incurs branching (virtual: 1000x more branches than CRTP)
  - for loop actually performed, including indirection through inheritance

CRTP: Wrap-Up
Pros and Cons

- code reuse through common base class interface
- flexibility of class composition retained
- open for compiler optimisations
- functionality through class communication (OO design)
- compiler aided removal of virtual inheritance: -fdevirtualize [5]
- price: no runtime flexibility

ExprTempl: Adding two Vectors
cBLAS

    double a[4] = {2, 3, 4, 5};
    double b[4] = {5, 4, 9, 2};
    // b <- 1.0*a + b
    cblas_daxpy(4, 1.0, a, 1, b, 1);

Taken from http://www.mitchr.me/SS/exampleCode/blas/blas1C.c.html

Something readable would be nice!

    Vector a(N), b(N), c(N);
    // fill Vector(s) a and b
    c = a + b;

ExprTempl: A Standard Implementation
    inline const Vector operator+(const Vector& _first, const Vector& _second) {
        Vector temporary(_first.size());
        for (size_t index = 0; index < _first.size(); ++index)
            temporary[index] = _first[index] + _second[index];
        return temporary;
    }

Issues

- large vectors: temporary becomes the bottleneck
- large vectors: sum will always be carried out entirely
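The slides never show the `Vector` class itself. To try the eager `operator+` above, a minimal stand-in is sketched here; the storage via `std::vector<double>`, the constructor signature and the function `check_eager_sum` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal Vector; its definition is not part of the slides.
class Vector {
    std::vector<double> v_;
public:
    explicit Vector(std::size_t _size, double _value = 0.0) : v_(_size, _value) {}
    std::size_t size() const { return v_.size(); }
    double operator[](std::size_t _index) const { return v_[_index]; }
    double& operator[](std::size_t _index) { return v_[_index]; }
};

// Eager element-wise sum as on the slide: every call allocates a
// full-size temporary and performs all additions immediately.
inline const Vector operator+(const Vector& _first, const Vector& _second) {
    Vector temporary(_first.size());
    for (std::size_t index = 0; index < _first.size(); ++index)
        temporary[index] = _first[index] + _second[index];
    return temporary;
}

// Small self-check; call it from main() to exercise the sketch.
inline void check_eager_sum() {
    Vector a(4, 1.0), b(4, 2.0);
    Vector c = a + b;  // all 4 additions happen here, even if only c[0] is read
    assert(c.size() == 4);
    assert(c[0] == 3.0);
}
```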

ExprTempl: Store Operations, not Data
    template <typename A, typename B>
    class sum {
        const A& left_;
        const B& right_;

    public:
        explicit sum(const A& _left, const B& _right)
            : left_(_left), right_(_right) {}

        size_t size() const { return left_.size(); }

        double operator[](size_t _index) const {
            return left_[_index] + right_[_index];
        }
    };

- sum is cheap in terms of memory (2 references inside)
- stores the operation and passes on the result for one element at a time

ExprTempl: Vector needs to be extended
    class Vector {
        // ...
        template <typename T>
        Vector& operator=(const T& _in) {
            resize(_in.size());
            for (size_t index = 0; index < _in.size(); ++index)
                v_[index] = _in[index];
            return (*this);
        }
        // ...
    };

- added new assignment operator to Vector
- just an example of interfacing Vector with sum
- required for (c = a + b)

ExprTempl: Harvesting the Fruit
    Vector a(N), b(N), c(N);
    // fill Vector(s) a and b
    a + b;        // 0 additions
    (a + b)[0];   // 1 addition
    c = a + b;    // N additions

- no memory overhead by temporaries
- achieved lazy evaluation (operation is conducted once it is really needed)
- intermediate abstraction has small memory footprint
- syntax is expressive and readable
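The addition counts claimed above can be checked with a self-contained sketch. Everything outside the `sum` class and the templated assignment is an assumption: the slides show neither the `Vector` storage nor the `operator+` that builds a `sum`, and the counter `g_additions` and the function `check_lazy_sum` are hypothetical instrumentation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical counter, only here to make the number of additions observable.
static std::size_t g_additions = 0;

template <typename A, typename B>
class sum {
    const A& left_;
    const B& right_;
public:
    explicit sum(const A& _left, const B& _right) : left_(_left), right_(_right) {}
    std::size_t size() const { return left_.size(); }
    double operator[](std::size_t _index) const {
        ++g_additions;  // one addition per element access
        return left_[_index] + right_[_index];
    }
};

// Hypothetical minimal Vector with the templated assignment from the slide.
class Vector {
    std::vector<double> v_;
public:
    explicit Vector(std::size_t _size, double _value = 0.0) : v_(_size, _value) {}
    std::size_t size() const { return v_.size(); }
    double operator[](std::size_t _index) const { return v_[_index]; }
    double& operator[](std::size_t _index) { return v_[_index]; }

    // Assigning an expression is where the element loop finally runs.
    template <typename T>
    Vector& operator=(const T& _in) {
        v_.resize(_in.size());
        for (std::size_t index = 0; index < _in.size(); ++index)
            v_[index] = _in[index];
        return *this;
    }
};

// Assumed operator+: it only builds the lightweight sum object.
inline sum<Vector, Vector> operator+(const Vector& _left, const Vector& _right) {
    return sum<Vector, Vector>(_left, _right);
}

// Small self-check; call it from main() to exercise the sketch.
inline void check_lazy_sum() {
    const std::size_t N = 8;
    Vector a(N, 1.0), b(N, 2.0), c(N);
    g_additions = 0;
    (void)(a + b);                    // builds a sum object, 0 additions
    assert(g_additions == 0);
    const double first = (a + b)[0];  // 1 addition
    assert(first == 3.0 && g_additions == 1);
    c = a + b;                        // N additions
    assert(g_additions == 1 + N);
    assert(c[N - 1] == 3.0);
}
```

This sketch only covers `Vector + Vector`. Chained expressions such as `a + b + c` would need further `operator+` overloads that accept a `sum` on either side.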

ExprTempl: Runtime Improvements

Performance of vector additions using different C++ implementations from [6].

There is more . . .
- expression templates first reported in 1995 [7]
- the same scheme is applicable in many more fields
- expression templates allow modular jump-in of accelerator code
- used by: Eigen [8], Blaze [9], MTL4 [10] ...


Eigen: What is it?

- for linear algebra, matrix and vector operations, numerical solvers and related algorithms
- header-only library complying with the C++98 standard
- explicit vectorization performed for SSE 2/3/4, ARM NEON, and AltiVec instruction sets, next to non-vectorized implementations
- open source under Mozilla Public License v2
- for details visit [8]


Eigen: Vector Addition Again

    Eigen::VectorXf u(size), v(size), w(size);
    u = v + w;

[Class diagram: MatrixBase<Derived> linked via CRTP to VectorXf and to CwiseBinaryOp<BinaryOp, LHS, RHS>]

Eigen: Vector Addition Real-Life

    Eigen::VectorXf u(size), v(size), w(size);
    u = v + w;

[Class diagram: MatrixBase<Derived> with operator=(), linked via CRTP to VectorXf and to CwiseBinaryOp<BinaryOp, LHS, RHS>; internal::assign_selector<Derived, OtherDerived> and internal::assign_impl<Derived, OtherDerived>, each with run(dst: Derived, other: OtherDerived); internal::pload, internal::padd, internal::pstore]

Summary

- C++ template engine provides tools for lightweight abstractions
- CRTP can replace runtime virtual dispatch where the concrete type is known at compile time
- expression templates reduce the need for temporaries and more
- revisit your code and ask which portions are compile-time constants

Tip 29

Write Code that Writes Code.

Final Word

from [11]

Thank you for your attention!
Appendix: Templates Recap

    template <typename T>
    T add(const T& _first, const T& _second) {
        return _first + _second;
    }

- C++ templates are processed by the template engine before compilation into binary
- templates and template engine form a Turing-complete programming language [12]
- using C++ templates for Meta Programming: C++ Template Meta Programming
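A short usage sketch for the `add` template above: each distinct argument type makes the compiler generate a separate function from the one template. The function `check_add` is a hypothetical name added for illustration.

```cpp
#include <cassert>
#include <string>

template <typename T>
T add(const T& _first, const T& _second) {
    return _first + _second;
}

// Small self-check; call it from main() to exercise the sketch.
inline void check_add() {
    assert(add(1, 2) == 3);        // instantiates add<int>
    assert(add(1.5, 2.0) == 3.5);  // instantiates add<double>
    // instantiates add<std::string>, where + means concatenation
    assert(add(std::string("ab"), std::string("cd")) == "abcd");
}
```

A mixed call such as `add(1, 2.0)` does not compile as written, because `T` cannot be deduced as both `int` and `double`.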


Appendix: Factorial At Runtime

    unsigned int runtime_factorial(unsigned int n) {
        if (n == 0)
            return 1;
        else
            return n * runtime_factorial(n - 1);
    }

If the argument is known at compile time, calculate it at compile time!

Appendix: Factorial At Compile Time
    template <size_t n>
    struct static_factorial {
        static const size_t value = n * static_factorial<n - 1>::value;
    };

    template <>
    struct static_factorial<0> {
        static const size_t value = 1;
    };

Classical example from [13].

- input parameters required at compile time (implies static constness)
- result known at compile time already, no runtime investment
- these days CPUs are so fast that this example is more academic
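That the result really is a compile-time constant can be checked by using it where the language demands one. The sketch below assumes C++11 for `static_assert`; the functions `table_size` and `check_factorial` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

template <std::size_t n>
struct static_factorial {
    static const std::size_t value = n * static_factorial<n - 1>::value;
};

template <>
struct static_factorial<0> {
    static const std::size_t value = 1;
};

// Runtime counterpart, for comparison.
inline unsigned int runtime_factorial(unsigned int n) {
    if (n == 0)
        return 1;
    else
        return n * runtime_factorial(n - 1);
}

// Evaluated by the compiler; a wrong value would stop the build.
static_assert(static_factorial<5>::value == 120, "5! must be 120");

// A compile-time constant can also serve as an array bound.
inline std::size_t table_size() {
    int table[static_factorial<3>::value];  // array of 3! = 6 ints
    return sizeof(table) / sizeof(table[0]);
}

// Small self-check; call it from main() to exercise the sketch.
inline void check_factorial() {
    assert(table_size() == 6);
    assert(runtime_factorial(5) == static_factorial<5>::value);
}
```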

Appendix: VTable
clang -cc1 -fdump-record-layouts virtual_sizeof.cpp

    *** Dumping AST Record Layout
       0 | class Derived
       0 |   class AbstrBase (primary base)
       0 |     (AbstrBase vtable pointer)
       0 |     (AbstrBase vftable pointer)
         | [sizeof=8, dsize=8, align=8
         |  nvsize=8, nvalign=8]

clang -cc1 -fdump-record-layouts crtp_sizeof.cpp

    *** Dumping AST Record Layout
       0 | class CRTPDerived (empty)
       0 |   class CRTPBase<class CRTPDerived> (base) (empty)
         | [sizeof=1, dsize=0, align=1
         |  nvsize=1, nvalign=1]
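The same comparison can be made without compiler-specific flags by looking at `sizeof`. The files `virtual_sizeof.cpp` and `crtp_sizeof.cpp` are not shown in the slides, so the classes below are reconstructed from the earlier slides, with a virtual destructor added and the `static_cast` CRTP idiom assumed; `check_sizes` is a hypothetical name. Exact sizes depend on the ABI, so the checks only compare them.

```cpp
#include <cassert>

class AbstrBase {
public:
    virtual ~AbstrBase() {}
    virtual unsigned update(const unsigned& _in) const = 0;
};

class Derived : public AbstrBase {
public:
    unsigned update(const unsigned& _in) const { return _in - 1; }
};

template <typename Daughter>
struct CRTPBase {
    int update(const int& _in) const {
        return static_cast<const Daughter*>(this)->do_update(_in);
    }
};

struct CRTPDerived : public CRTPBase<CRTPDerived> {
    int do_update(const int& _in) const { return _in - 1; }
};

// Small self-check; call it from main() to exercise the sketch.
inline void check_sizes() {
    // The polymorphic object has to hold at least its vtable pointer.
    assert(sizeof(Derived) >= sizeof(void*));
    // The CRTP object has no data members and no vtable pointer.
    assert(sizeof(CRTPDerived) < sizeof(Derived));
}
```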

Appendix: Disclaimer

All images (except stated otherwise) in this presentation were taken from the OpenClipArt gallery and are subject to the public domain. See http://openclipart.org/share for more information on sharing terms.


Literature
[1] A. Hunt and D. Thomas, The Pragmatic Programmer. Addison-Wesley, 2000.
[2] R. Brun and F. Rademakers, “ROOT - an object oriented data analysis framework,” in Proceedings AIHENP’96 Workshop, vol. A of Nucl. Inst. & Meth. in Phys. Res., pp. 81–86, September 1996. http://root.cern.ch.
[3] J. O. Coplien, “Curiously recurring template pattern,” in C++ Report, pp. 24–27, February 1995.
[4] A. Alexandrescu, Modern C++ Design: Generic Programming and Design Patterns Applied. Addison-Wesley, 2001.
[5] E. Bendersky, “http://eli.thegreenplace.net/2013/12/05/the-cost-of-dynamic-virtual-calls-vs-static-crtp-dispatch-in-c/.” Interesting performance evaluation of CRTP.
[6] K. Iglberger, G. Hager, J. Treibig, and U. Rüde, “Expression templates revisited: A performance analysis of current methodologies,” SIAM Journal on Scientific Computing, vol. 34, no. 2, pp. C42–C69, 2012.
[7] T. Veldhuizen, “Expression templates,” in C++ Report, vol. 5, pp. 26–31, June 1995.
[8] “Eigen.” http://eigen.tuxfamily.org/index.php?title=Main_Page. C++ library of template headers for linear algebra, matrix and vector operations, numerical solvers and related algorithms.
[9] “Blaze-lib.” http://code.google.com/p/blaze-lib/.
[10] “MTL4 - matrix template library.” http://www.simunova.com/de/node/24.
[11] K. Rocki, M. Burtscher, and R. Suda, “The future of accelerator programming: Abstraction, performance or can we have both?,” 2014. http://olab.is.s.u-tokyo.ac.jp/~kamil.rocki/pub.html.
[12] “Proving Turing-completeness of C++ templates.” http://matt.might.net/articles/c++-template-meta-programming-with-lambda-calculus/.
[13] “Template metaprogramming.” http://en.wikipedia.org/wiki/Template_metaprogramming.
[14] “http://talesofcpp.fusionfenix.com/post-12/episode-eight-the-curious-case-of-the-recurring-template-pattern.”