
Memory Consumption of Objects in C++

Bence Babati
babati@caesar.elte.hu
Faculty of Informatics, Eötvös Loránd University, Budapest, Hungary

Norbert Pataki
patakino@elte.hu
Faculty of Informatics, Eötvös Loránd University, Budapest, Hungary

ABSTRACT
C++ focuses on effective hardware utilization. C++ compilers are able to generate efficient executable code that takes advantage of the hardware elements. C++ supports the object-oriented paradigm and therefore provides classes and structs. Compilers are allowed to add extra padding and alignment bytes between the members of a class. As a result, the size of a class may depend on the order of its members, which contradicts the demand for optimal memory consumption. In this paper, we propose an approach for optimizing memory utilization. This approach includes a static analysis tool that examines the classes and reports if the order of members is suboptimal regarding memory consumption. We analyze open source projects and find that every observed project contains such a subtle problem.

CCS CONCEPTS
• Software and its engineering → Software maintenance tools; Classes and objects; Parsers.

KEYWORDS
C++, static analysis, Clang, classes, objects, memory consumption

ACM Reference Format:
Bence Babati and Norbert Pataki. 2022. Memory Consumption of Objects in C++. In Proceedings of ICOOOLPS’22. ACM, New York, NY, USA, 10 pages. https://doi.org/XXXXXXX.XXXXXXX

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
ICOOOLPS’22, June 07, 2022, Berlin, Germany
© 2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-XXXX-X/18/06. . . $15.00
https://doi.org/XXXXXXX.XXXXXXX

1 INTRODUCTION
The design of C++ aims to provide low memory consumption and high hardware utilization [22]. Compilers can generate efficient executable code, which is essential in many use cases and application fields.

User-defined types are a defining feature of C++ and have been part of the language since the beginning. C++ is a statically typed programming language [16]. Therefore, the type of each variable must be known at compile time. C++ offers different options for types. There are built-in elementary types, which are provided by the language itself. They can be used anywhere without any preparation, like double or int.

The language also provides the ability to combine built-in types and create new ones. The user-defined types can take advantage of other types and provide additional functionality over them. These can be defined as class or struct, the main difference between the two being the default visibility of the members. User-defined classes can provide functions related to the given type. This is an additional feature compared to built-in types, which have no such option.

The user-defined classes are categorized by the Standard based on how they are defined [12]. For example, aggregate classes cannot have private or protected members, virtual functions, base classes or constructors. The behavior of language elements, like initialization, may depend on this type category.

Another option to combine and develop new classes is inheritance. A class can inherit from other classes, which means that the base classes will be part of the derived class. Also, inheritance supports code reuse.

A weird consequence of the C++ class and object model is unnecessary memory consumption. Let us consider the following two classes:

class Foo {
private:
    int i;
    double d;
    char c;

public:
    void f() {
        // ...
    }

    void g() {
        // ...
    }
};

class FooImproved {
private:
    char c;
    int i;
    double d;

public:
    void f() {
        // ...
    }
    void g() {
        // ...
    }
};

The Foo and FooImproved classes express the same idea regarding the object-oriented paradigm, but the memory consumption of FooImproved objects differs from the memory consumption of Foo objects. Every Foo object takes 24 bytes because of the extra padding bytes around its members, but every FooImproved object takes only 16 bytes. C++ is a resource-oriented language; however, this layout strategy may cause unnecessary memory consumption. Figure 1 presents the memory layout for objects of these types.

Foo:         | i: 4 bytes | padding: 4 bytes | d: 8 bytes | c: 1 byte | padding: 7 bytes |
FooImproved: | c: 1 byte | padding: 3 bytes | i: 4 bytes | d: 8 bytes |

Figure 1: Memory layout for objects of Foo and FooImproved

In this paper, we argue for an approach to avoid unnecessary memory consumption of objects that comes from a suboptimal order of class members. The proposed approach includes the development of a static analysis tool which looks for suspicious classes and then provides fix-it hints. This kind of optimization of object-oriented C++ code is essential because of the language philosophy.

This paper is organized as follows. Section 2 introduces the basic concepts behind the memory layout of C++ objects. Many other things affect the calculation; they are explained in Section 4. The related tooling options are presented and summarized in Section 3. After that, our static analysis tool is presented with technical details in Section 5. In Section 6, the testing results measured on open source projects are presented. Finally, the paper concludes with a short summary in Section 7.

2 ISSUE PRELUDE
Using user-defined types may raise questions about their layout. As previously stated, C++ is a statically typed language. The types of variables and objects, together with their size and alignment, are known at compilation time. Each type has these two properties.

The size of a type is a number of bytes which tells how much memory the given type will take at each allocation. The type size must be calculated at compile time, since the size of every type is needed for memory allocations. A related concept is the alignment, which is calculated similarly at compile time for each class. The alignment is a number which is a power of two; it decides where objects of the given type can be allocated in memory. An object must be placed on a memory address that is a multiple of its alignment. Objects should be placed on aligned memory addresses to get optimal performance.

The compiler must calculate the overall size and alignment of a class. The calculation includes every member and base class, and polymorphism may also affect it. The compiler organizes the members to be in the optimal memory position.

This organization may insert extra unused bytes between members, which is called padding. From the performance point of view, it is important to meet every alignment requirement, since unaligned member access may decrease performance.

Here we can see another example which demonstrates a simple usage. The FooBar user-defined class has an int member which comes first with a size of 4 bytes. The second member, a double, has an alignment requirement of 8. To meet the requirement, the compiler must insert additional bytes into the class layout to fill up to the 8th byte. After member x, there will be 4 bytes of empty memory and the member y is placed just after. At the end, the size of FooBar will be 16, which includes 4 empty bytes. In this example, the 4 bytes of padding cannot be avoided, because if the members are swapped, the 4 bytes of padding are inserted at the end of the class.

struct FooBar {
    int x;     // size 4, alignment 4
    double y;  // size 8, alignment 8
};

However, there are use cases where the padding can be prevented by reordering members properly; examples will be presented later. On the other hand, the reordering of members may have a negative CPU cache effect if members which are used at the same time end up far apart.

3 RELATED WORK
To address the problem of padding, the best-known compilers have some related attributes which can be applied to types. An attribute is an extra specifier which can give another meaning to the given type. Attributes can be specified with the __attribute__((name)) syntax.

There are two interesting attributes, the first of them is packed [10]. It tells the compiler to generate code for the given class which does not contain any padding bytes. It can save a few bytes of memory, but the members might be placed on unaligned memory chunks. Using this attribute, the users should decide whether memory usage or performance matters most. They can pick only one, since the performance of unaligned member access could be
reduced based on the context in certain environments [20]. This attribute can only be applied to aggregates, enums and unions.

The behavior is presented in the following example. The original struct PackedStruct, without any attributes, has a size of 16 bytes and an alignment of 8 bytes. However, using the packed attribute on the struct, the size will be 12, since no padding is inserted at all. This means the double member is probably placed on an unaligned memory chunk. The interesting part is that the int member has the same problem, since the alignment of this struct is 1 byte.

struct PackedStruct {
    int x;
    double y;
} __attribute__((packed));

The packed attribute mostly disables every alignment mechanism. In order to mitigate this issue, another attribute called aligned [10] can be applied. This attribute sets the alignment value of the given class.

For example, applying the __attribute__((aligned(4))) attribute on packed structs can change the type alignment. As a result, at least a few members may be placed on aligned memory segments, and padding will be inserted at the end of the struct.

An alternative to the aligned attribute is the alignas specifier, which was introduced in the C++11 Standard version. This specifier can be applied on struct/class/enum/union definitions and on member declarations as well. It means that alignas can be used to define the alignment of a type, but it can also change the alignment of a single variable at its definition.

struct alignas(4) PackedStruct2 {
    int a;
    double b;
} __attribute__((packed));

For the proper usage of the previous two attributes, there is a Clang Tidy checker [17]. It validates their use and offers fixes in case of misuse. The fixes include inserting and removing attributes from struct definitions.

Also, using these attributes is not possible in every case, since they have major drawbacks, such as unaligned memory access for members. Therefore, they cannot handle complex C++ classes with polymorphism and their dependencies. It needs to be stated that these attributes are very useful in given environments. This paper proposes an alternative way to solve the presented issue.

In the old Symbian C++ environment, the size of objects may depend on the order of members [8]. However, no tool is suggested to detect the unnecessary memory consumption.

Memory consumption can be a problem in partial differential equation solvers, as well. A precompiler is proposed which transforms the C++ code for low memory requirement. It affects the floating point calculation by reducing the number of bits and takes advantage of range specifications [7].

4 TYPE SIZE CALCULATION
In order to understand the given issue properly, we will go through the size and alignment terminology and the calculation methodology to see what can affect the calculated result and how.

4.1 General
As the definitions of size and alignment were described in the previous section, we continue with the details of the calculation. Furthermore, alignment affects the class size calculation, so it is beneficial to take a look at this mechanism.

The size and alignment calculation can be divided into two groups, for built-in and user-defined types. As will be seen, it follows rules and is mostly controlled by the Standard. The size and alignment of built-in types are platform dependent, but are fixed on the given platform and compiler. The size and alignment of the built-in types are taken into account while calculating those of each custom class.

The second and more interesting part here is the size and alignment calculation of custom types, which is a bit more complex. It is an important point for the later analysis, so let us take a look at it. There are many things which can affect the size and alignment of a class. These important concepts are illustrated through the next few examples.

First of all, the members of the given class are used to calculate size and alignment. This is the base point of the calculation and will be extended with more cases. Each member has a size and alignment, hence the compiler must adjust the class memory layout to meet those requirements. For example, a double is required to be placed on an address which is a multiple of 8, which the compiler must fulfill in order to generate efficient code. This kind of requirement affects the size of user-defined classes. The default size calculation is the sum of the sizes of the members. However, to comply with the alignment requirements, padding bytes must be inserted, which increases the calculated size.

The alignment calculation is a bit simpler: the alignment of a user-defined class is the maximum alignment of its members. These rules cover most of the cases; however, there are some corner cases, which are presented in the later examples.

In the first example, the alignment of Test0 is 4, because the maximum member alignment is 4, for int. The size of this class is 16. The size of the members is 10 bytes, two times 4 bytes and two times 1 byte, but there are empty bytes inserted. The class includes 3 bytes of padding two times. The first is after member y, because the alignment of int is 4, so the remaining bytes after a 1 byte sized char must be padded. Also, at the end of the class, 3 bytes of padding must be inserted to fill up to the class alignment.

struct Test0 {
    int x;
    char y;
    int x2;
    char y2;
};

static_assert(sizeof(Test0) == 16);
static_assert(alignof(Test0) == 4);

The alignment of members can be set manually by using the alignas specifier. This construct affects the calculation in the same way as presented previously. The class size is padded to the maximum alignment, which is 16 in this case.

struct Test9 {
    alignas(16) char z[16];
    int x;
};
static_assert(sizeof(Test9) == 32);
static_assert(alignof(Test9) == 16);

4.2 Inheritance
Inheritance can also affect the class size. Let us start with a simple example. The derived class inherits the members of the base class, which technically means that the base class will be included in the derived object too. That implies the derived class size is increased by the base class size in addition to the members.

The Test1b class is 4 bytes and the size of the derived Test1 class is 16: 8 bytes for y and 4 bytes for the base class. If the members do not fit directly after the bases, padding bytes can be inserted there as well. So, 4 bytes of padding are included in this example after the base to fulfill the member requirements.

From the alignment point of view, the alignment of base classes is also taken into account. That means if the base class alignment is greater than the alignment of the members, then it will be used.

struct Test1b {
    int x;
};

struct Test1 : Test1b {
    double y;
};

static_assert(sizeof(Test1b) == 4);
static_assert(sizeof(Test1) == 16);

When using base classes, there are a few exceptions to the previously presented mechanism. One of them is the empty base class. In C++, every allocated object must have a memory address, which means that empty classes, with no members, have a size of 1 byte. However, this rule is already fulfilled by non-empty derived classes, so the automatically added 1 byte base class size can be excluded. For example, the size of the Test2b class is 1 byte, but the derived class Test2 is 4 bytes only, excluding the 1 byte of the base.

struct Test2b {};
struct Test2 : Test2b {
    int x;
};

static_assert(sizeof(Test2b) == 1);
static_assert(sizeof(Test2) == 4);

In case of multiple inheritance, it is advised to pay attention to sizes and alignments, just like in case of members. The layout of the base classes behaves similarly to the member layout calculation. The alignment must be taken into account, and padding is inserted where necessary. Therefore, changing the order of the base classes may result in a derived class with a different size.

In the following example, there are 3 base classes and 2 derived classes. In the derived classes, the order of the base classes is not the same. As can be seen, Test8 has a size of 32 bytes, which includes 7 + 4 bytes of padding due to the base class order. However, the other derived class, Test82, is only 24 bytes long, because 8 bytes of padding can be avoided by reordering the base classes in the inheritance.

struct Test8b {
    int x;
};

struct Test8b2 {
    double x;
};

struct Test8b3 {
    char x;
};

struct Test8 : Test8b3, Test8b2, Test8b {
    double y;
};

struct Test82 : Test8b2, Test8b, Test8b3 {
    double y;
};

static_assert(sizeof(Test8b) == 4);
static_assert(sizeof(Test8b2) == 8);
static_assert(sizeof(Test8b3) == 1);
static_assert(sizeof(Test8) == 32);
static_assert(sizeof(Test82) == 24);

4.3 Polymorphism
Another concept which plays an important role is polymorphism. A class can be used in a polymorphic way if it has at least one virtual function. It means the class will have a virtual table and the allocated objects must include a pointer to that table. The size of this pointer depends on the platform; it takes 8 bytes on 64 bit systems. This is added to the classes in certain cases.

Also, the virtual table pointer is taken into account while calculating alignment. That means this hidden pointer can increase the alignment requirement of the class.

The first class in the inheritance chain which introduces a virtual function will have the virtual table pointer included in its objects. Here we have an example where Test4b has no virtual function, but the derived class does. Therefore, the size of the virtual table pointer is added to the derived Test4 class.

struct Test4b {
    int x;
};

struct Test4 : Test4b {
    int z;
    virtual ~Test4() = default;
};

static_assert(sizeof(Test4b) == 4);
static_assert(alignof(Test4b) == 4);
static_assert(sizeof(Test4) == 16);
static_assert(alignof(Test4) == 8);

The case is different when the base class is polymorphic. The Test5b class has a virtual function, hence a virtual table pointer as well.
However, no extra virtual table pointer has to be included in the derived class Test5, as can be seen from its size, which is 24 bytes.

struct Test5b {
    int x;
    virtual ~Test5b() = default;
};

struct Test5 : Test5b {
    double z;
    virtual ~Test5() = default;
};

static_assert(sizeof(Test5b) == 16);
static_assert(alignof(Test5b) == 8);
static_assert(sizeof(Test5) == 24);
static_assert(alignof(Test5) == 8);

Still staying with polymorphic classes, an interesting topic is the padding calculation. When the alignment of the base class is decided by the virtual table pointer, the added padding can be omitted in the derived class. The Test3b class has an alignment of 8 and a size of 16, which includes 4 bytes of padding. However, in the derived class Test3, this padding is avoided, and the derived member z is allocated there. So, the derived and base class sizes are the same, even though the derived class contains a new member.

struct Test3b {
    int x;
    virtual ~Test3b() = default;
};

struct Test3 : Test3b {
    int z;
};

static_assert(sizeof(Test3b) == 16);
static_assert(alignof(Test3b) == 8);
static_assert(sizeof(Test3) == 16);
static_assert(alignof(Test3) == 8);

The interesting point here is that this padding cannot be omitted in case of normal members. As can be seen, 4 bytes of padding are included at the end of the Test3b1 class, and this padding remains in the derived class too. That means the size of the derived Test31 is larger than that of its base, and this class includes 4 more bytes of padding.

struct Test3b1 {
    double y;
    int x;
};

struct Test31 : Test3b1 {
    int z;
};

static_assert(sizeof(Test3b1) == 16);
static_assert(alignof(Test3b1) == 8);
static_assert(sizeof(Test31) == 24);
static_assert(alignof(Test31) == 8);

Virtual bases also add another layer in addition to polymorphic classes. A virtual base is a base which is allocated exactly once in an inheritance hierarchy, in order to avoid duplication. For instance, diamond shaped inheritance can be solved with this approach.

A virtual base means that the most derived class must initialize it; in other words, it must call the constructor of the base. In the derived classes, it must be known where the virtual base is placed in memory, so another indirection is used in the derived class. This approach is compiler dependent, but it can usually be solved with an extra pointer.

In this example, Test6 inherits virtually from Test6b. In addition to base classes and members, the size of another pointer is added to the class size.

struct Test6b {
    double x;
    virtual ~Test6b() = default;
};

struct Test6 : virtual Test6b {
    double z;
};

static_assert(sizeof(Test6b) == 16);
static_assert(alignof(Test6b) == 8);
static_assert(sizeof(Test6) == 32);
static_assert(alignof(Test6) == 8);

5 PROPOSED APPROACH
The previously presented issue does not directly cause any runtime problems. However, in memory critical systems, padding is a source of wasted memory which is easily preventable. The good thing about this issue is that all the information necessary to evaluate user-defined types is available at compile time. Because of this property, static analysis techniques fit this need well.

Static analysis is a software analysis method which uses only compile time information and does not require runtime data at all [1]. This approach can be implemented in many ways, like using the source code or the generated byte code in certain cases. It is a collective concept for the specific techniques; many static analysis methods exist [9]. Their wide use comes from the compile time behavior, because a runtime environment is often hard to create, especially for large scale corporate software. Analyzing only the source of the software can reveal many important bugs, usually at a low cost. These techniques can be very useful in real world projects and can prevent bugs from appearing in the production environment.

5.1 Technical background
The aim is to create a tool which can fully cover the size and alignment calculation for each user-defined type. This requires being fully familiar with the calculation methods of the compiler, which mainly come from the C++ Standard. On the other hand, this is an interesting optimization problem which can be solved by only seeing the type definitions including members and base classes.

This goal was achieved with a specific static analysis tool, which has been created for this purpose. The implementation behaves like
static analyzers do, using only compile time information. The proposed tool uses the abstract syntax tree and its visitors [14].

Our tool is built on Clang libraries by taking advantage of its modular architecture and reusable components [15]. Clang is a C/C++/Objective-C compiler, which includes many extra tools, for instance, a sophisticated static analyzer. Clang is part of the LLVM compiler infrastructure [18]. Clang is mainly developed by the community as an open source project, although there are many contributors from technology companies.

The architecture of the Clang compiler and tools is modular. Clang is well designed by modularizing different domain parts, like moving the source tokenizer into a separate library which can be used in multiple places. This compiler infrastructure provides many libraries for different features with a well defined API. These libraries are used in the tools shipped with Clang, and can also be utilized by third party developers in custom tools.

This modularity and reusability makes Clang popular among enthusiasts for creating custom analyzers [5]. The architecture of Clang makes the life of developers easier, because they can focus on their high level task only and do not have to take care of low level tasks or track C++ Standard changes [6].

The created tool uses Clang libraries for common parsing tasks, like tokenizing or creating the abstract syntax tree (AST) from the source code. In this tool, the AST is used as the main data source. It contains all the information necessary to make proper decisions and suggestions about custom types. For the depicted problem, the class definitions need to be known including members and base classes. Clang builds up the AST from the source code and, via the AST visitor interface, our tool is able to extract everything it needs.

5.2 Analysis overview
In order to provide reliable results, the analysis needs to reproduce the original build environment. It means that the original build parameters must be passed to the analyzer tool as well. For example, a missing macro definition can give an entirely new meaning to some code parts because another ifdef branch would be chosen. As a result, the analyzer could see code that is different from the one that is running in production, which could lead to false positive and false negative hits as well.

With the original build environment, the proposed tool can start the analysis. The low level tasks, like tokenizing, parsing and compiler parameter handling, are handled by Clang. The tool joins the analysis after the abstract syntax tree is successfully created. This AST is visited by the analyzer.

The analysis searches the AST for user-defined classes and structs. These are the points where the presented issues could happen. Only the defined types are interesting; forward declarations are skipped, since not enough data is available for them.

For each class and struct, the members and base classes are collected. Each of them must have its size and alignment already defined, because these must be known at compile time. These parameters are used to calculate an optimized size.

There were two possibilities to calculate new class sizes. The sizes can be calculated by rewriting the AST multiple times and letting the compiler calculate them for us. Or they can be calculated manually, by taking the members and the base classes. The latter has been chosen for performance reasons, because rewriting the AST to try every possible permutation is very costly. It would be a task of factorial complexity, which takes too much time beyond a few members.

The manual solution could be affected by this growing complexity problem as well, but an optimization algorithm has been written which does not try all the options, but places members in an optimized way. Also note that the algorithm works based on heuristics and the global optimum is not proven formally.

5.3 Optimal size calculation
The selected algorithm costs much less than calculating all the permutations. First of all, the base class placement should be calculated, because the bases are part of the derived class. It is necessary to know where the members can start; as was seen before, the base classes can affect the size in different ways. The first free byte should be calculated to properly follow up with the members.

The second step is to eliminate all members whose size is a multiple of the class alignment. For example, if the class alignment is 8, then every member with size 8 can be eliminated. This is feasible because such members cannot create padding holes if they are placed after each other. Usually, this excludes a lot of members.

For this purpose, the class alignment needs to be known. However, the alignment calculation is different from the size computation, because reordering members cannot change the alignment. Therefore, the alignment of the original type can be used in the analysis and there is no need to recalculate it.

The next step is to order the members by size and alignment. The algorithm tries to place the members after each other, but during the placement the alignments are taken into account. If padding is found, it is necessary to gather other members to fill it in. The simplest case is when there is a member which has exactly the same size as the padding. It just needs to be moved into the padding, and the procedure continues on the ordered member list.

The case is almost the same when multiple members have a total size which equals the padding. They are moved ahead and the iteration on the remaining members can continue.

The most challenging part is when there are no members which properly fit the padding. In this case, one or more bytes will still be empty. In order to avoid a local optimum caused by using members which could fill another padding later, the combination with the largest size should be selected. For example, suppose the padding is 8 bytes and there are equally aligned members, one with a size of 6 bytes and two with a size of 3 bytes. The member with size 6 should be selected, because the smaller members could fill a smaller padding later.

After reaching the end of the members, when all of them are placed, the padding at the end of the class must be added. At this point, the polymorphic factor should be taken into account for the size calculation too. As was seen before, polymorphism can increase the class size in given cases, like having a virtual method or a polymorphic base class.

Figure 2 shows a formalized pseudocode of the previously depicted algorithm. The main hierarchy of the logic is highlighted and not every function is detailed, in order to give a better understanding of how the tool does its job.
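As a rough illustration of the manual size calculation described in this section, the following C++ sketch computes the size of a class for a given member order and tries a simple reordering. The names Member, alignUp, layoutSize and reorderByAlignment are hypothetical and are not taken from the presented tool. The sketch assumes plain members only (no base classes, virtual table pointers, bit-fields or member dependencies), and the alignment-based sort is a simplification, not the padding-filling heuristic of the paper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one data member: only size and alignment matter here.
struct Member {
    std::size_t size;
    std::size_t align;  // assumed to be a power of two
};

// Round offset up to the next multiple of align.
inline std::size_t alignUp(std::size_t offset, std::size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) / align * align;
}

// Size of a class whose members are laid out in the given order.
// Assumption: no bases, no virtual functions, no bit-fields.
inline std::size_t layoutSize(const std::vector<Member>& members) {
    std::size_t offset = 0;
    std::size_t classAlign = 1;
    for (const Member& m : members) {
        offset = alignUp(offset, m.align);  // padding before the member, if any
        offset += m.size;
        classAlign = std::max(classAlign, m.align);
    }
    // Tail padding up to the class alignment; an empty class still takes 1 byte.
    return std::max<std::size_t>(1, alignUp(offset, classAlign));
}

// Simplified reordering (not the algorithm of the paper): members with the
// strictest alignment come first, ties keep their declaration order.
inline std::vector<Member> reorderByAlignment(std::vector<Member> members) {
    std::stable_sort(members.begin(), members.end(),
                     [](const Member& a, const Member& b) { return a.align > b.align; });
    return members;
}
```

With the members of Foo from the introduction (int, double, char), layoutSize gives 24 for the declared order and 16 after reorderByAlignment, which matches the sizes stated for Foo and FooImproved. Sorting by alignment works in this restricted setting because the size of every C++ type is a multiple of its alignment, so no gap can appear between members; it does not cover bases or reuse of tail padding.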
procedure SetupBases(bases)
    for base ∈ bases do
        if ¬(Aligned(base)) then
            AddPadding(base)
        end if
        PlaceObject(base)
    end for
end procedure

procedure EliminateAlignedMembers(members, maxalign)
    for member ∈ members do
        if maxalign = member.align then
            RemoveMember(member, members)
        end if
    end for
end procedure

procedure FillPadding(members, len)
    selected ← SelectMembers(members, len)
    for member ∈ selected do
        if ¬(Aligned(member)) then
            AddPadding(member)
        end if
        PlaceObject(member)
    end for
end procedure

procedure SizeCalculationLogic(class)
    maxalign ← GetMaxAlignment(class)
    SetupBases(class.bases)
    EliminateAlignedMembers(class.members, maxalign)
    for member ∈ class.members do
        padding ← GetPadding(member)
        if padding ≠ 0 then
            FillPadding(class.members, padding)
        end if
        PlaceObject(member)
    end for
    if ¬isPolymorphic(class.bases) ∧ hasVirtual(class.functions) then
        AddVirtualPtr()
    end if
    if hasVirtual(class.bases) then
        AddVirtualBasePtr()
    end if
end procedure

Figure 2: Pseudocode for the Optimal Size Calculation Logic

5.4 Resulting output
After the available information is taken into consideration, a new class size is calculated. It is compared to the original class size and, if the new one is smaller, the tool can emit a compiler-like warning. This warning is placed at the class definition and states how many bytes could be saved and how the members should be reordered to achieve that.

    test.cpp:23:1: warning: Size can be reduced:
    24 -> 16 bytes by reordering
    members: _name,_id,_type
    struct Dummy {

5.5 Corner cases
The previously presented logic can be applied in most cases. However, there are a few corner cases where further extensions are necessary.

5.5.1 Member dependencies. Class members are initialized in the order of their declarations. The ordering of the constructor initializer list does not affect the initialization order. Ignoring this rule can lead to undefined behavior. In the example below, _name is initialized before _id because it is declared earlier. The order of these declarations must not be changed, because the initialization of _id uses the _name member. Otherwise the behavior is undefined: the size() member function call may lead to a segmentation fault or just return a random number.

    #include <string>

    class Token {
    public:
        explicit Token(const char* name, const int type)
            : _type(type)
            , _name(name)
            , _id(_name.size())  // reads _name, so _name must be declared before _id
        { }

    private:
        int _type = 0;
        std::string _name;
        int _id = 0;
    };

This is a tricky case, and newer compilers can warn about these kinds of initialization problems. However, it should also be handled in our tool in order to provide a reliable result at the end of the analysis. A possible optimization here would be to move _id before _name, but this is not feasible: it results in undefined behavior, which users would not expect from a suggestion made by an analyzer.
In order to handle these cases, the tool should analyze the constructor definitions as well. However, a definition is not always available at the point of the class definition; for example, it could be defined in another translation unit. The tool cannot analyze constructors that are defined elsewhere, which is a Clang limitation [13].
If the definitions are available, they are analyzed separately to construct a dependency graph, which describes the ordering restrictions imposed by the constructor initializer lists. This dependency graph is used in the previously depicted size calculation algorithm to disable the affected member rearrangements.
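The role of the dependency graph can be illustrated with a minimal sketch. It assumes a simplified layout model (each member is placed at the next offset that satisfies its own alignment, the total size is rounded up to the largest alignment, and base classes and virtual pointers are ignored), and it represents the graph as a plain list of edges. The names Member, Dependency, layoutSize and respectsDependencies are hypothetical and are not taken from the tool.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one data member: name, size and alignment in bytes.
struct Member {
    std::string name;
    std::size_t size;
    std::size_t align;
};

// Hypothetical dependency edge: `before` must be declared earlier than `after`.
struct Dependency {
    std::string before;
    std::string after;
};

// Size of a class whose members are declared in the given order,
// under the simplified model described above.
std::size_t layoutSize(const std::vector<Member>& order) {
    std::size_t offset = 0;
    std::size_t maxAlign = 1;
    for (const Member& m : order) {
        assert(m.align != 0);
        offset = (offset + m.align - 1) / m.align * m.align;  // padding before the member
        offset += m.size;
        if (m.align > maxAlign) maxAlign = m.align;
    }
    return (offset + maxAlign - 1) / maxAlign * maxAlign;      // tail padding
}

// True if every dependency edge is satisfied by the candidate declaration order.
bool respectsDependencies(const std::vector<Member>& order,
                          const std::vector<Dependency>& deps) {
    auto indexOf = [&order](const std::string& name) {
        for (std::size_t i = 0; i < order.size(); ++i) {
            if (order[i].name == name) return i;
        }
        return order.size();  // unknown names never satisfy an edge
    };
    for (const Dependency& d : deps) {
        if (indexOf(d.before) >= indexOf(d.after)) return false;
    }
    return true;
}
```

For the Token class, assuming a 4-byte int and a 32-byte, 8-byte-aligned std::string (an implementation-dependent value), the declared order _type, _name, _id occupies 48 bytes under this model. The order _type, _id, _name shrinks it to 40 bytes but violates the edge (_name, _id), whereas _name, _type, _id also reaches 40 bytes and keeps the initialization valid.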
ICOOOLPS’22, June 07, 2022, Berlin, Germany B. Babati and N. Pataki

5.5.2 Template instantiations. Another interesting case involves template instantiations and template parameters. Consider an example where the class layout depends on the template parameter T. If T is double, which is 8 bytes, there will be 4 bytes of padding after x and another 4 bytes after y. If T is int, there will be no padding, assuming a 4-byte int.

    template<class T>
    struct Dummy {
        int x;
        T t;
        int y;
    };

In the case of double, the tool can rightfully suggest reordering the members, but it is not that simple, because the change affects the other instantiations as well.
Consider the next example, where the layout depends only on template parameters and no members of a fixed type are declared.

    template<class T1, class T2, class T3>
    struct Dummy3 {
        T1 first;
        T2 second;
        T3 third;
    };

Instantiating the Dummy3 class as Dummy3<double, int, int> gives a good layout with no padding. However, using Dummy3<int, double, int> results in 8 bytes of padding, which can be resolved by reordering the template arguments, which literally are the members. Note that this also relates to std::tuple usage, which can result in the same issue.
Both cases are possible and valid, but it is hard to handle both of them properly. Reporting the first case gives false positives, since members cannot easily be reordered for just one instantiation. If the tool does not report such classes at all, the second case gives false negatives. The problem can be reduced by checking whether all members have template parameter types; in that case, the reordering can be performed by changing the template arguments.

5.5.3 C++ Standard Template Library. The C++ Standard Template Library (STL) provides useful data structures (e.g. std::list) and algorithms (e.g. std::count_if) for efficient development [4]. The STL takes advantage of the template construct, thus it works together tightly with user-defined types [21]. Although the STL is a standardized library, many different implementations are available [19]. The STL containers use heap memory for storing their elements, which makes them essential with regard to unnecessary memory consumption. Two drawbacks can be mentioned. Usage of the STL may result in false positives because different library implementations are available. The previously mentioned problem of template instantiation is also involved, for instance, in the case of the nodes of std::list.

6 VALIDATION
Various tests were necessary to validate the implemented logic and its usability on projects. The tests should cover almost all of the important cases which occur in everyday code.
The validation phase had two stages. The first included handcrafted, unit-test-like test cases, in which every general and corner case was checked. This is useful in the development phase to get a fast response on each implemented feature; however, to comprehensively validate the results provided by the tool, it should be tested on open source projects.

Table 1: Analyzed classes on Avro project
Columns show the original and the modified source code state. Rows show the number of analyzed classes, classes which use STL and template classes.

                        original   optimized
    classes                186         4
    classes with STL        77         2
    template                86         3

The selected projects should cover as many C++ features that affect the size calculation as possible. The following four open source projects were selected:
• Avro data serialization format and framework [2], 20740 LOC total
• Flatbuffers data serialization format and framework [11], 101726 LOC total
• RapidJSON header-only library for JSON parsing and generating [23], 39353 LOC total
• Thrift RPC data transport and serialization framework [3], 129932 LOC total
The reported lines of code (LOC) metrics are calculated for the C++ source files in each project, so the calculation excludes everything else from the results. These codebases include many C++ programming techniques from inheritance to templates, hence the feature testing is comprehensive. For testing purposes, the previously presented warning is extended with additional logging for each seen class or struct. This debug record contains information which is useful for calculating statistics, like class name, size, alignment and so on.

    {
        "record": "Dummy",
        "align": 8,
        "size": 40,
        "fields": 1,
        "stl": true,
        "template": false,
        "optimized": 40,
        "filename": "/path/to/tests.cpp"
    }

The records are formatted in JSON so that statistics can be generated automatically later. However, this does not mean that the results were not checked. Every unique printed debug JSON record was validated manually by checking the class definition, its members, base classes and so on. All of the calculated sizes were validated, and the records were also checked for whether they can be reordered or not. It took a lot of time, but it was the only way to validate false positive and false negative results as well.
It is important to mention that all classes are given the same weight. This means that test classes, class usage frequency and memory-critical parts were not differentiated, because such an approach would require deep knowledge of each project.
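Because every class carries the same weight, statistics over the debug records reduce to plain aggregates of per-class values. The following is a minimal sketch of such an aggregation over the bytes saved per class. The Record and Summary types and the summarizeSavings function are hypothetical names, only the size and optimized fields mirror the debug record, and this is not necessarily how the statistics in this paper were produced.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical in-memory form of a debug record; only two fields are needed here.
struct Record {
    std::size_t size;       // original class size in bytes
    std::size_t optimized;  // best size found by reordering, in bytes
};

// Minimum, maximum, average and median of the bytes saved per optimized class.
struct Summary {
    std::size_t min;
    std::size_t max;
    double avg;
    double median;
};

Summary summarizeSavings(const std::vector<Record>& records) {
    std::vector<std::size_t> saved;
    for (const Record& r : records) {
        assert(r.optimized <= r.size);  // a reordered layout is never reported as larger
        if (r.optimized < r.size) {
            saved.push_back(r.size - r.optimized);  // only classes that can shrink count
        }
    }
    if (saved.empty()) {
        return Summary{0, 0, 0.0, 0.0};
    }
    std::sort(saved.begin(), saved.end());
    const std::size_t n = saved.size();
    const double sum = std::accumulate(saved.begin(), saved.end(), 0.0);
    const double median =
        (n % 2 == 1) ? static_cast<double>(saved[n / 2])
                     : (static_cast<double>(saved[n / 2 - 1]) +
                        static_cast<double>(saved[n / 2])) / 2.0;
    return Summary{saved.front(), saved.back(), sum / static_cast<double>(n), median};
}
```

For hypothetical records with sizes 24 -> 16, 40 -> 40, 32 -> 16 and 16 -> 8, the sketch skips the second record and summarizes the savings 8, 16 and 8 bytes, each with equal weight.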

Table 2: Analyzed classes on Flatbuffers project
Columns show the original and the modified source code state. Rows report the number of analyzed classes, classes which use STL and template classes.

                        original   optimized
    classes                368         7
    classes with STL        52         4
    template               215         0

Table 3: Analyzed classes on RapidJSON project
Columns show the original and the modified source code state. Rows report the number of analyzed classes, classes which use STL and template classes.

                        original   optimized
    classes                 98         3
    classes with STL        11         1
    template                68         2

Table 4: Analyzed classes on Thrift project
Columns show the original and the modified source code state. Rows report the number of analyzed classes, classes which use STL and template classes.

                        original   optimized
    classes                 54        13
    classes with STL        45        13
    template                 1         0

The evaluation results were divided into multiple groups. Table 1, Table 2, Table 3 and Table 4 show information about classes. The first row, called classes, is the number of classes which were analyzed; the second row, called classes with STL, is the number of classes which include any kind of STL-related data structure; and the third row shows how many of the analyzed classes are template specializations. As previously described, each different template instantiation is analyzed independently. The first column of each table is the original repository state, and the second column shows how many of the classes could be optimized.
The numbers show that the projects actively use data structures from the STL as well as class templates. Class layout optimization is possible in every project for some classes, and the results include template classes too. It is important to note that there were a few false positive results for template specializations, which are not included in the presented tables. The tool was right that reordering the members really frees up some memory; however, it cannot be done for the given specialization only. Both sides of this problem were described in the previous section.
Table 5 takes a deeper dive into the classes with an optimized layout. It shows how many of the classes were directly allocated on the heap by using raw or smart pointers and how many of them were used within an STL container, which directly results in heap allocation. As can be seen, a few of them were, which is a direct waste of memory. Moreover, indirect usages, where the class is used within a class which is allocated on the heap, may increase those numbers.

Table 5: Optimized classes on heap
Columns show the number of optimized classes and how many of them were used with an STL container or allocated directly on the heap. The rows denote the analyzed projects.

                   optimized   STL container   direct heap
                                allocation      allocation
    Avro               4            0               2
    Flatbuffers        7            0               1
    RapidJSON          3            1               2
    Thrift            13            0               3

Going into more detail, Table 6 gives a summary of the optimized classes. The columns show the minimum, maximum, average and median of the differences in bytes between the original and the optimized sizes. For example, on the Thrift project, the minimum gain is 8 bytes, the largest gain is 16 bytes, the average among all classes is 9.23 bytes and the median is 8 bytes. As can be seen, on average, 8-16 bytes can be saved by reordering members, but in special cases these values may go up.

Table 6: Optimized layout properties in bytes
The columns denote the minimum, maximum, average and median class size differences. The rows report the analyzed projects.

                   min   max   avg      median
    Avro            8     8     8.0       8
    Flatbuffers     8    48    13.714     8
    RapidJSON       8    16    13.34     16
    Thrift          8    16     9.23      8

Table 7: Class details in Avro
This table shows the size of the classes in bytes and in the number of members.

                    min   max    avg      median
    Class size       4    2840   76.022     24
    Member count     0       6    0.423      0

Table 8: Class details in Flatbuffers
This table shows the size of the classes in bytes and in the number of members.

                    min   max      avg         median
    Class size       1    524280   7530.4918      8
    Member count     0        48      1.089       1

Another point of the analysis, beyond the byte gains and optimizations, is what the classes look like in terms of size and members. Table 7, Table 8, Table 9 and Table 10 show numbers about the classes. The

first row details the original class sizes in bytes, including minimum, maximum, average and median. The second row shows how many members were found in these classes. These details are not tightly related to the optimization results; however, they offer an interesting view of the characteristic numbers of the classes in each project.

Table 9: Class details in RapidJSON
This table shows the size of the classes in bytes and in the number of members.

                    min   max    avg      median
    Class size       8    3336   115.42     32
    Member count     0      22     0.89      0

Table 10: Class details in Thrift
This table shows the size of the classes in bytes and in the number of members.

                    min   max    avg      median
    Class size       8    4800   884.15    624
    Member count     0      14     2.35      1

7 CONCLUSION
C++ provides constructs to implement user-defined classes with custom layouts. Each of these classes can have a custom size and alignment based on its properties, such as members or base classes. When defining custom classes, an interesting phenomenon can occur. It is not a runtime or compile-time issue that causes any visible problem. However, this subtle problem implies increased memory consumption, because unused bytes called padding are included in the class layout to fill the holes needed to properly align members, which is important from the CPU's point of view.
The included padding depends only on the order of members and base classes. As a consequence, by reordering the members, a lot of wasted memory space could be saved. In general, wasting memory in any way is not recommended. Moreover, this issue requires more attention in systems where memory consumption is critical.
In this paper, we have walked through how the size and alignment of a user-defined class layout can be calculated. After that, a static analysis tool was proposed which can detect the presented memory waste by paying attention to the contexts of user-defined types. At the end of the analysis, our tool can suggest a viable alternative layout which could save a few bytes.
The developed tool has been tested on large open source codebases. As could be seen, the unnecessary memory consumption of objects is a general headache, since no one can always calculate class sizes in their head. A few hits were revealed, their environments were checked and summarized results were presented. These tests show promising results for the static analysis tool; however, some future work remains to provide production-ready tooling.

REFERENCES
[1] Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman. 2007. Compilers: principles, techniques, & tools. Pearson Education India.
[2] Apache. 2022. Avro. https://github.com/apache/avro
[3] Apache. 2022. Thrift. https://github.com/apache/thrift
[4] Matthew H. Austern. 1998. Generic Programming and the STL: Using and Extending the C++ Standard Template Library. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.
[5] Bence Babati, Gábor Horváth, Viktor Májer, and Norbert Pataki. 2017. Static Analysis Toolset with Clang. In Proceedings of the 10th International Conference on Applied Informatics. 23–29.
[6] Tibor Brunner, Norbert Pataki, and Zoltán Porkoláb. 2016. Backward compatibility violations and their detection in C++ legacy code using static analysis. Acta Electrotechnica et Informatica 16, 2 (2016), 12–19.
[7] H.-J. Bungartz, W. Eckhardt, T. Weinzierl, and C. Zenger. 2010. A precompiler to reduce the memory footprint of multiscale PDE solvers in C++. Future Generation Computer Systems 26, 1 (2010), 175–182. https://doi.org/10.1016/j.future.2009.05.011
[8] Fadi Chehimi, Paul Coulton, and Reuben Edwards. 2006. C++ Optimizations for Mobile Applications. In 2006 IEEE International Symposium on Consumer Electronics. 1–6. https://doi.org/10.1109/ISCE.2006.1689506
[9] Patrick Cousot. 1996. Abstract interpretation. ACM Computing Surveys (CSUR) 28, 2 (1996), 324–328.
[10] the GNU Compiler Collection GCC. 2022. Using the GNU compiler collection - Type Attributes. https://gcc.gnu.org/onlinedocs/gcc-3.3/gcc/Type-Attributes.html
[11] Google. 2022. Flatbuffers. https://github.com/google/flatbuffers
[12] Gábor Horváth and Norbert Pataki. 2019. Categorization of C++ Classes for Static Lifetime Analysis. In Proceedings of the 9th Balkan Conference on Informatics (Sofia, Bulgaria) (BCI’19). Association for Computing Machinery, New York, NY, USA, Article 21, 7 pages. https://doi.org/10.1145/3351556.3351559
[13] Gábor Horváth, Péter Szécsi, Zoltán Gera, Dániel Krupp, and Norbert Pataki. 2018. Challenges of Implementing Cross Translation Unit Analysis in Clang Static Analyzer. In 2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM). IEEE, 171–176.
[14] Joel Jones. 2003. Abstract syntax tree implementation idioms. In Proceedings of the 10th conference on pattern languages of programs (plop2003). 1–10.
[15] Chris Lattner. 2008. LLVM and Clang: Next Generation Compiler Technology. Lecture at BSD Conference 2008.
[16] Stanley B. Lippman. 1996. Inside the C++ Object Model. Addison-Wesley Professional.
[17] Clang LLVM. 2022. Clang-Tidy - altera-struct-pack-align. https://clang.llvm.org/extra/clang-tidy/checks/altera-struct-pack-align.html
[18] Bruno Cardoso Lopes and Rafael Auler. 2014. Getting Started with LLVM Core Libraries. Packt Publishing.
[19] Scott Meyers. 2001. Effective STL. Addison-Wesley.
[20] Asadollah Shahbahrami, Ben Juurlink, and Stamatis Vassiliadis. 2006. Performance impact of misaligned accesses in SIMD extensions. In Proceedings of the 17th Annual Workshop on Circuits, Systems and Signal Processing. 334–342.
[21] Alexander A. Stepanov and Daniel E. Rose. 2014. From mathematics to generic programming. Pearson Education.
[22] Bjarne Stroustrup. 2013. The C++ Programming Language (4th ed.). Addison-Wesley Professional.
[23] Tencent. 2022. RapidJSON. https://github.com/Tencent/rapidjson
