ABSTRACT
C++ focuses on effective hardware utilization. C++ compilers are able to generate efficient executable code that takes advantage of the hardware elements. C++ supports the object-oriented paradigm and therefore provides classes and structs. Compilers are allowed to add padding bytes between the members of a class in order to satisfy alignment requirements. As a result, the size of a class may depend on the order of its members, which conflicts with the demand for optimal memory consumption. In this paper, we propose an approach for optimizing memory utilization. This approach includes a static analysis tool that examines the classes and reports if the order of members is suboptimal regarding memory consumption. We analyze open source projects and find that every observed project contains such a subtle problem.

CCS CONCEPTS
• Software and its engineering → Software maintenance tools; Classes and objects; Parsers.

KEYWORDS
C++, static analysis, Clang, classes, objects, memory consumption

ACM Reference Format:
Bence Babati and Norbert Pataki. 2022. Memory Consumption of Objects in C++. In Proceedings of ICOOOLPS’22. ACM, New York, NY, USA, 10 pages. https://doi.org/XXXXXXX.XXXXXXX

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
ICOOOLPS’22, June 07, 2022, Berlin, Germany
© 2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-XXXX-X/18/06. . . $15.00
https://doi.org/XXXXXXX.XXXXXXX

1 INTRODUCTION
The design of C++ aims to provide low memory consumption and high hardware utilization [22]. Compilers can generate efficient executable code, which is essential in many use cases and application fields.

A defining feature of C++, which has been part of the language since the beginning, is user-defined types. C++ is a statically typed programming language [16]. Therefore, the type of each variable must be known at compile time. For types, C++ offers different options. There are built-in elementary types, such as double or int, which are provided by the language and can be used anywhere without further definitions.

The language also provides the ability to combine built-in types and create new ones. User-defined types can build on other types and provide additional functionality over them. They can be defined as class or struct, the main difference between the two being the default visibility of the members. User-defined classes can provide functions related to the given type. This is an additional feature compared to built-in types, which have no such option.

User-defined classes are categorized by the Standard based on how they are defined [12]. For example, aggregate classes cannot have private or protected members, virtual functions, base classes or constructors. The behavior of language elements, like initialization, may depend on this type category.

Another option to combine and develop new classes is inheritance. A class can inherit from other classes, which means that the base classes will be part of the derived class. Inheritance also supports code reuse.

A peculiar consequence of the C++ class and object model is unnecessary memory consumption. Let us consider the following two classes:

    class Foo {
    private:
        int i;
        double d;
        char c;

    public:
        void f() {
            // ...
        }

        void g() {
            // ...
        }
    };

    class FooImproved {
    private:
        char c;
        int i;
        double d;

    public:
        void f() {
            // ...
        }

        void g() {
            // ...
        }
    };
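The difference between the two classes can be observed directly with sizeof. The following is a minimal sketch, not taken from the paper: it repeats only the data members of Foo and FooImproved (non-virtual member functions do not contribute to the object size), and the constant names kPayload, kPaddingFoo and kPaddingImproved are illustrative assumptions.

```cpp
#include <cstddef>

// Same data members as Foo and FooImproved; the member functions are left
// out because non-virtual functions do not affect sizeof.
struct Foo {
    int i;
    double d;
    char c;
};

struct FooImproved {
    char c;
    int i;
    double d;
};

// Bytes occupied by the data members themselves, without any padding.
constexpr std::size_t kPayload = sizeof(int) + sizeof(double) + sizeof(char);

// Padding is whatever the object occupies beyond its payload.
constexpr std::size_t kPaddingFoo = sizeof(Foo) - kPayload;
constexpr std::size_t kPaddingImproved = sizeof(FooImproved) - kPayload;

// The reordered class is never larger than the original one.
static_assert(sizeof(FooImproved) <= sizeof(Foo),
              "reordering should not increase the size");
```

Assuming a platform with a 4-byte int and an 8-byte, 8-byte-aligned double (for example x86-64 Linux with GCC or Clang), sizeof(Foo) is 24 and sizeof(FooImproved) is 16, so the reordering saves 8 bytes per object. On platforms with different alignment rules the two sizes may coincide.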
[Figure: Foo and FooImproved]
reduced based on the context in certain environments [20]. This attribute can only be applied to aggregates, enums and unions.

The behavior is presented in the following example. The original struct PackedStruct, without any attributes, has a size of 16 bytes and an alignment of 8 bytes. However, with the packed attribute, the size is 12 bytes, since no padding is inserted at all. This means the double member is probably placed at an unaligned address. Interestingly, the int member has the same problem, since the alignment of this struct is 1 byte.

    struct PackedStruct {
        int x;
        double y;
    } __attribute__((packed));

The packed attribute disables most alignment mechanisms. To mitigate this issue, another attribute, called aligned, can be applied [10]. This attribute sets the alignment value of the given class.

For example, applying __attribute__((aligned(4))) to a packed struct changes the type alignment. As a result, at least some members may be placed at aligned addresses, and padding will be inserted at the end of the struct.

An alternative to the aligned attribute is the alignas specifier, which was introduced in the C++11 Standard. This specifier can be applied to struct/class/enum/union definitions and to member declarations as well. That is, alignas can be used to define the alignment of a type, but it can also change the alignment of a single variable at its definition.

    struct alignas(4) PackedStruct2 {
        int a;
        double b;
    } __attribute__((packed));

For the proper usage of these two attributes, there is a Clang-Tidy checker [17]. It validates their use and offers fixes in case of misuse. The fixes include inserting and removing attributes from struct definitions.

Using these attributes is not possible in every case, since they have major drawbacks, for instance unaligned memory access to members. Furthermore, they cannot handle complex C++ classes with polymorphism and their dependencies. It should be noted that these attributes are nevertheless very useful in certain environments. This paper proposes an alternative way to solve the presented issue.

In the old Symbian C++ environment, the size of objects may depend on the order of members [8]. However, no tool is suggested to detect the unnecessary memory consumption.

Memory consumption can be a problem in partial differential equation solvers as well. A precompiler has been proposed which transforms the C++ code to lower its memory requirements. It affects floating-point calculations by reducing the number of bits and takes advantage of range specifications [7].

4 TYPE SIZE CALCULATION
In order to understand the given issue properly, we go through the size and alignment terminology and the calculation methodology to see what can affect the calculated result and how.

4.1 General
As the definitions of size and alignment were described in the previous section, we continue with the details of the calculation. Alignment affects the class size calculation, so it is beneficial to take a look at this mechanism.

The size and alignment calculation can be divided into two groups, for built-in and user-defined types. The calculation follows rules that are mostly controlled by the Standard. The size and alignment of built-in types are platform dependent, but they are fixed on a given platform and compiler. The sizes and alignments of the built-in types are taken into account when calculating those of each custom class.

The second and more interesting part is the size and alignment calculation of custom types, which is a bit more complex. It is an important point for the later analysis, so let us take a look at it. Many things can affect the size and alignment of a class. These concepts are illustrated through the next few examples.

First of all, the members of the given class are used to calculate size and alignment. This is the base point of the calculation and will be extended with more cases. Each member has a size and an alignment, hence the compiler must adjust the class memory layout to meet those requirements. For example, a double is typically required to be placed at an address which is a multiple of 8, a requirement the compiler must fulfill in order to generate efficient code. This kind of requirement affects the size of user-defined classes. The starting point of the size calculation is the sum of the members' sizes. However, to comply with the alignment requirements, padding bytes must be inserted, which increases the calculated size.

The alignment calculation is a bit simpler: the alignment of a user-defined class is the maximum alignment of its members. These rules cover most of the cases; however, there are some corner cases, which are presented in the later examples.

In the first example, the alignment of Test0 is 4, because the maximum member alignment is 4, that of int. The size of this class is 16. The members occupy 10 bytes in total, two times 4 bytes and two times 1 byte, but padding bytes are inserted as well. The class includes 3 bytes of padding twice. The first padding comes after member y: the alignment of int is 4, so the remaining bytes after a 1-byte char must be padded. Also, at the end of the class, 3 bytes of padding must be inserted to fill up to the class alignment.

    struct Test0 {
        int x;
        char y;
        int x2;
        char y2;
    };

    static_assert(sizeof(Test0) == 16);
    static_assert(alignof(Test0) == 4);

The alignment of members can be set manually by using the alignas specifier. This construct affects the calculation in the same way as presented previously. The class size is padded to the maximum alignment, which is 16 in this case.

    struct Test9 {
        alignas(16) char z[16];
        int x;
    };
    static_assert(sizeof(Test9) == 32);
    static_assert(alignof(Test9) == 16);

    struct Test8b {
        int x;
    };
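The member-based rules of Section 4.1 (place each member at the next suitably aligned offset, take the maximum member alignment as the class alignment, pad the end of the class to that alignment) can be sketched as a small compile-time function. This is only an illustration under simplifying assumptions: no base classes, no virtual functions, no bit-fields and no empty classes. The names Member, Layout, align_up and compute_layout are hypothetical and not part of the proposed tool.

```cpp
#include <cstddef>

// Hypothetical description of one data member: size and alignment in bytes.
struct Member {
    std::size_t size;
    std::size_t align;
};

// Result of the computation for a class without bases or virtual functions.
struct Layout {
    std::size_t size;
    std::size_t align;
};

// Round `offset` up to the next multiple of `align` (align must be non-zero).
constexpr std::size_t align_up(std::size_t offset, std::size_t align) {
    return (offset + align - 1) / align * align;
}

// Place the members in declaration order, inserting padding before each
// member that would otherwise be misaligned, then pad the end of the class
// to the class alignment (the maximum member alignment).
template <std::size_t N>
constexpr Layout compute_layout(const Member (&members)[N]) {
    std::size_t offset = 0;
    std::size_t class_align = 1;
    for (std::size_t i = 0; i < N; ++i) {
        offset = align_up(offset, members[i].align) + members[i].size;
        if (members[i].align > class_align) {
            class_align = members[i].align;
        }
    }
    return Layout{align_up(offset, class_align), class_align};
}

// Members of Test0: int, char, int, char (assuming a 4-byte int).
constexpr Member kTest0[] = {{4, 4}, {1, 1}, {4, 4}, {1, 1}};

// Members of Test9: alignas(16) char[16], int (assuming a 4-byte int).
constexpr Member kTest9[] = {{16, 16}, {4, 4}};
```

With these inputs, compute_layout(kTest0) yields size 16 and alignment 4, and compute_layout(kTest9) yields size 32 and alignment 16, matching the static_assert values given for Test0 and Test9.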
However, no extra virtual table pointer should be included in the derived class Test5, as can be seen from its size, which is 24 bytes.

    struct Test5b {
        int x;
        virtual ~Test5b() = default;
    };

    struct Test5 : Test5b {
        double z;
        virtual ~Test5() = default;
    };

    static_assert(sizeof(Test5b) == 16);
    static_assert(alignof(Test5b) == 8);
    static_assert(sizeof(Test5) == 24);
    static_assert(alignof(Test5) == 8);

Still staying with polymorphic classes, an interesting topic is the padding calculation. When the alignment of the base class is determined by the virtual table pointer, the added padding can be omitted in the derived class. Test3b has an alignment of 8 and a size of 16, which includes 4 bytes of padding. However, in the derived class Test3, this padding is avoided, and the derived member z is allocated there. So, the derived and base class sizes are the same, even though the derived class contains a new member.

    struct Test3b {
        int x;
        virtual ~Test3b() = default;
    };

    struct Test3 : Test3b {
        int z;
    };

    static_assert(sizeof(Test3b) == 16);
    static_assert(alignof(Test3b) == 8);
    static_assert(sizeof(Test3) == 16);
    static_assert(alignof(Test3) == 8);

The interesting point here is that this padding cannot be omitted in the case of normal members. As can be seen, 4 bytes of padding are included at the end of the Test3b1 class, and they remain in the derived class too. That means the size of the derived Test31 is bigger than that of its base, and this class includes 4 more bytes of padding.

    struct Test3b1 {
        double y;
        int x;
    };

    struct Test31 : Test3b1 {
        int z;
    };

    static_assert(sizeof(Test3b1) == 16);
    static_assert(alignof(Test3b1) == 8);
    static_assert(sizeof(Test31) == 24);
    static_assert(alignof(Test31) == 8);

Virtual bases also add another layer in addition to polymorphic classes. A virtual base is a base which is allocated exactly once in an inheritance hierarchy, in order to avoid duplication. For instance, diamond-shaped inheritance can be solved with this approach.

A virtual base means that the most derived class must initialize it; in other words, it must call the constructor of the base. In the derived classes, it must be known where the virtual base is placed in memory, so another indirection is used in the derived class. This approach is compiler dependent, but it can usually be solved with an extra pointer.

In this example, Test6 inherits virtually from Test6b. In addition to the base classes and members, the size of another pointer is added to the class size.

    struct Test6b {
        double x;
        virtual ~Test6b() = default;
    };

    struct Test6 : virtual Test6b {
        double z;
    };

    static_assert(sizeof(Test6b) == 16);
    static_assert(alignof(Test6b) == 8);
    static_assert(sizeof(Test6) == 32);
    static_assert(alignof(Test6) == 8);

5 PROPOSED APPROACH
The previously presented issue does not directly cause any runtime issues. However, in memory-critical systems, padding is a source of wasted memory which is easily preventable. The good thing about this issue is that all information necessary to evaluate user-defined types is available at compile time. Thanks to this property, static analysis techniques fit this need well.

Static analysis is a software analysis method which uses only compile-time information and does not require runtime data at all [1]. It can be implemented in many ways, for example by using the source code or, in certain cases, the generated byte code. It is a collective term: many specific static analysis methods exist [9]. Their wide use comes from their compile-time nature, because a runtime environment is often hard to create, especially for large-scale corporate software. Analyzing only the source of the software can reveal many important bugs, usually at a low cost. These techniques can be very useful in real-world projects and prevent bugs from appearing in the production environment.

5.1 Technical background
The aim is to create a tool which can fully cover the size and alignment calculation for every user-defined type. This requires full familiarity with the calculation methods of compilers, which mainly come from the C++ Standard. On the other hand, this is an interesting optimization problem which can be solved by looking only at the type definitions, including members and base classes.

This goal was achieved with a specific static analysis tool, which has been created for this purpose. The implementation behaves like
static analyzers do and uses only compile-time information. The proposed tool uses the abstract syntax tree and its visitors [14].

Our tool is built on the Clang libraries, taking advantage of their modular architecture and reusable components [15]. Clang is a C/C++/Objective-C compiler, which includes many extra tools, for instance a sophisticated static analyzer. Clang is part of the LLVM compiler infrastructure [18]. Clang is mainly developed by the community as an open source project, although there are many contributors from technology companies.

The architecture of the Clang compiler and its tools is modular. Clang is well designed: different domain parts are modularized, for example the source tokenizer is moved into a separate library which can be used in multiple places. This compiler infrastructure provides many libraries for different features with a well-defined API. These libraries are used in the tools shipped with Clang and can also be utilized by third-party developers in custom tools.

This modularity and reusability make Clang a popular choice among enthusiasts for creating custom analyzers [5]. The architecture of Clang makes the life of developers easier, because they can focus on their high-level task only and do not have to take care of low-level tasks or track C++ Standard changes [6].

The created tool uses Clang libraries for common parsing tasks, like tokenizing or creating the abstract syntax tree (AST) from the source code. In this tool, the AST is used as the main data source. It contains all the information necessary to make proper decisions and suggestions about custom types. For the depicted problem, the class definitions need to be known, including members and base classes. Clang builds up the AST from the source code, and via the AST visitor interface our tool is able to extract everything it needs.

5.2 Analysis overview
In order to provide reliable results, the analysis needs to reproduce the original build environment. This means that the original build parameters must be passed to the analyzer tool as well. For example, a missing macro definition can give an entirely new meaning to some parts of the code, because another ifdef branch would be chosen. As a result, the analyzer could see code that is different from the code running in production, which could lead to both false positive and false negative hits.

With the original build environment, the proposed tool can start the analysis. The low-level tasks, like tokenizing, parsing and compiler parameter handling, are handled by Clang. The tool joins the analysis after the abstract syntax tree has been successfully created. This AST is visited by the analyzer.

The analysis searches the AST for user-defined classes and structs. These are the points where the presented issues can happen. Only defined types are of interest; forward declarations are skipped, since not enough data is available for them.

For each class and struct, the members and base classes are collected. Each of them must already have a defined size and alignment, because these must be known at compile time. These parameters are used to calculate an optimized size.

There were two possibilities to calculate new class sizes. They can be calculated by rewriting the AST multiple times and letting the compiler calculate the sizes for us, or they can be calculated manually, by taking the members and the base classes. The latter has been chosen for performance reasons, because rewriting the AST to try every possible permutation is very costly. It would be a task of factorial complexity, which takes too much time beyond a few members.

The manual solution could be affected by this growing complexity as well, but an optimization algorithm has been written which does not try all the options; instead, it places members in an optimized way. Also note that the algorithm is based on heuristics and its global optimality is not formally proven.

5.3 Optimal size calculation
The selected algorithm costs much less than calculating all the permutations. First of all, the placement of the base classes should be calculated, because they are part of the derived class. It is necessary to know where the members can start; as was seen before, the base classes can affect the size in different ways. The first free byte should be calculated in order to continue properly with the members.

The second step is to eliminate all members whose size is a multiple of the class alignment. For example, if the class alignment is 8, then every member with size 8 can be eliminated. This is feasible because such members cannot create padding holes if they are placed after each other. Usually, this excludes a lot of members.

For this purpose, the class alignment needs to be known. However, the alignment calculation differs from the size computation, because reordering members cannot change the alignment. Therefore, the alignment of the original type can be used in the analysis and does not need to be recalculated.

The next step is to order the members by size and alignment. The algorithm tries to place the members one after another, taking the alignments into account during the placement. If a padding hole is found, other members must be gathered to fill it. The simplest case is when there is a member which has exactly the same size as the padding. It just needs to be moved into the padding, and the procedure continues on the ordered member list.

The case is almost the same when multiple members have a total size which equals the padding. They are moved ahead, and the iteration over the remaining members continues.

The most challenging case is when there are no members which fit the padding exactly. In this case, one or more bytes will remain empty. In order to avoid a local optimum caused by using members which could fill another padding later, the combination with the largest size should be selected. For example, suppose the padding is 8 bytes and there are equally aligned members, one with a size of 6 bytes and two with a size of 3 bytes. The member with size 6 should be selected, because the smaller members could fill a smaller padding later.

After reaching the end of the members, when all of them are placed, the padding at the end of the class must be added. At this point, the polymorphic factor should be taken into account for the size calculation too. As was seen before, polymorphism can increase the class size in certain cases, like having a virtual method or a polymorphic base class.

Figure 2 shows a formalized pseudocode of the previously described algorithm. The main hierarchy of the logic is highlighted and not every function is detailed, in order to give a better understanding of how the tool does its job.
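To illustrate why a manual calculation can avoid trying every permutation, the following sketch compares the size of the declared member order with the size after one simple reordering. It is not the algorithm of Figure 2: as a deliberately simpler stand-in, it only sorts the members by descending alignment, and it ignores base classes and polymorphism. All names (Member, MemberList, layout_size, reorder_by_alignment, bytes_saved) are hypothetical.

```cpp
#include <cstddef>

// Hypothetical description of one data member: size and alignment in bytes.
struct Member {
    std::size_t size;
    std::size_t align;
};

// Fixed-size member list, so that everything can be evaluated at compile time.
template <std::size_t N>
struct MemberList {
    Member items[N];
};

// Round `offset` up to the next multiple of `align` (align must be non-zero).
constexpr std::size_t align_up(std::size_t offset, std::size_t align) {
    return (offset + align - 1) / align * align;
}

// Size of a class (no bases, no virtual functions) whose members are placed
// in the given order, including interior and tail padding.
template <std::size_t N>
constexpr std::size_t layout_size(const MemberList<N>& list) {
    std::size_t offset = 0;
    std::size_t class_align = 1;
    for (std::size_t i = 0; i < N; ++i) {
        offset = align_up(offset, list.items[i].align) + list.items[i].size;
        if (list.items[i].align > class_align) {
            class_align = list.items[i].align;
        }
    }
    return align_up(offset, class_align);
}

// Simplified heuristic: stable insertion sort by descending alignment.
template <std::size_t N>
constexpr MemberList<N> reorder_by_alignment(MemberList<N> list) {
    for (std::size_t i = 1; i < N; ++i) {
        const Member key = list.items[i];
        std::size_t j = i;
        while (j > 0 && list.items[j - 1].align < key.align) {
            list.items[j] = list.items[j - 1];
            --j;
        }
        list.items[j] = key;
    }
    return list;
}

// Bytes saved by the reordering, or 0 if it does not help.
template <std::size_t N>
constexpr std::size_t bytes_saved(const MemberList<N>& list) {
    const std::size_t before = layout_size(list);
    const std::size_t after = layout_size(reorder_by_alignment(list));
    return before > after ? before - after : 0;
}

// Members of Foo from the introduction: int, double, char
// (sizes and alignments assumed to be 4, 8 and 1 bytes).
constexpr MemberList<3> kFoo = {{{4, 4}, {8, 8}, {1, 1}}};
```

For kFoo, layout_size reports 24 bytes in the declared order and 16 bytes after reordering, so bytes_saved(kFoo) is 8. Because the size of every C++ object type is a multiple of its alignment and alignments are powers of two, this ordering leaves only tail padding for plain members; the padding-filling steps described above matter once base classes and polymorphism constrain where the members can start.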
5.5.2 Template instantiations. Another interesting case is template instantiations and template parameters. Let us see an example where the class layout depends on the template parameter T. If T is a double, which is 8 bytes, there will be 4 bytes of padding after x and another 4 bytes after y. If it is an int, there will be no padding, assuming a 4-byte int.

    template<class T>
    struct Dummy {
        int x;
        T t;
        int y;
    };

In the case of double, the tool can rightfully suggest reordering the members, but it is not that simple, because the change affects the other instantiations as well.

Take a look at the next example, where the layout depends on template parameters only and there are no fixed members.

    template<class T1, class T2, class T3>
    struct Dummy3 {
        T1 first;
        T2 second;
        T3 third;
    };

When the Dummy3 class is instantiated as Dummy3<double, int, int>, the layout is good and no padding is included. However, using Dummy3<int, double, int> results in 8 bytes of padding, which can be solved by reordering the template parameters, which are literally the members. For the record, this relates to the usage of std::tuple, which can result in the same issue.

Both cases are possible and valid, but it is hard to handle both of them properly. Reporting them gives false positives, since members cannot easily be reordered in just one instantiation. If the tool does not report them, it produces false negatives. This can be minimized by a check: if the types of all members are template parameters, then the reordering can be performed by changing the order of the template arguments.

5.5.3 C++ Standard Template Library. The C++ Standard Template Library (STL) provides useful data structures (e.g. std::list) and algorithms (e.g. std::count_if) for efficient development [4]. The STL takes advantage of the template construct, thus it works tightly together with user-defined types [21]. The STL is a standardized library, but many different implementations are available [19]. The STL containers use heap memory for storing the elements, which is essential regarding unnecessary memory consumption. Two drawbacks can be mentioned. Usage of the STL may result in false positives because different library implementations are available. The previously mentioned problem of template instantiation is also involved, for instance in the case of the nodes of std::list.

6 VALIDATION
Various tests were necessary to validate the implemented logic and its usability on projects. The tests should cover almost all of the important cases which occur in everyday code.

The validation phase had two stages. The first of them included handcrafted, unit-test-like test cases, where every general and corner case was checked. This is useful during the development phase to get a fast response on each implemented feature; however, to comprehensively validate the results provided by the tool, it should be tested on open source projects.

Table 1: Analyzed classes on Avro project
Columns mean the original and the modified source code state. Rows mean the number of analyzed classes, classes which use STL and template classes.

                        original    optimized
    classes             186         4
    classes with STL    77          2
    template            86          3

The selected projects should cover as many C++ features affecting the size calculation as possible. The following four open source projects were selected:
• Avro, a data serialization format and framework [2], 20740 LOC in total
• Flatbuffers, a data serialization format and framework [11], 101726 LOC in total
• RapidJSON, a header-only library for JSON parsing and generating [23], 39353 LOC in total
• Thrift, an RPC data transport and serialization framework [3], 129932 LOC in total

The lines of code metrics shown are calculated for the C++ source files in each project, so the calculation excludes everything else from the results. These codebases include many C++ programming techniques, from inheritance to templates, hence the feature testing is comprehensive. For testing purposes, the previously presented warning is extended with additional logging for each visited class or struct. This debug record contains information which is useful for calculating statistics, like class name, size, alignment and so on.

    {
        "record": "Dummy",
        "align": 8,
        "size": 40,
        "fields": 1,
        "stl": true,
        "template": false,
        "optimized": 40,
        "filename": "/path/to/tests.cpp"
    }

The records are formatted in JSON in order to generate statistics automatically later. However, this does not mean that the results were not checked. Every unique printed debug JSON record was validated manually, by checking the class definition, its members, base classes and so on. All of the calculated sizes were validated, and the records were also checked for whether their members can be reordered or not. It took a lot of time, but it was the only way to validate false positive and false negative results as well.

It is important to mention that all classes are judged with the same weight. This means that test classes, class usage frequency and memory-critical parts were not differentiated, because such an approach would require deep knowledge of each project.
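The statistics derived from the debug records can be illustrated with a short sketch that summarizes the bytes saved per class (the difference between the "size" and "optimized" values of a record) as minimum, maximum, average and median. This is an assumption about how such numbers could be computed, not the authors' tooling: the Record and Summary types, the summarize function and the sample values are hypothetical, and JSON parsing is left out.

```cpp
#include <cstddef>

// Hypothetical in-memory form of one debug record; the field names mirror
// the "size" and "optimized" keys of the JSON record.
struct Record {
    std::size_t size;       // size in bytes with the declared member order
    std::size_t optimized;  // size in bytes after the suggested reordering
};

// Summary of the bytes saved over a set of records.
struct Summary {
    std::size_t min;
    std::size_t max;
    double avg;
    double median;
};

// Summarize the saved bytes (size - optimized) over N records.
// Assumes N > 0 and optimized <= size for every record.
template <std::size_t N>
constexpr Summary summarize(const Record (&records)[N]) {
    std::size_t saved[N] = {};
    std::size_t total = 0;
    for (std::size_t i = 0; i < N; ++i) {
        saved[i] = records[i].size - records[i].optimized;
        total += saved[i];
    }
    // Insertion sort in ascending order, needed for min, max and median.
    for (std::size_t i = 1; i < N; ++i) {
        const std::size_t key = saved[i];
        std::size_t j = i;
        while (j > 0 && saved[j - 1] > key) {
            saved[j] = saved[j - 1];
            --j;
        }
        saved[j] = key;
    }
    const double median =
        (N % 2 == 1) ? static_cast<double>(saved[N / 2])
                     : (saved[N / 2 - 1] + saved[N / 2]) / 2.0;
    return Summary{saved[0], saved[N - 1],
                   static_cast<double>(total) / N, median};
}

// Made-up sample records, not taken from any analyzed project.
constexpr Record kSample[] = {{24, 16}, {48, 32}, {40, 24}};
```

For the made-up sample, the saved bytes are 8, 16 and 16, so summarize(kSample) gives a minimum of 8, a maximum of 16, an average of 40/3 (about 13.33) and a median of 16.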
Table 2: Analyzed classes on Flatbuffers project
Columns mean the original and the modified source code state. Rows report the number of analyzed classes, classes which use STL and template classes.

                        original    optimized
    classes             368         7
    classes with STL    52          4
    template            215         0

Table 3: Analyzed classes on RapidJSON project
Columns denote the original and the modified source code state. Rows report the count of analyzed classes, classes which use STL and template classes.

                        original    optimized
    classes             98          3
    classes with STL    11          1
    template            68          2

Table 4: Analyzed classes on Thrift project
Columns mean the original and the modified source code state. Rows report the count of analyzed classes, classes which use STL and template classes.

                        original    optimized
    classes             54          13
    classes with STL    45          13
    template            1           0

The evaluation results were divided into multiple groups. Table 1, Table 2, Table 3 and Table 4 show information about classes. The first row, called classes, is the number of classes which were analyzed; the second row, called classes with STL, is the number of classes which include any kind of STL-related data structure; and the third row shows how many of the analyzed classes are template specializations. As previously described, each different template instantiation is analyzed independently. The first column of each table is the original repository state, and the second column shows how many of the classes could be optimized.

The numbers show that the projects actively use data structures from the STL as well as class templates. Class layout optimization is possible in every project for some classes, and the results include template classes too. As an important note, there were a few false positive results for template specializations, which are not included in the presented tables. The tool was right that reordering the members really frees up some memory; however, this cannot be done for the given specialization only. Both sides of this problem were described in the previous section.

Table 5
Columns mean the number of optimized classes and how many of them were used with an STL container or allocated directly on the heap. The rows denote the analyzed projects.

                   optimized    STL container    direct heap
                                allocation       allocation
    Avro           4            0                2
    Flatbuffers    7            0                1
    RapidJSON      3            1                2
    Thrift         13           0                3

Table 5 takes a deeper dive into the classes with an optimized layout. It shows how many classes were directly allocated on the heap by using raw or smart pointers and how many of them were used within an STL container, which directly results in heap allocation. As can be seen, a few of them were, which is a direct waste of memory. Therefore, indirect usages, where the class is used within a class which is allocated on the heap, may increase those numbers.

Table 6: Optimized layout properties in bytes
The columns denote the minimum, maximum, average and median class size differences. The rows report the analyzed projects.

                   min    max    avg       median
    Avro           8      8      8.0       8
    Flatbuffers    8      48     13.714    8
    RapidJSON      8      16     13.34     16
    Thrift         8      16     9.23      8

Going into more detail, Table 6 gives a summary of the optimized classes. The columns mean the minimum, maximum, average and median of the differences in bytes between the original and the optimized sizes. For example, on the Thrift project, the minimum gain is 8 bytes, the largest gain is 16 bytes, the average among all classes is 9.23 bytes and the median is 8 bytes. As can be seen, on average, 8-16 bytes can be optimized out by reordering members, but in special cases these values may go up.

Table 7: Class details in Avro
This table shows the size of the classes in bytes and in the number of members.

                     min    max     avg       median
    Class size       4      2840    76.022    24
    Member count     0      6       0.423     0

Table 8: Class details in Flatbuffers
This table shows the size of the classes in bytes and in the number of members.

                     min    max       avg          median
    Class size       1      524280    7530.4918    8
    Member count     0      48        1.089        1

Another point of the analysis, beyond the bytes gained by the optimizations, is what the classes look like in terms of size and member count. Table 7, Table 8, Table 9 and Table 10 show numbers about classes. The