
 

gperftools 
 

Group #12 - SE Laboratory 


IIEST Shibpur 

Shashwat Srivastava (510517083; Enrolment: B05-510717045) 

Sanat Kumar (510517054) 

Priyanshu Gangwar (510517071) 

Spandan Dutta (510517055) 

Overview
 
gperftools is a collection of tools for CPU profiling, heap profiling, and heap checking, 
along with a high-performance multi-threaded malloc implementation (thread-caching 
malloc). 

These tools can be used to detect and locate memory leaks in C/C++ programs, inspect 
the state of the program heap at any given time, find places that do a lot of allocation, 
analyze CPU profiles, etc. Profiling output is supported in both textual and graphical 
(directed graph) form. Usage of these tools is described later in this documentation. 

 

Downloads
 
 

Compatibility 

The following systems have been officially checked for compatibility: 

● FreeBSD (x86, x86_64) 
● Linux CentOS (x86, x86_64) 
● Linux Debian (PPC, x86) 
● Linux Fedora (x86, x86_64) 
● Linux RedHat, Slackware (x86, x86_64) 
● Linux Ubuntu (x86, x86_64) 
● Mac OS 
● Windows 
● Solaris (x86_64) 

Installation 

The compressed package can be downloaded from the official Downloads page on 
GitHub as a .zip or .tar.gz file and extracted once the download completes. 
 
On Linux (Debian), installation from the terminal can be done by executing the 
following commands. 

sudo apt-get update 


sudo apt-get install google-perftools 
sudo apt-get install gperf libgoogle-perftools-dev 

 

NOTE: For a 64-bit system, it is recommended to install (if not already present) the 
latest version of libunwind before trying to configure or install gperftools. The following 
command can be used to install libunwind directly from the terminal. 

sudo apt-get install libunwind-dev 

CPU Profiling
 
CPU profiling is a 3-step process, from compiling the source file with the proper options 
to analyzing the profile. For a C source file named test.c whose CPU profile is to be 
dumped to /tmp/test.prof, the following commands describe the steps. Read the 
official documentation here. 
 

gcc test.c -o test -Wl,--no-as-needed -lprofiler -Wl,--as-needed 


CPUPROFILE=/tmp/test.prof ./test 
google-pprof ./test /tmp/test.prof 

Adding the -lprofiler option at the link step installs the CPU profiler into the executable. 
In the second step, the CPUPROFILE variable names the file to which the profile is 
dumped, followed by the binary to run. 

The third step passes the binary followed by the filename of the dumped profile to 
google-pprof, which opens in interactive mode. 

For text data, enter text, and for graphical data, enter gv. For a detailed list of 
commands, use help. Enter quit to exit. 

NOTE: 

 

1. Re-compilation is not necessary for dumping profiles over and over. 


2. The official documentation uses pprof in the third step. On Debian-based systems 
the packaged tool is installed as google-pprof, so that name is used here. 
3. The execution time must be substantial for the sampling to be meaningful. 

Heap Profiling
 
Heap profiling helps in inspecting the state of the program heap at any given time and 
in finding places that do a lot of allocation. As with CPU profiling, it is a 3-step process, 
from compiling the source file with the proper options to analyzing the profile. 
For a C source file named test.c whose heap profile is to be dumped to 
/tmp/test.hprof, the following commands describe the steps. Read the official 
documentation here. 
 

gcc test.c -o test -Wl,--no-as-needed -ltcmalloc -Wl,--as-needed 


HEAPPROFILE=/tmp/test.hprof ./test 
google-pprof ./test /tmp/test.hprof.0001.heap 

Adding the -ltcmalloc option at the link step installs the heap profiler into the executable. 
In the second step, the HEAPPROFILE variable gives the filename prefix for the dumped 
profiles, followed by the binary to run. After the second step, profiles are dumped 
periodically with filenames such as /tmp/test.hprof.0001.heap, 
/tmp/test.hprof.0002.heap, and so on. Any of these can be analyzed in the third step. 
 
The third step passes the binary followed by the filename of a dumped profile. In the 
above example, 0001.heap is analyzed. google-pprof opens in interactive mode. 

 

For text data, enter text, and for graphical data, enter gv. For a detailed list of 
commands, use help. Enter quit to exit. 

NOTE:  

1. Here as well, re-compilation is not necessary for dumping profiles over and over. 
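2. With default settings, about 100 MB of heap must be in use before a periodic heap 
profile is generated (the default value of HEAP_PROFILE_INUSE_INTERVAL is 
104,857,600 bytes). 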

Heap Checking
 
Heap checking is useful for detecting memory leaks. The first step is to install the heap 
checker into the executable (as in the two cases above). For a C source file 
named test.c, the following commands describe the steps. Read the official 
documentation here. 
 

gcc test.c -o test -Wl,--no-as-needed -ltcmalloc -Wl,--as-needed 


HEAPCHECK=normal ./test 

Adding the -ltcmalloc option at the link step installs the heap checker into the executable. 
Note that no profile filename is specified in this case. In the second step, the HEAPCHECK 
variable sets the mode of heap checking, followed by the executable. 

The supported modes, in order of increasing strictness of memory leak checking, are: 

● minimal 
● normal 
● strict 
● draconian 

 

The recommended and most often used mode is ​normal​. 

NOTE:  

1. The heap checker records a stack trace for each allocation, which increases 
memory usage and slows down the program. 
2. It uses the heap profiler internally, so the heap checker and the heap profiler 
cannot be run at the same time. 

Thread-Caching Malloc
 
TCMalloc is a faster replacement for the default dynamic memory allocation and 
deallocation routines in C/C++ (malloc, new, free, etc.). In a multi-threaded 
program, each thread is assigned a thread-local cache from which small allocations 
are satisfied. 

Objects are moved from central data structures to local caches as and when necessary. 

To use it in any C/C++ program, TCMalloc needs to be linked into the application 
via the -ltcmalloc flag. As seen above, TCMalloc includes the heap checker and heap 
profiler as well. 

 

Testing
 
To understand how these tools work, we ran small tests on some small C/C++ code 
bases. 

[For multi-threaded programs, these utilities (specifically, TCMalloc) serve a useful 
purpose by reducing lock contention, with virtually zero contention for small objects. 
Also, for large applications, the speed of a malloc/free pair is important, and 
that is where TCMalloc could come in handy. Some advanced tests, which have not been 
covered here, can be performed to test these capabilities.] 

A simple program was written to find the transpose of a matrix. Memory was allocated for 
two square matrices (input and result). The matrices need to be at least 
3621 X 3621 in order to generate the profiling results, because the minimum required 
size is 100 MB (= 104,857,600 B), and two matrices of this size containing 
integers (4 bytes each) occupy 3621 X 3621 X 4 X 2 bytes = 104,893,128 B. 

A function for calculating the transpose is called from within main(). Dynamic memory 
is allocated before this call and freed at the end. 
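
The test program itself is not reproduced in this report, so the following is only a minimal sketch of what such a program could look like, together with the size calculation above. All function names (matrices_bytes, min_dimension, transpose, run_transpose) are hypothetical, and the flat row-major layout is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Bytes occupied by `count` square n x n matrices of `elem_size`-byte elements. */
uint64_t matrices_bytes(uint64_t n, uint64_t elem_size, uint64_t count)
{
    return n * n * elem_size * count;
}

/* Smallest n for which the matrices together reach `threshold` bytes. */
uint64_t min_dimension(uint64_t threshold, uint64_t elem_size, uint64_t count)
{
    assert(elem_size > 0 && count > 0); /* otherwise the loop would not end */
    uint64_t n = 1;
    while (matrices_bytes(n, elem_size, count) < threshold)
        n++;
    return n;
}

/* Write the transpose of the row-major n x n matrix `in` into `out`. */
void transpose(const int *in, int *out, size_t n)
{
    assert(in != out); /* this sketch is not an in-place transpose */
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            out[j * n + i] = in[i * n + j];
}

/* Allocate both matrices, fill the input, transpose, then free everything.
 * Returns 0 on success and -1 if an allocation fails. */
int run_transpose(size_t n)
{
    int *in = malloc(n * n * sizeof *in);
    int *out = malloc(n * n * sizeof *out);
    if (in == NULL || out == NULL) {
        free(in);
        free(out);
        return -1;
    }
    for (size_t k = 0; k < n * n; k++)
        in[k] = (int)k;
    transpose(in, out, n);
    free(in);
    free(out);
    return 0;
}
```

With a 100 MB threshold, 4-byte integers, and two matrices, min_dimension(104857600, 4, 2) evaluates to 3621, matching the figure above. A main() that calls run_transpose(3621) would then be the program that gets linked and profiled.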

For CPU Profiling, the text and GV output are given below. 

 

The time spent inside a function (estimated from the number of profiling samples) 
can be calculated in two ways. One counts the time spent exclusively inside 
that particular function (excluding the time spent in the functions called from it). The 
other also includes the time spent in the functions called from within it. 

From the text data, it is seen that 16.8% of the time is spent in main() as calculated using 
the first approach, while it is 100% using the second approach (as expected, since 
control enters through main and exits from it as well). 
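
The two measures described above can be illustrated with a small sketch that counts samples over recorded call stacks. This is not how pprof is implemented; the struct sample layout, the MAX_DEPTH limit, and both function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 8

/* One profiling sample: the call stack at the sampling instant.
 * frames[0] is the function that was executing; later entries are its callers. */
struct sample {
    int frames[MAX_DEPTH];
    size_t depth;
};

/* Samples taken while `func` itself was executing (exclusive time). */
size_t flat_samples(const struct sample *s, size_t count, int func)
{
    size_t hits = 0;
    for (size_t i = 0; i < count; i++)
        if (s[i].depth > 0 && s[i].frames[0] == func)
            hits++;
    return hits;
}

/* Samples with `func` anywhere on the stack (inclusive time). */
size_t cumulative_samples(const struct sample *s, size_t count, int func)
{
    size_t hits = 0;
    for (size_t i = 0; i < count; i++) {
        assert(s[i].depth <= MAX_DEPTH);
        for (size_t d = 0; d < s[i].depth; d++) {
            if (s[i].frames[d] == func) {
                hits++;
                break; /* count each sample once, even under recursion */
            }
        }
    }
    return hits;
}
```

For example, if four samples are taken and only one of them lands in main() itself while the other three land in a function called from main(), the exclusive share of main() is 25% and its inclusive share is 100%.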

 

The directed graph shows not only the number of profiling samples per function but 
also the direction of function calls. 

For heap profiling, any of the periodically dumped profiles can be analysed in text or gv 
mode. For this test program, eight profiles are generated. The last profile has zero 
bytes in use, as deallocation is done at the end before control leaves main().

Since almost all of the allocation is done in ​main()​, the following outputs are obtained 
on analysing the first profile (0001). 

 

For testing heap checking, a call to a function is made which just allocates some 
memory for an integer and then ​returns without freeing it​. This is caught by the ​heap 
checker​ and the following information is displayed. 

 

Leak check _main_ detected leaks of 4 bytes in 1 objects 


The 1 largest leaks: 
*** WARNING: Cannot convert addresses to symbols in output below. 
*** Reason: Cannot find 'pprof' (is PPROF_PATH set correctly?) 
*** If you cannot fix this, try running pprof directly. 
Leak of 4 bytes in 1 objects allocated from: 
@ 558987a7e8b8  
@ 558987a7eac0  
@ 7f81fb847b97  
@ 558987a7e73a  
 
If the preceding stack traces are not enough to find the leaks, try running THIS shell 
command: 
 
pprof ./gperftools_test "/tmp/gperftools_test.23974._main_-end.heap" 
--inuse_objects --lines --heapcheck --edgefraction=1e-10 --nodefraction=1e-10 
--gv 

Note that, along with detecting the leak, the heap checker prints a command that can be 
used to obtain the stack trace and further investigate the origin of the leak(s). 
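
The leaking function used in this test is not shown in the report. A minimal sketch of such a function, under the assumption that int is 4 bytes on the test machine, might look like the following; the name leak_one_int and the stored value are hypothetical.

```c
#include <stdlib.h>

/* Allocates memory for one int and returns without freeing it.
 * The only pointer to the block is lost on return, so the block is leaked.
 * Returns the stored value, or -1 if the allocation fails. */
int leak_one_int(void)
{
    int *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    *p = 42;
    return *p; /* no free(p): this is the intentional leak */
}
```

Calling this once from main() and running the binary with HEAPCHECK=normal would be expected to produce a report of the kind shown above. An optimizing compiler may remove the unused allocation altogether, so a sketch like this is best built without optimization.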

Criticisms
 
Although ​gperftools​ provides great tools for debugging and profiling, the following 
points are worth noting. 

 

● The official documentation is not descriptive enough for hassle-free 
installation and setup for first-time users. Some dependencies are not mentioned 
correctly and/or are presently unavailable. For instance, as mentioned earlier, the 
pprof command named in the documentation is not available and google-pprof has 
to be used instead. 
 
● The ​directed graph​ for depicting profiling data is not user-friendly in terms of 
scrolling and zooming in/out (as seen in ​Ubuntu​ systems). 
 
● The profiling data requires some minimum CPU usage/memory allocation. This 
might be an issue for some programs which do not meet the criteria.   
