Professional Documents
Culture Documents
Alan L. Cox
alc@rice.edu
Cox Linking 2
Example Program (2 .c files)
/* main.c */ /* swap.c */
void swap(void); extern int buf[];
int buf[2] = {1, 2};
int *bufp0 = &buf[0];
int *bufp1;
int main(void)
{ void swap(void)
swap(); {
int temp;
return (0);
} bufp1 = &buf[1];
temp = *bufp0;
*bufp0 = *bufp1;
*bufp1 = temp;
}
Cox Linking 3
An Analogy for Linking
Cox Linking 4
Linking
Cox Linking 5
Compilation
Assembler: .s assembly
Compiler: code
.c C source toto
code .o.s
relocatable object code
assembly code
UNIX% cc -v -O -g -o p main.c swap.c
…
cc1 -quiet -v main.c -quiet -dumpbase main.c -mtune=generic
-auxbase main -g -O -version -o /tmp/cchnheja.s
as -V -Qy -o /tmp/ccmNFRZd.o /tmp/cchnheja.s
…
cc1 -quiet -v swap.c -quiet -dumpbase swap.c -mtune=generic
-auxbase swap -g -O -version -o /tmp/cchnheja.s
as -V -Qy -o /tmp/ccx8FECg.o /tmp/ccheheja.s
…
collect2 --eh-frame-hdr –m elf_x86_64 --hash-style=gnu -dynamic-
linker /lib64/ld-linux-x86-64.so.2 -o p crt1.o crti.o
crtbegin.o –L<..snip..> /tmp/ccmNFRZd.o /tmp/ccx8FECg.o –lgcc
--as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed
-lgcc_s --no-as-needed crtend.o crtn.o
Linker: .o to executable
Cox Linking 6
Compilation UNIX% cc -O –g -o p main.c swap.c
cc1 cc1
as as
Relocatable
main.o swap.o object code
p Executable
Cox Linking 7
ELF (Executable Linkable Format)
0
Order & existence of ELF header
segments is arbitrary, Program header table
except ELF header must
.text
be present and first
.rodata
.data
.bss
.symtab
.rel.text
.rel.data
.debug
…
Section header table
Cox Linking 8
ELF Header
0
Basic description of file ELF header
contents: Program header table
File format identifier
.text
Architecture
.rodata
Endianness
.data
Alignment requirement
for other sections .bss
Location of other .symtab
sections
.rel.text
Code’s starting address
.rel.data
…
.debug
…
Section header table
Cox Linking 9
Program and Section Headers
0
Info about other sections ELF header
necessary for loading into Program header table
memory for execution
.text
Required for executables
& libraries .rodata
.data
.bss
.symtab
.rel.text
.rel.data
Info about other sections .debug
necessary for linking …
Required for
Section header table
relocatables
Cox Linking 10
Text Section
0
Machine code (instructions) ELF header
read-only Program header table
.text
.rodata
.data
.bss
.symtab
.rel.text
.rel.data
.debug
…
Section header table
Cox Linking 11
Data Sections
0
Static data ELF header
initialized, read-only Program header table
initialized, read/write
.text
uninitialized, read/write
(BSS = “Block Started by .rodata
Symbol” pseudo-op for
.data
IBM 704)
Initialized .bss
Initial values in ELF file .symtab
Uninitialized .rel.text
Only total size in ELF file
.rel.data
Writable distinction enforced
.debug
at run-time
Why? Protection; sharing …
How? Virtual memory Section header table
Cox Linking 12
Symbol Table
0
Describes where global ELF header
variables and functions Program header table
are defined
.text
Present in all relocatable
ELF files .rodata
.data
.bss
/* main.c */
.symtab
void swap(void);
.rel.text
int buf[2] = {1, 2};
.rel.data
int main(void) .debug
{
…
swap();
return (0); Section header table
}
Cox Linking 13
Relocation Information
0
Describes where and how ELF header
symbols are used Program header table
A list of locations in
.text
the .text section that
will need to be modified .rodata
when the linker .data
combines this object file
.bss
with others
Relocation information .symtab
for any global variables .rel.text
that are referenced or
.rel.data
defined by the module
Allows object files to be .debug
easily relocated …
Section header table
Cox Linking 14
Debug Section
0
Relates source code to the ELF header
object code within the ELF Program header table
file
.text
.rodata
.data
.bss
.symtab
.rel.text
.rel.data
.debug
…
Section header table
Cox Linking 15
Other Sections
0
Other kinds of sections ELF header
also supported, including: Program header table
Other debugging info
.text
Version control info
.rodata
Dynamic linking info
.data
C++ initializing &
finalizing code .bss
.symtab
.rel.text
.rel.data
.debug
…
Section header table
Cox Linking 16
Linker Symbol Classification
Global symbols
Symbols defined by module m that can be referenced by
other modules
C: non-static functions & global variables
External symbols
Symbols referenced by module m but defined by some
other module
C: extern functions & variables
Local symbols
Symbols that are defined and referenced exclusively by
module m
C: static functions & variables
Cox Linking 17
Linker Symbols
Reference to external
symbol buf
Linker knows nothing
about local variables
Cox Linking 18
Linker Symbols
/* main.c */
What’s missing?
void swap(void);
swap – where is it?
int buf[2] = {1, 2};
int main(void)
{ buf isis
main
swap is
anareferenced
8-byte
19-byteobject
function
in this
located
file,
located
but
at offset
at
is 0
swap(); offset
undefined
of section
0 of 3section
(UND)
(.data)1 (.text)
return (0);
}
use readelf –S to see sections
UNIX% cc -O -c main.c
UNIX% readelf -s main.o
void swap(void)
{ bufp0
bufp1
swap
buf isis
is
referenced
an
a 38-byte
8-byte inobject
uninitialized
function
this file,
located
located
but
(COMMON)
is
at at
offset
offset
int temp; 0
object
0
undefined
of
ofsection
section
with (UND)
an
31(.data)
8-byte
(.text) alignment requirement
bufp1 = &buf[1];
temp = *bufp0;
*bufp0 = *bufp1;
*bufp1 = temp;
}
Symbol Resolution
Determine where symbols are defined and what
size data/code they refer to
Relocation
Combine modules, relocate code/data, and fix
symbol references based on new locations
Relocatable
main.o swap.o
object code
ld (collect2)
p Executable
Cox Linking 21
Problem: Undefined Symbols
forgot to type swap.c
UNIX% cc -O -o p main.c
/tmp/cccpTy0d.o: In function `main’:
main.c:(.text+0x5): undefined reference to `swap’
collect2: ld returned 1 exit status
UNIX%
Cox Linking 22
Problem: Multiply Defined Symbols
Cox Linking 23
Linking: Example
int x = 3; extern int x;
int y = 4; static int y = 6;
int z; int z;
?
?
Note: Linking uses object files
Examples use source-level for convenience
Cox Linking 24
Linking: Example
int x = 3; extern int x;
int y = 4; static int y = 6;
int z; int z;
int x = 3;
Cox Linking 25
Linking: Example
int x = 3; extern int x;
int y = 4; static int y = 6;
int z; int z;
int x = 3;
int y = 4;
int y’ = 6;
Renaming is a
convenient source-level
way to understand this
int foo(int a) {…}
int bar(int b) {…}
int bar’(int b) {…}
Cox Linking 26
Linking: Example
int x = 3; extern int x;
int y = 4; static int y = 6;
int z; int z;
int x = 3;
int y = 4;
int y’ = 6; C allows you to omit
int z; “extern” in some
cases – Don’t!
int foo(int a) {…}
int bar(int b) {…}
int bar’(int b) {…}
Cox Linking 27
Strong & Weak Symbols
p1.c p2.c
strong int foo=5; int foo; weak
Cox Linking 28
Strong & Weak Symbols
Cox Linking 29
Linker Puzzles: What Happens?
int x; p1() {}
Link time error: two strong symbols p1
p1() {}
int x; double x;
int y; p2() {} Writes to x in p2 might overwrite y!
p1() {} Evil!
Cox Linking 31
Linking Steps
Symbol Resolution
Determine where symbols are defined and what
size data/code they refer to
Relocation
Combine modules, relocate code/data, and fix
symbol references based on new locations
Relocatable
main.o swap.o
object code
ld (collect2)
p Executable
Cox Linking 32
.symtab & Pseudo-Instructions in main.s
UNIX% cc -O -c main.c
UNIX% readelf -s main.o
.symtab
where’s
swap?
Cox Linking 35
Relocation
Cox Linking 36
Relocation: Merging Files
main.o
.text
.data
… p
.symtab
.text
.data
swap.o
…
.text .symtab
.data
…
.symtab
Cox Linking 37
Linking: Relocation
/* main.c */ UNIX% objdump -r -d main.o
void swap(void);
main.o: file format elf64-x86-64
int buf[2] = {1, 2};
Disassembly of section .text:
int main(void)
{ 0000000000000000 <main>:
swap(); 0: 48 83 ec 08 sub $0x8,%rsp
4: e8 00 00 00 00 callq 9 <main+0x9>
return (0);
5: R_X86_64_PC32
} swap+0xfffffffffffffffc
9: b8 00 00 00 00 mov $0x0,%eax
can also use readelf e: 48 83 c4 08 add $0x8,%rsp
–r to see relocation 12: c3 retq
information
Offset
Type ofinto
symbol
text (PC
section
relative
(relocation
32-bit signed)
information
Symbol name
is stored in a different section of the file)
Cox Linking 38
Linking: Relocation
/* swap.c */ UNIX% objdump -r -D swap.o
extern int buf[];
int *bufp0 = &buf[0]; swap.o: file format elf64-x86-64
int *bufp1;
Disassembly of section .text:
void swap()
{ 0000000000000000 <swap>:
int temp; 0: 48 c7 05 00 00 00 00 movq $0x0,0(%rip)
7: 00 00 00 00
bufp1 = &buf[1]; 3: R_X86_64_PC32
temp = *bufp0; bufp1+0xfffffffffffffff8
*bufp0 = *bufp1; 7: R_X86_64_32S buf+0x4
*bufp1 = temp; <..snip..>
} Disassembly of section .data:
0000000000000000 <bufp0>:
...
0: R_X86_64_64 buf
0000000000400448 <main>:
400448: 48 83 ec 08 sub $0x8,%rsp
40044c: e8 0b 00 00 00 callq 40045c <swap>
400451: b8 00 00 00 00 mov $0x0,%eax
400456: 48 83 c4 08 add $0x8,%rsp
40045a: c3 retq
40045b: 90 nop
000000000040045c <swap>:
40045c: 48 c7 05 01 04 20 00 movq $0x600848,2098177(%rip)
Cox Linking 40
After Relocation
0000000000000000 <swap>:
0: 48 c7 05 00 00 00 00 movq $0x0,0(%rip)
7: 00 00 00 00
3: R_X86_64_PC32 bufp1+0xfffffffffffffff8
7: R_X86_64_32S buf+0x4
<..snip..>
0000000000000000 <bufp0>:
...
0: R_X86_64_64 buf
000000000040045c <swap>:
40045c: 48 c7 05 01 04 20 00 movq $0x600848,2098177(%rip)
400463: 48 08 60 00 # 600868 <bufp1>
<..snip..>
0000000000600850 <bufp0>:
600850: 44 08 60 00 00 00 00 00
Cox Linking 41
Libraries
How should functions commonly used by programmers
be provided?
Math, I/O, memory management, string manipulation,
etc.
Option 1: Put all functions in a single source file
• Programmers link big object file into their programs
• Space and time inefficient
Option 2: Put each function in a separate source file
• Programmers explicitly link appropriate object files into their
programs
• More efficient, but burdensome on the programmer
Solution: static libraries (.a archive files)
Multiple relocatable files + index single archive file
Only links the subset of relocatable files from the library
that are used in the program
Example: cc –o fpmath main.c float.c -lm
Cox Linking 42
Two Common Libraries
libc.a (the C standard library)
4 MB archive of 1395 object files
I/O, memory allocation, signal handling, string handling,
data and time, random numbers, integer math
Usually automatically linked
libm.a (the C math library)
1.3 MB archive of 401 object files
floating point math (sin, cos, tan, log, exp, sqrt, …)
Use “-lm” to link with your program
/* addvec.c */ /* multvec.c */
#include “vector.h” #include “vector.h”
void addvec(int *x, int *y, void multvec(int *x, int *y,
int *z, int n) int *z, int n)
{ {
int i; int i;
Cox Linking 44
Using a library
/* main.c */
#include <stdio.h>
#include “vector.h”
main.o libvector.a libc.a
int x[2] = {1, 2};
int y[2] = {3, 4}; addvec.o printf.o
int z[2];
ld
int main(void)
{
addvec(x, y, z, 2);
program
printf(“z = [%d %d]\n”, z[0], z[1]);
return (0);
}
UNIX% cc –O –c main.c
UNIX% cc –static –o program main.o ./libvector.a
Cox Linking 45
How to Link: Basic Algorithm
Keep a list of the current unresolved references.
For each object file (.o and .a) in command-line order
Try to resolve each unresolved reference in list to objects
defined in current file
Try to resolve each unresolved reference in current file
to objects defined in previous files
Concatenate like sections (.text with .text, etc.)
If list empty, output executable file, else error
Cox Linking 46
Why UNIX% cc libvector.a main.o
Doesn’t Work
Linker keeps list of currently unresolved
symbols and searches an encountered library
for them
Cox Linking 47
Dynamic Libraries
Static Dynamic
Cox Linking 48
Static & Dynamic Libraries
Static Dynamic
Library code added to Library code not added
executable file to executable file
Larger executables Smaller executables
Must recompile to use Uses newest (or
newer libraries smallest, fastest, …)
library without
recompiling
Executable is self-
Depends on libraries at
run-time
contained
Some time to load
Some time to load
libraries at run-time
libraries at compile-time
Library code shared only
Library code shared
among all uses of library
among copies of same
program
Cox Linking 49
Static & Dynamic Libraries
Static Dynamic
Creation Creation
ar rcs libfoo.a bar.o baz.o cc –shared –Wl,-soname,libfoo.so
ranlib libfoo.a -o libfoo.so bar.o baz.o
Use Use
cc –o zap zap.o -lfoo cc –o zap zap.o -lfoo
Cox Linking 50
Loading
Running a program
unix% ./program
Shell does not recognize “program” as a shell command,
so assumes it is an executable
Invokes the loader to load the executable into memory
(any unix program can invoke the loader with the execve
function – more later)
Cox Linking 51
Creating the Memory Image (sort of…)
0x7FFFFFFFFFFF
Cox Linking 52
Starting the Program
Cox Linking 53
Position Independent Code
Static libraries compile with unresolved global & local
addresses
Library code & data concatenated & addresses resolved
when linking
Cox Linking 54
Position Independent Code
By default (in C), dynamic libraries compile with
resolved global & local addresses
E.g., libfoo.so starts at 0x400000 in every application
using it
Cox Linking 55
Position Independent Code
Can compile dynamic libraries with unresolved global &
local addresses
cc –shared –fPIC …
Cox Linking 56
Library Interpositioning
Linking with non-standard libraries that use
standard library symbols
“Intercept” calls to library functions
Some applications:
Security
• Confinement (sandboxing)
• Behind the scenes encryption
– Automatically encrypt otherwise unencrypted network
connections
Monitoring & Profiling
• Count number of calls to functions
• Characterize call sites and arguments to functions
• malloc tracing
– Detecting memory leaks
– Generating malloc traces
Cox Linking 57
Dynamic Linking at Run-Time
void
dlink(void)
{
Symbols resolved at
void *handle = dlopen(“mylib.so”, RTLD_LAZY); first use, not now
Cox Linking 58
Next Time
Exceptions
Cox Linking 59