Data Alignment
What is Data Alignment?
In a programming language, a data object (variable) has 2 properties: its value and its storage location (address). Data alignment means that the address of a data object is evenly divisible by 1, 2, 4, 8, or another power of 2. In other words, a data object can have 1-byte, 2-byte, 4-byte or 8-byte alignment, and so on. For instance, if the address of a data object is 12FEECh, then it is 4-byte aligned because the address is evenly divisible by 4. (It is also divisible by 2 and 1, but 4 is the largest power of 2 that divides it evenly.)

A CPU does not read from or write to memory one byte at a time. Instead, it accesses memory in chunks of 2, 4, 8, 16, or 32 bytes. The reason is performance: accessing an address on a 4-byte or 16-byte boundary is a lot faster than accessing an address on a 1-byte boundary. The Memory Mapping diagram illustrates how a CPU accesses a 4-byte chunk of data with 4-byte memory access granularity.
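To make the 12FEECh example concrete, here is a minimal sketch in C. The helper names address_alignment and is_aligned are hypothetical (they are not part of any library), and the sketch assumes the address has already been converted to an integer of type uintptr_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: largest power of two that evenly divides addr.
   addr & (0 - addr) isolates the lowest set bit; unsigned wraparound
   makes the subtraction well defined. */
uintptr_t address_alignment(uintptr_t addr)
{
    assert(addr != 0);  /* zero is divisible by every power of two */
    return addr & ((uintptr_t)0 - addr);
}

/* Hypothetical helper: nonzero when addr is a multiple of boundary.
   boundary must be a power of two (1, 2, 4, 8, 16, ...). */
int is_aligned(uintptr_t addr, uintptr_t boundary)
{
    return (addr & (boundary - 1u)) == 0;
}
```

For the address 12FEECh, the last hex digit is C (1100 in binary), so the lowest set bit is 4: the address is 4-byte aligned but not 8-byte aligned.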

Memory mapping from memory to CPU cache

If the data is not aligned on a 4-byte boundary, the CPU has to perform extra work to access it: load 2 chunks of data, shift out the unwanted bytes, then combine the remaining pieces. This extra work slows down access and wastes CPU cycles just to get the right data from memory.

Accessing Misaligned data

Misaligned data slows down data access performance
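The load, shift and combine steps described above can be imitated in software. The following is only an illustrative sketch, not a description of what any particular CPU does internally. It assumes memory is modeled as an array of 32-bit words in which byte k of a word occupies bits 8k to 8k+7 (little-endian numbering), and the function name load_u32_from_aligned_words is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads 4 bytes starting at any byte offset, using only aligned 4-byte loads.
   The caller must make sure words[index + 1] exists when the offset is misaligned. */
uint32_t load_u32_from_aligned_words(const uint32_t *words, size_t byte_offset)
{
    assert(words != NULL);

    size_t index = byte_offset / 4;                    /* aligned chunk holding the first byte */
    unsigned shift = (unsigned)(byte_offset % 4) * 8;  /* bits to discard from that chunk */

    if (shift == 0)
        return words[index];                           /* aligned: a single load is enough */

    uint32_t low  = words[index] >> shift;             /* shift out the unwanted low bytes */
    uint32_t high = words[index + 1] << (32 - shift);  /* shift out the unwanted high bytes */
    return low | high;                                 /* combine the two pieces */
}
```

The aligned case needs one load, while every misaligned offset needs two loads, two shifts and an OR, which is the extra work the text refers to.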

Structure Member Alignment
On 32-bit x86 systems, the alignment of a data type is mostly the same as its size. The compiler aligns variables on their natural length boundaries. The CPU handles misaligned data correctly (only more slowly), so you do not need to align addresses explicitly.

Data alignment for each type

Data Type    Alignment (bytes)
char         1
short        2
int          4
float        4
double       4 or 8

However, the story is a little different for member data in struct, union or class objects. To prevent performance penalties, each member is placed on its own natural boundary, and the struct (or union, class) as a whole takes the alignment of its most strictly aligned member. For example, if you have 1 char variable (1-byte) followed by 1 int variable (4-byte) in a struct, the compiler pads 3 bytes between these two variables. Therefore, the total size of this struct is 8 bytes instead of 5 bytes. By doing this, the address of the struct and of its int member are both evenly divisible by 4. This is called structure member alignment. Of course, the size of the struct grows as a consequence.
// size = 2 bytes, alignment = 1-byte, address can be divisible by 1
struct S1 {
    char m1;    // 1-byte
    char m2;    // 1-byte
};

// size = 4 bytes, alignment = 2-byte, address can be divisible by 2
struct S2 {
    char m1;    // 1-byte
                // padding 1-byte space here
    short m2;   // 2-byte
};

// size = 8 bytes, alignment = 4-byte, address can be divisible by 4
struct S3 {
    char m1;    // 1-byte
                // padding 3-byte space here
    int m2;     // 4-byte
};

// size = 16 bytes, alignment = 8-byte, address can be divisible by 8
struct S4 {
    char m1;    // 1-byte
                // padding 7-byte space here
    double m2;  // 8-byte
};

// size = 16 bytes, alignment = 8-byte, address can be divisible by 8
struct S5 {
    char m1;    // 1-byte
                // padding 3-byte space here
    int m2;     // 4-byte
    double m3;  // 8-byte
};
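The sizes and padding in the comments above can be checked on your own compiler with sizeof and offsetof. The following is a minimal sketch in C11. The struct names, the GAP_AFTER_M1 macro and print_layouts are hypothetical names introduced for illustration, and the values printed depend on the compiler and target (for example, the alignment of double may be 4 or 8).

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdio.h>

struct PadShort  { char m1; short  m2; };
struct PadInt    { char m1; int    m2; };
struct PadDouble { char m1; double m2; };

/* Number of padding bytes the compiler inserted between m1 and m2 */
#define GAP_AFTER_M1(type) (offsetof(type, m2) - sizeof(char))

/* The size of a struct is always a multiple of its alignment */
static_assert(sizeof(struct PadInt) % _Alignof(struct PadInt) == 0,
              "struct size is a multiple of its alignment");

/* Prints size, alignment and padding for each example struct */
void print_layouts(void)
{
    printf("PadShort:  size=%zu align=%zu gap=%zu\n", sizeof(struct PadShort),
           (size_t)_Alignof(struct PadShort), GAP_AFTER_M1(struct PadShort));
    printf("PadInt:    size=%zu align=%zu gap=%zu\n", sizeof(struct PadInt),
           (size_t)_Alignof(struct PadInt), GAP_AFTER_M1(struct PadInt));
    printf("PadDouble: size=%zu align=%zu gap=%zu\n", sizeof(struct PadDouble),
           (size_t)_Alignof(struct PadDouble), GAP_AFTER_M1(struct PadDouble));
}
```

Calling print_layouts() from main() shows how much padding your own toolchain actually inserts.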

You may use the "pack" pragma directive to specify a different packing alignment for struct, union or class members.
// 1-byte struct member alignment
// size = 9, alignment = 1-byte, no padding for these struct members
#pragma pack(push, 1)
struct S6 {
    char m1;    // 1-byte
    double m2;  // 8-byte
};
#pragma pack(pop)

Be careful when using custom struct member alignment. It may cause serious compatibility issues, for example when linking against an external library that was built with a different packing alignment. It is better to use the default alignment all the time.
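The compatibility problem can be seen by declaring the same members under both settings. This is a minimal sketch in C11. It assumes a compiler that accepts #pragma pack(push, 1) (MSVC, GCC and Clang do), and DefaultLayout, PackedLayout and layouts_agree are hypothetical names.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */

/* Same members, default member alignment */
struct DefaultLayout { char m1; double m2; };

/* Same members, 1-byte packing: m2 starts right after m1 */
#pragma pack(push, 1)
struct PackedLayout { char m1; double m2; };
#pragma pack(pop)

static_assert(sizeof(struct PackedLayout) == sizeof(char) + sizeof(double),
              "packed struct has no padding");

/* Returns nonzero only if both layouts have the same size and member offsets */
int layouts_agree(void)
{
    return sizeof(struct DefaultLayout) == sizeof(struct PackedLayout)
        && offsetof(struct DefaultLayout, m2) == offsetof(struct PackedLayout, m2);
}
```

If one module is compiled with the packed declaration and another with the default one, each reads m2 from a different offset, so data passed between them is misinterpreted.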

Data Alignment for SSE
SSE (Streaming SIMD Extensions) defines 128-bit (16-byte) packed data types (for example, four 32-bit floats), and access to such data is faster if its address is 16-byte aligned, that is, evenly divisible by 16. In MSVC, you can declare a 16-byte aligned variable using the __declspec(align(16)) keyword:

__declspec(align(16)) float array[SIZE];
...
struct __declspec(align(16)) S1 {
    float v[4];
};
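Outside MSVC, the same kind of declaration can be written with the standard C11 alignas specifier. This is a sketch under the assumption of a C11 toolchain, not part of the original MSVC example. SIZE is given a made-up value here, and samples, Vec4 and is_aligned16 are hypothetical names.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignas, alignof */
#include <stdint.h>

#define SIZE 64        /* hypothetical element count */

/* 16-byte aligned array */
alignas(16) float samples[SIZE];

/* Aligning the member raises the alignment of the whole struct to 16 */
struct Vec4 {
    alignas(16) float v[4];
};

static_assert(alignof(struct Vec4) == 16, "Vec4 is 16-byte aligned");

/* Nonzero when p lies on a 16-byte boundary */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 0x0F) == 0;
}
```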

A dynamic array can be allocated using the _aligned_malloc() function and deallocated using _aligned_free().
// allocate 16-byte aligned data
float* array = (float*)_aligned_malloc(SIZE*sizeof(float), 16);
...
// deallocate memory
_aligned_free(array);
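_aligned_malloc() and _aligned_free() are specific to the Microsoft C runtime. On toolchains that provide the C11 function aligned_alloc() (MSVC does not), a comparable allocation might look like the sketch below. The wrapper name alloc_floats_aligned16 is hypothetical.

```c
#include <assert.h>
#include <stdint.h>   /* SIZE_MAX */
#include <stdlib.h>   /* aligned_alloc, free */

/* Allocates room for count floats on a 16-byte boundary.
   C11 requires the size passed to aligned_alloc to be a multiple of the
   alignment, so the byte count is rounded up to a multiple of 16.
   Returns NULL on failure; release the block with free(). */
float *alloc_floats_aligned16(size_t count)
{
    assert(count <= (SIZE_MAX - 15) / sizeof(float));  /* guard against overflow */

    size_t bytes = count * sizeof(float);
    size_t rounded = (bytes + 15) & ~(size_t)15;
    if (rounded == 0)
        rounded = 16;                                  /* avoid a zero-size request */

    return aligned_alloc(16, rounded);
}
```

Unlike _aligned_malloc(), memory obtained this way is released with the ordinary free().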

Or, you can manually align the address like this:
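// allocate 15 byte larger array
// because in worst case, the data can be misaligned up to 15 bytes.
float* array = (float*)malloc(SIZE*sizeof(float)+15);

// find the aligned position
// and use this pointer to read or write data into array
// (convert to an integer first: bitwise AND is not defined on pointers,
// and "array + 15" would advance by 15 floats, not 15 bytes;
// uintptr_t is declared in <stdint.h>)
float* alignedArray = (float*)(((uintptr_t)array + 15) & ~(uintptr_t)0x0F);
...
// deallocate memory original "array", NOT alignedArray
free(array);
array = alignedArray = 0;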

Because a 16-byte aligned address must be divisible by 16, the least significant digit of the address in hexadecimal is always 0. That is why a bitwise AND with ~0x0F is used: it clears the last hex digit of the address.

Bitwise AND Operator

The address returned by malloc() may be anywhere from 0 to 15 bytes short of the next 16-byte boundary. In the worst case, you have to move the address 15 bytes forward before the bitwise AND operation. Therefore, you need to allocate 15 extra bytes.

Aligned and Misaligned
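The "add 15, then clear the last hex digit" arithmetic can be tried on plain integers. This is a small sketch in C, and align_up16 is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Rounds addr up to the next multiple of 16.
   An address that is already aligned is returned unchanged. */
uintptr_t align_up16(uintptr_t addr)
{
    assert(addr <= UINTPTR_MAX - 15u);       /* adding 15 must not wrap around */
    return (addr + 15u) & ~(uintptr_t)0x0F;  /* move forward, then clear the last hex digit */
}
```

For example, 12FEE0h stays 12FEE0h, while 12FEE1h, 12FEECh and 12FEEFh all become 12FEF0h. The result is never more than 15 bytes past the input, which is why 15 extra bytes are enough.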

© 2005 Song Ho Ahn