
PROJECT-1

BASIC ARCHITECTURE DESIGN


1. x86 Architecture
Basic features of the x86 architecture
1. It is based on a modified Harvard architecture and has a segmented memory design
with 16-bit registers called segment registers.
1. Code segment register (CS): points to the segment of memory where
instructions are stored.
2. Data segment register (DS): points to the segment where data is stored.
3. Extra segment register (ES): points to an additional segment containing data.
4. Stack segment register (SS): points to the stack segment of memory,
where stack data is stored.
The x86 architecture has evolved over time, so the way it addresses memory has also
changed. From the 80286 onward, a segment register holds an index into a
descriptor table (a data structure that contains information about a memory
area, such as its base address, size, and privileges). To access memory, the
index in the segment register is used to look up the segment's information in
the descriptor table. Modern x86-64 systems largely abandon memory segmentation
in favor of paging. A page is a fixed-size block of data loaded from secondary
memory into main memory. If a required page is not found in main memory (a page
fault), the OS locates the data on secondary memory and finds a free frame in
main memory; if no frame is free, it evicts an already loaded page (writing it
back to secondary memory if it has been modified) to make room. It then loads
the required page into main memory and updates the page table, the data
structure that stores the mapping between virtual addresses and physical
addresses.
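The page-table translation described above can be sketched in Python. This is a simplified single-level model; the page size, the `page_table` contents, and the addresses are illustrative values, not real OS data:

```python
# Simplified sketch of page-table translation (illustrative values only).
PAGE_SIZE = 4096  # 4 KiB, a common x86 page size

# Page table: virtual page number -> physical frame number.
# A real OS maintains a multi-level structure like this per process.
page_table = {0: 7, 1: 3, 2: 9}

def translate(vaddr):
    """Split a virtual address into page number and offset, then map it."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in page_table:
        # In hardware this raises a page fault; the OS would then load the page.
        raise KeyError(f"page fault: page {vpn} not in main memory")
    return page_table[vpn] * PAGE_SIZE + offset

print(translate(4100))  # page 1, offset 4 -> frame 3 -> 3*4096 + 4 = 12292
```

A reference to page 3 would raise the simulated page fault, which is where the OS's load-and-evict procedure above takes over.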
2. It follows the little-endian convention.
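Little-endian byte order can be observed directly with Python's struct module (the value 0x12345678 is just an example):

```python
import struct

# Pack a 32-bit integer in little-endian ('<I') and big-endian ('>I') order.
value = 0x12345678
little = struct.pack('<I', value)
big = struct.pack('>I', value)

print(little.hex())  # 78563412 -- least significant byte first, as on x86
print(big.hex())     # 12345678 -- most significant byte first
```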
3. It has variable-length instructions, reflecting its CISC design.
4. The x86 architecture supports FMA (fused multiply-add) instructions, which further
increase processing power. An FMA instruction computes a multiply and an add as a
single operation. For example, for d = c*a + b, an FMA implementation multiplies
c by a, adds b to the exact product, rounds the result to N bits only once, and
stores it. In an unfused multiply-add, a is first multiplied by c, the result is
rounded to N bits and saved in a register, then b is added to the saved result,
which is rounded to N bits again before being saved. An architecture with a good
FMA implementation is therefore faster, since fewer register transactions are
required, and also more precise: the unfused sequence rounds twice, whereas FMA
rounds only once.
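The precision difference between fused and unfused multiply-add can be imitated with Python's decimal module, rounding to 3 significant digits instead of N bits (the numbers are made up for illustration):

```python
from decimal import Decimal, Context

ctx = Context(prec=3)  # round every operation to 3 significant digits
a, c, b = Decimal('111'), Decimal('1.01'), Decimal('0.5')

# Unfused: round after the multiply, then round again after the add.
unfused = ctx.add(ctx.multiply(c, a), b)   # 112.11 -> 112, then 112 + 0.5 -> 112

# Fused: compute c*a + b exactly, then round only once.
fused = ctx.plus(c * a + b)                # 112.61 -> 113

print(unfused, fused)  # 112 113 -- the double rounding lost the true result
```

Here the intermediate rounding in the unfused version throws away the `.11` before the add, and the half-even rounding of 112.5 discards the `0.5` as well, so only the fused version reaches the correctly rounded answer.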
5. In recent years, Intel has released Xeon processors that support a superscalar
architecture, in which two or more independent pipelines execute at the same
time; combined with multiple cores, this enables MIMD (Multiple Instruction,
Multiple Data) execution.
6. It has instructions based on the SIMD principle (Single Instruction, Multiple
Data). SIMD is a form of parallelism in which the same operation is performed on
multiple data elements at the same time, resembling a vector processor in
functionality. This parallelism is achieved by using a register to hold a set of
data points rather than a single value. x86 has extensions such as MMX, SSE, and
AVX that add registers capable of holding multiple data points.
1. MMX extension
It was the first SIMD extension added to the x86 architecture. It introduced
eight 64-bit registers, MM0 to MM7, which are aliases for the lower 64 bits
of the 80-bit floating-point unit (FPU) registers; when an MMX register is
written, the remaining upper bits (the FPU exponent field) are set to all 1s.
An MMX register can hold eight 8-bit integers, four 16-bit integers, or two
32-bit integers simultaneously and perform arithmetic operations on all of
them at the same time.
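This packed-integer idea can be imitated in plain Python with a classic SWAR ("SIMD within a register") trick: eight 8-bit lanes packed into one 64-bit integer are added in a single pass, with carries prevented from crossing lane boundaries, roughly what the MMX paddb instruction does in hardware (the helper names are made up for illustration):

```python
# SWAR sketch: add eight 8-bit lanes packed in one 64-bit word, with
# per-lane wraparound and no carry between lanes.
H = 0x8080808080808080  # mask of each byte's high bit
L = 0x7F7F7F7F7F7F7F7F  # mask of each byte's low 7 bits

def paddb(x, y):
    """Byte-wise addition modulo 256 with no carry between lanes."""
    low = (x & L) + (y & L)            # add low 7 bits; carries stay inside each byte
    return (low ^ ((x ^ y) & H)) & 0xFFFFFFFFFFFFFFFF  # add the high bits mod 2

def pack(vals):
    # Pack 8 small integers into one 64-bit word (lane 0 = lowest byte).
    return int.from_bytes(bytes(vals), 'little')

def unpack(word):
    return list(word.to_bytes(8, 'little'))

x = pack([250, 1, 2, 3, 4, 5, 6, 7])
y = pack([10, 20, 30, 40, 50, 60, 70, 80])
print(unpack(paddb(x, y)))  # [4, 21, 32, 43, 54, 65, 76, 87] -- 250+10 wraps to 4
```

The hardware does this in one cycle over a real MMX register; the bit masks here only emulate the lane isolation that the silicon provides for free.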

2. SSE extension
This extension overcomes the limitations of MMX. It introduced SIMD
instructions for floating-point numbers along with separate 128-bit XMM
registers (eight in 32-bit mode, sixteen, XMM0 to XMM15, in 64-bit mode).
An XMM register can hold two double-precision (64-bit) floating-point
numbers or four single-precision (32-bit) floating-point numbers
simultaneously, and it can also hold multiple integers at the same time.
These registers are separate from the FPU registers, so SSE is independent
of FPU operations.
3. AVX extensions
AVX further increases the SIMD capability of x86 by widening the registers
to 256 bits (YMM0-YMM15). AVX-512 later introduced thirty-two 512-bit
registers (ZMM0-ZMM31) to further increase the computational power of the
x86 architecture.
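The effect of these packed registers can be imitated with NumPy, whose array operations compile down to exactly these SSE/AVX instructions on x86: one expression adds four single-precision floats at once, conceptually what a single SSE addps does on one XMM register (the values are arbitrary):

```python
import numpy as np

# Four 32-bit floats = 128 bits, the width of one XMM register.
a = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)
b = np.array([0.5, 0.5, 0.5, 0.5], dtype=np.float32)

c = a + b  # one vectorized operation over all four lanes
print(c)   # [1.5 2.5 3.5 4.5]
```

With eight float32 lanes the same expression would correspond to one 256-bit AVX operation on a YMM register.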

How does the x86 architecture support ML/AI


Almost every ML/AI algorithm involves vectors and operations like matrix multiplication,
dot products, and convolution, where multiply and add operations are performed over and
over again on datasets containing a large number of data points. The x86 architecture
complements ML and AI by providing features such as:
1. Superscalar pipelines and multiple cores that support multithreading
2. FMA operations
3. SIMD instructions (since ML and AI algorithms repeatedly perform the same
operations on multiple data elements)
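The multiply-and-add pattern at the heart of these workloads is just a dot product. A minimal sketch, with made-up integer weights to keep the arithmetic exact:

```python
import numpy as np

# A dot product is a chain of multiply-add steps -- the pattern that FMA
# and SIMD instructions accelerate in ML workloads.
w = np.array([2, 5, 3])   # e.g. weights of one neuron (illustrative values)
x = np.array([1, 2, 3])   # input features

acc = 0
for wi, xi in zip(w, x):
    acc = wi * xi + acc    # one multiply-add step (what an FMA instruction fuses)

print(acc, np.dot(w, x))  # both give 21
```

A vectorized `np.dot` lets the hardware apply SIMD and FMA across the whole array instead of looping one element at a time.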
Over the years, Intel has developed Xeon processors, based on the x86 architecture,
that complement ML and AI algorithms. Their features are described below:
1. Up to 72 cores with superscalar pipelines, each core containing multiple
vector processing units.
2. Larger L1 and L2 caches, plus 512-bit AVX-512 registers that support FMA
operations and SIMD instructions.
3. High-bandwidth memory, which reduces the time taken to move data from memory
to the processor, speeding up the training phase of ML/AI models.
4. Scaling from a single node to a large number of nodes is very important because
ML and AI algorithms often involve large neural networks that need more than one
computational unit, or node. For connecting nodes, Intel provides OPA (Omni-Path
Architecture) fabrics, an energy-efficient and fast way to communicate between
nodes. Omni-Path Architecture aims at low latency, low energy consumption, and
high throughput. Its basic features are:
1. The fabric manager finds the least congested path by interacting with the
link ICs, dynamically measuring congestion at each link.
2. Data can be transferred over multiple routes between the same nodes,
depending on congestion.
3. Data transfer is priority-based: data with higher priority is transferred
first.
4. Variable-length chunks of data are divided into fixed-size containers, which
in turn are packed into fixed-size packets on the links.
5. OPA is tolerant of transient bit errors.
6. OPA can be easily programmed.
7. A salient feature of OPA is that it continues transferring data even if one
of the lanes fails, avoiding the need for a restart or a rollback to a
previous checkpoint, a capability absent from many communication fabrics.
These are some of the modifications Intel has made to the x86 architecture to support
ML/AI algorithms.
