
# C6000 Architecture

## Outline

- CPU Architecture
- Instruction Set Overview
- Internal Buses & Memory
- C6000 Peripherals Overview
- Device Family Review

*Figure: the CPU and its internal memories.*

## What Problem Are We Trying To Solve?

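*Figure (A): digital sampling of an analog signal with sampling period $T = 1/f_s$; the samples are then processed in code.*

The processing is a sum of products of coefficients $a_n$ and samples $x_n$:

$$Y = \sum_{n=1}^{N} a_n x_n$$

In this example, $N = 40$:

$$Y = \sum_{n=1}^{40} a_n x_n$$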
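As a reference point for the assembly that follows, here is the same 40-term sum of products written in C. This is a sketch under our own assumptions: the names `dot_product` and `N_TAPS`, the 16-bit `short` inputs, and the 32-bit `int` accumulator are chosen to match the 16-bit data and 32-bit registers used later in this section.

```c
#include <assert.h>

#define N_TAPS 40  /* number of terms in the sum (N = 40) */

/* Reference sum of products: returns a[0]*x[0] + ... + a[n-1]*x[n-1].
   C arrays are 0-based, so a[0] corresponds to a_1 in the equation.
   Inputs are 16-bit; the running sum is a 32-bit int. */
int dot_product(const short *a, const short *x, int n)
{
    assert(n >= 0);
    int y = 0;
    for (int i = 0; i < n; i++) {
        y += a[i] * x[i];   /* one multiply and one add per term */
    }
    return y;
}
```

For example, with a = 1, 2, ..., 40 and every x equal to 2, `dot_product(a, x, N_TAPS)` returns 1640.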

## Let's write the code for this algorithm

...and develop the architecture along the way.

## What are the two basic instructions required by this algorithm?

Multiply and add.
The multiply (MPY) executes on the .M unit and the add (ADD) on the .L unit:

```
        MPY   .M    a, x, prod
        ADD   .L    sum, prod, sum
```

Where are the variables stored?

They are stored in Register File A; each register is 32 bits wide:

| Register | Contents |
|---|---|
| A0 | a |
| A1 | x |
| A2 | |
| A3 | prod |
| A4 | Y |
| ... | |
| A31 | |

Substituting registers for the variables (A4 holds the running sum Y):

```
        MPY   .M    A0, A1, A3     ; prod = a * x
        ADD   .L    A4, A3, A4     ; Y = Y + prod
```

## How Do You Create the Loop?

The multiply and add must execute 40 times, once for each term of the sum.

Creating a loop takes four steps:

1. Add a branch instruction (B) and a label.
2. Create a loop counter (= 40).
3. Add an instruction to decrement the loop counter.
4. Make the branch conditional on the value in the loop counter.
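The four steps map directly onto C written with a label and `goto`. In this sketch, the hypothetical function `sum_with_counter` keeps the loop count in a variable named `a2` to mirror register A2; like the assembly, the branch is tested at the bottom, so it assumes at least one iteration (count > 0).

```c
#include <assert.h>

/* Accumulate a[i]*x[i] with an explicit count-down loop that mirrors the
   four steps: label + branch, loop counter, decrement, conditional branch. */
int sum_with_counter(const short *a, const short *x, int count)
{
    assert(count > 0);        /* a bottom-tested loop always runs at least once */
    int y = 0;                /* Y, cleared before the loop */
    int a2 = count;           /* step 2: the loop counter, e.g. 40 */
    int i = 0;                /* element index (the assembly later uses *A5++) */
loop:                         /* step 1: label at the top of the loop */
    y += a[i] * x[i];         /* MPY, then ADD */
    i++;
    a2 = a2 - 1;              /* step 3: decrement the loop counter */
    if (a2 != 0)              /* step 4: branch only while the counter is non-zero */
        goto loop;            /* step 1: branch back to the label */
    return y;
}
```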

## Branching (1)

Step 1: place a label (`loop:`) at the top of the loop and a branch (B) back to it at the bottom. Which unit executes the branch?

```
loop:   MPY   .M    A0, A1, A3
        ADD   .L    A4, A3, A4
        B     .?    loop
```

## Branching (.S Unit)

Branches execute on the .S unit:

```
loop:   MPY   .M    A0, A1, A3
        ADD   .L    A4, A3, A4
        B     .S    loop
```


## Creating a Loop Counter (2)

Step 2: initialize the loop counter before the loop. MVK (move constant) on the .S unit loads 40 into A2, which now holds the loop count:

```
        MVK   .S    40, A2         ; A2 = 40
loop:   MPY   .M    A0, A1, A3
        ADD   .L    A4, A3, A4
        B     .S    loop
```


## Decrementing Loop Counter (3)

Step 3: decrement the counter on every pass with SUB on the .L unit:

```
        MVK   .S    40, A2         ; A2 = 40
loop:   MPY   .M    A0, A1, A3
        ADD   .L    A4, A3, A4
        SUB   .L    A2, 1, A2      ; A2 = A2 - 1
        B     .S    loop
```


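## Conditional Instructions

To minimize branching, all instructions are conditional. An optional condition register in square brackets precedes the instruction:

```
  [condition]  B   .S   loop
```

Execution depends on whether the condition register is zero or non-zero:

| Code syntax | Execute instruction if |
|---|---|
| `[cond]` | true: cond ≠ 0 |
| `[!cond]` | false: cond = 0 |

On the C62x and C67x, only A1, A2, B0, B1 and B2 can serve as condition registers (the C64x adds A0).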
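A conditional instruction still occupies its place in the instruction stream; the condition only decides whether it takes effect. The C sketch below models that rule. The function names are hypothetical, program addresses are plain `int`s for illustration, and the predicated `[A2] SUB` is an invented example of a conditional non-branch instruction, not part of the loop above.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Does an instruction predicated on cond_reg execute?
   [cond]  executes when cond_reg != 0
   [!cond] executes when cond_reg == 0 */
bool predicate_true(int cond_reg, bool negated)
{
    return negated ? (cond_reg == 0) : (cond_reg != 0);
}

/* Model of "[A2] B loop": returns the address of the next instruction. */
int next_pc(int a2, int loop_pc, int fallthrough_pc)
{
    return predicate_true(a2, false) ? loop_pc : fallthrough_pc;
}

/* Model of a hypothetical "[A2] SUB A2, 1, A2": the subtraction is issued
   either way, but its result is written back only when A2 is non-zero. */
int predicated_decrement(int a2)
{
    assert(a2 != INT_MIN);          /* keep a2 - 1 free of signed overflow */
    int result = a2 - 1;
    return predicate_true(a2, false) ? result : a2;
}
```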

## Using Conditional Branch (4)

Step 4: make the branch conditional on A2, so the loop exits once the count reaches zero:

```
        MVK   .S    40, A2         ; A2 = 40 (loop count)
loop:   MPY   .M    A0, A1, A3     ; A3 = a * x
        ADD   .L    A4, A3, A4     ; Y = Y + A3
        SUB   .L    A2, 1, A2      ; decrement loop count
  [A2]  B     .S    loop           ; branch while A2 != 0
```


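## How do a and x get loaded?

The arrays a and x and the result Y are located in memory (a[40], x[40], Y). Create a pointer to each value in the register file, then use load (LD) and store (ST) instructions to move data between memory and registers:

| Register | Contents |
|---|---|
| A0 | a |
| A1 | x |
| A2 | loop count |
| A3 | prod |
| A4 | Y |
| A5 | &a[n] |
| A6 | &x[n] |
| A7 | &Y |
| ... | |
| A31 | |

```
        ; A5 = &a, A6 = &x, A7 = &Y
        LD    *A5, A0        ; A0 = a[n]
        LD    *A6, A1        ; A1 = x[n]
        ST    A4, *A7        ; Y = A4
```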
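In C, the three address registers correspond to ordinary pointer variables. In this sketch the names `pa`, `px` and `py` are our own stand-ins for A5, A6 and A7, and `sum_via_pointers` is a hypothetical function; each load dereferences a pointer into a register-like local, and the single store writes the result through `py`.

```c
#include <assert.h>

/* pa plays the role of A5 (&a), px of A6 (&x), py of A7 (&Y). */
void sum_via_pointers(const short *pa, const short *px, short *py, int count)
{
    assert(pa && px && py && count > 0);
    int a4 = 0;                 /* Y accumulator, cleared before the loop */
    for (int a2 = count; a2 != 0; a2--) {
        int a0 = *pa;           /* LD *A5, A0 */
        int a1 = *px;           /* LD *A6, A1 */
        int a3 = a0 * a1;       /* MPY A0, A1, A3 */
        a4 = a4 + a3;           /* ADD A4, A3, A4 */
        pa++;                   /* step to the next element (the assembly uses *A5++) */
        px++;                   /* (and *A6++) */
    }
    *py = (short)a4;            /* ST A4, *A7; assumes the sum fits in a short */
}
```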

Because the 'C6000 provides byte addressability, the instruction set supports several types of load/store instructions:

| C data type | Load | Store |
|---|---|---|
| char | LDB | STB |
| short | LDH | STH |
| int | LDW | STW |
| double, long long | LDDW (not supported on C62x) | STDW (not supported on C62x, C67x) |
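Byte addressability means every byte has its own address, and the instruction chooses how many bytes to read. The C sketch below models LDB, LDH and LDW reading from a byte array. It assumes little-endian byte order (the C6000 can run in either endian mode) and that byte and halfword loads sign-extend into a 32-bit register; the alignment checks reflect that LDH and LDW expect naturally aligned addresses (the C64x adds non-aligned LDNW/LDNDW). The function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sign-extend the low 'bits' bits of v to a 32-bit signed value. */
static int32_t sign_extend(uint32_t v, unsigned bits)
{
    uint32_t mask = (bits == 32) ? 0xFFFFFFFFu : ((1u << bits) - 1u);
    uint32_t sign = 1u << (bits - 1);
    v &= mask;
    if (v & sign) {
        return -(int32_t)(mask - v) - 1;   /* negative value, no signed overflow */
    }
    return (int32_t)v;
}

int32_t ldb(const uint8_t *mem, size_t addr)   /* char: 1 byte */
{
    return sign_extend(mem[addr], 8);
}

int32_t ldh(const uint8_t *mem, size_t addr)   /* short: 2 bytes */
{
    assert(addr % 2 == 0);                     /* halfword alignment */
    uint32_t v = (uint32_t)mem[addr] | ((uint32_t)mem[addr + 1] << 8);
    return sign_extend(v, 16);
}

int32_t ldw(const uint8_t *mem, size_t addr)   /* int: 4 bytes */
{
    assert(addr % 4 == 0);                     /* word alignment */
    uint32_t v = (uint32_t)mem[addr]
               | ((uint32_t)mem[addr + 1] << 8)
               | ((uint32_t)mem[addr + 2] << 16)
               | ((uint32_t)mem[addr + 3] << 24);
    return sign_extend(v, 32);
}
```

The same four bytes read as two bytes, two halfwords, or one word give different values, which is why the load width must match the C data type.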

## Use LDH for Short (16x16) MPYs

MPY multiplies the signed lower 16 bits of its two source registers and produces a 32-bit result, so the 16-bit (short) values a and x are loaded with LDH and the result is stored with STH.
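A C model of MPY makes the 16 x 16 behavior concrete: only the low halves of the 32-bit sources take part, each read as a signed value. For comparison the sketch also models MPYH, which uses the high halves. Function names are ours; the sketch covers the arithmetic only, not timing.

```c
#include <assert.h>
#include <stdint.h>

/* Interpret the low 16 bits of 'field' as a signed value (portable C). */
static int32_t signed16(uint32_t field)
{
    field &= 0xFFFFu;
    return (field & 0x8000u) ? (int32_t)field - 0x10000 : (int32_t)field;
}

/* MPY: signed low half x signed low half -> 32-bit product. */
int32_t mpy(uint32_t src1, uint32_t src2)
{
    return signed16(src1) * signed16(src2);
}

/* MPYH: signed high half x signed high half -> 32-bit product. */
int32_t mpyh(uint32_t src1, uint32_t src2)
{
    return signed16(src1 >> 16) * signed16(src2 >> 16);
}
```

Even the extreme case, -32768 x -32768, fits in 32 bits, so the product of two shorts never overflows the destination register.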

Which unit performs the loads and the store?

```
        MVK   .S    40, A2         ; A2 = 40 (loop count)
loop:   LDH   .?    *A5, A0        ; A0 = a[n]
        LDH   .?    *A6, A1        ; A1 = x[n]
        MPY   .M    A0, A1, A3
        ADD   .L    A4, A3, A4
        SUB   .L    A2, 1, A2
  [A2]  B     .S    loop
        STH   .?    A4, *A7        ; *A7 = Y
```

Loads and stores execute on the .D unit, which accesses data memory:

```
        MVK   .S    40, A2         ; A2 = 40 (loop count)
loop:   LDH   .D    *A5, A0        ; A0 = a[n]
        LDH   .D    *A6, A1        ; A1 = x[n]
        MPY   .M    A0, A1, A3
        ADD   .L    A4, A3, A4
        SUB   .L    A2, 1, A2
  [A2]  B     .S    loop
        STH   .D    A4, *A7        ; *A7 = Y
```

## After the first loop, A4 contains a0 * x0

How do you access a1 and x1 on the second loop? Use post-increment addressing. `*A5++` loads from the address in A5 and then advances A5 by one element (2 bytes for LDH), so A5 and A6 walk through a[] (a0, a1, a2, ...) and x[] (x0, x1, x2, ...) in data memory:

```
        LDH   .D    *A5++, A0      ; A0 = a[n], then A5 -> a[n+1]
        LDH   .D    *A6++, A1      ; A1 = x[n], then A6 -> x[n+1]
```
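Post-increment addressing behaves like C's `*p++`: the load uses the current address, then the pointer advances by one element. Because the step is scaled by the element size, a halfword pointer moves 2 bytes per access. This small sketch (the function name `ldh_postinc` is hypothetical) makes both effects visible:

```c
#include <assert.h>
#include <stdint.h>

/* Model of "LDH *A5++, A0": read the short at *p, then advance p by one short. */
int32_t ldh_postinc(const int16_t **p)
{
    assert(p && *p);
    int32_t value = **p;   /* load from the current address, sign-extended */
    (*p)++;                /* then step to the next element: +sizeof(int16_t) bytes */
    return value;
}
```

Calling it twice on a pointer to `a[0]` returns `a[0]` and then `a[1]`, leaving the pointer 4 bytes past where it started.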

The complete loop with post-increment loads:

```
        MVK   .S    40, A2         ; A2 = 40 (loop count)
loop:   LDH   .D    *A5++, A0      ; A0 = a[n]
        LDH   .D    *A6++, A1      ; A1 = x[n]
        MPY   .M    A0, A1, A3     ; A3 = a[n] * x[n]
        ADD   .L    A4, A3, A4     ; Y = Y + A3
        SUB   .L    A2, 1, A2      ; decrement loop count
  [A2]  B     .S    loop           ; branch while A2 != 0
        STH   .D    A4, *A7        ; *A7 = Y
```

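The C6000 CPU actually has two register files, A (A0-A31) and B (B0-B31), with 32-bit registers, and two matching sets of functional units: .L1, .S1, .M1 and .D1 work with register file A, while .L2, .S2, .M2 and .D2 work with register file B. Both .D units access data memory. (The C62x and C67x have 16 registers per file, A0-A15 and B0-B15; the C64x doubles this to 32.)

Because every register used so far is in file A, the final code uses the side-1 units:

```
        MVK   .S1   40, A2         ; A2 = 40, loop count
loop:   LDH   .D1   *A5++, A0      ; A0 = a(n)
        LDH   .D1   *A6++, A1      ; A1 = x(n)
        MPY   .M1   A0, A1, A3     ; A3 = a(n) * x(n)
        ADD   .L1   A3, A4, A4     ; Y = Y + A3
        SUB   .L1   A2, 1, A2      ; decrement loop count
  [A2]  B     .S1   loop           ; if A2 != 0, branch
        STH   .D1   A4, *A7        ; *A7 = Y
```

Note: assume A4 has been cleared before the loop.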
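To check the register usage, the sketch below steps through the final instruction sequence one instruction at a time, with an array standing in for register file A and `int16_t` arrays standing in for data memory. It deliberately ignores pipeline timing: a real C6000 must account for the delay slots of loads, multiplies and branches, so this models only the data flow. The function name is hypothetical, and the sketch assumes the 32-bit sum does not overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Data-flow model of the final loop (delay slots ignored). */
int16_t run_final_loop(const int16_t *a_mem, const int16_t *x_mem, int16_t *y_mem)
{
    assert(a_mem && x_mem && y_mem);
    int32_t A[32] = {0};               /* register file A; A4 starts cleared */
    const int16_t *A5 = a_mem;         /* &a[n] */
    const int16_t *A6 = x_mem;         /* &x[n] */
    int16_t *A7 = y_mem;               /* &Y */

    A[2] = 40;                         /* MVK  .S1  40, A2 */
    do {                               /* loop: */
        A[0] = *A5++;                  /* LDH  .D1  *A5++, A0 */
        A[1] = *A6++;                  /* LDH  .D1  *A6++, A1 */
        A[3] = A[0] * A[1];            /* MPY  .M1  A0, A1, A3 */
        A[4] = A[3] + A[4];            /* ADD  .L1  A3, A4, A4 */
        A[2] = A[2] - 1;               /* SUB  .L1  A2, 1, A2 */
    } while (A[2] != 0);               /* [A2] B .S1 loop */

    /* STH .D1 A4, *A7: only the low 16 bits of A4 reach memory. */
    uint32_t low = (uint32_t)A[4] & 0xFFFFu;
    *A7 = (int16_t)((int32_t)low - ((low & 0x8000u) ? 0x10000 : 0));
    return *A7;
}
```

Because STH keeps only 16 bits, a sum that exceeds the short range is truncated in Y even though A4 held the full 32-bit result.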

## Outline

- CPU Architecture
- Instruction Set Overview
  - Classic C6x Devices (C62x, C67x)
  - Introducing SIMD (C64x)
  - Brand New (C64x+, C674x, C66x)
- Internal Buses & Memory
- C6000 Peripherals Overview
- Device Family Review
- Exam 1


## C62x Instruction Set (by category)

| Category | Instructions |
|---|---|
| Arithmetic | ABS, ADD, ADDA, ADDK, ADD2, MPY, MPYH, NEG, SMPY, SMPYH, SADD, SAT, SSUB, SUB, SUBA, SUBC, SUB2, ZERO |
| Logical | AND, CMPEQ, CMPGT, CMPLT, NOT, OR, SHL, SHR, SSHL, XOR |
| Bit management | CLR, EXT, LMBD, NORM, SET |
| Data management | LDB/H/W, MV, MVC, MVK, MVKL, MVKH, MVKLH, STB/H/W |
| Program control | B, IDLE, NOP |

Note: refer to the 'C6000 CPU Reference Guide for more details.

## C62x Instruction Set (by unit)

| Unit | Instructions |
|---|---|
| .L | ABS, ADD, AND, CMPEQ, CMPGT, CMPLT, LMBD, MV, NEG, NORM, NOT, OR, SADD, SAT, SSUB, SUB, SUBC, XOR, ZERO |
| .S | ADD, ADDK, ADD2, AND, B, CLR, EXT, MV, MVC, MVK, MVKL, MVKH, NEG, NOT, OR, SET, SHL, SHR, SSHL, SUB, SUB2, XOR, ZERO |
| .D | ADD, ADDAB/H/W, LDB/H/W, MV, NEG, STB/H/W, SUB, SUBAB/H/W, ZERO |
| .M | MPY, MPYH, MPYLH, MPYHL, SMPY, SMPYH |
| No unit used | NOP, IDLE |

## C67x: Superset of Fixed-Point

The C67x executes every C62x fixed-point instruction, unit for unit, and adds the following floating-point instructions:

| Unit | Added instructions |
|---|---|
| .L | ADDSP, ADDDP, SUBSP, SUBDP, INTSP, INTDP, SPINT, DPINT, SPTRUNC, DPTRUNC, DPSP |
| .S | ABSSP, ABSDP, CMPGTSP, CMPEQSP, CMPLTSP, CMPGTDP, CMPEQDP, CMPLTDP, RCPSP, RCPDP, RSQRSP, RSQRDP, SPDP |
| .D | LDDW |
| .M | MPYSP, MPYDP, MPYI, MPYID |

## C67x+ CPU Core Enhancements

CPU enhancements:

- Number of registers doubled to 64
- Cross-path operand sourcing ability doubled to 2
- Execution packets can now span fetch packets (for better code size)
- All changes are backward compatible with the C67x CPU

New instructions:

- .S units enhanced with a floating-point adder: ADDSP, ADDDP, SUBSP, SUBDP. Together with the .L units, this allows up to four floating-point additions per cycle.
- .M units enhanced with mixed-precision multiply instructions:
  - MPYSPDP: SP x DP into DP
  - MPYSP2DP: SP x SP into DP

Many applications may benefit from these mixed-precision floating-point multiplies, which are faster alternatives to the full double-precision MPYDP.
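Why SP x SP into DP is worthwhile: the product of two single-precision values always fits exactly in double precision (two 24-bit significands give at most a 48-bit product, within double's 53 bits), whereas rounding the product back to single precision can drop low-order bits. The C sketch below shows the difference using ordinary `float` and `double` arithmetic; it illustrates the numerics only and is not a model of MPYSP2DP's timing or hardware details. It assumes IEEE 754 arithmetic with `FLT_EVAL_METHOD == 0` (typical on x86-64 and ARM), and the function names are ours.

```c
#include <assert.h>

/* SP x SP -> SP: the product is rounded to single precision. */
float mul_sp(float a, float b)
{
    float p = a * b;      /* assignment rounds the result to float */
    return p;
}

/* SP x SP -> DP: widening first keeps every bit of the product. */
double mul_sp_to_dp(float a, float b)
{
    return (double)a * (double)b;
}
```

With a = 1 + 2^-12, the exact square is 1 + 2^-11 + 2^-24. The double result keeps the 2^-24 term, while the single-precision result rounds it away to 1 + 2^-11.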


*Figure: 'C64x CPU block diagram showing instruction fetch, instruction dispatch (with packing), instruction decode, emulation, interrupt control, control registers, the L1/S1/M1/D1 and L2/S2/M2/D2 functional units, and the added registers (B16-B31).*

The 'C64x:

- Doubles the size of the register set
- Adds packed data processing: dual 16-bit (4000 MMACs) or quad 8-bit (8000 MMACs), which is great for imaging applications
- Increases code density
- Is 100% object-code compatible with the C62x
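Packed-data processing treats one 32-bit register as two independent 16-bit lanes (or four 8-bit lanes). As an illustration, the sketch below models a dual 16-bit add in the style of ADD2: each half is added separately, and a carry out of the low half is discarded rather than spilling into the high half. The `pack2` and `add2` helpers are our own names for this model (not TI's PACK2 semantics).

```c
#include <assert.h>
#include <stdint.h>

/* Pack two 16-bit lanes into one word: hi in bits 31..16, lo in bits 15..0. */
uint32_t pack2(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | lo;
}

/* Dual 16-bit add: lanes are added independently, each wrapping modulo 2^16. */
uint32_t add2(uint32_t a, uint32_t b)
{
    uint32_t lo = (a + b) & 0xFFFFu;                    /* low lane */
    uint32_t hi = ((a >> 16) + (b >> 16)) & 0xFFFFu;    /* high lane, no carry in */
    return (hi << 16) | lo;
}
```

Adding (1, 0xFFFF) and (2, 1) lane by lane gives (3, 0), whereas an ordinary 32-bit add gives (4, 0) because the low-lane carry leaks upward.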

## 'C64x: Superset of C62x

The C64x adds these instructions to the C62x set, grouped by unit:

| Unit | Added instructions |
|---|---|
| .S | Data pack/unpack: PACK2, PACKH2, PACKLH2, PACKHL2, UNPKHU4, UNPKLU4, SWAP2, SPACK2, SPACKU4; bitwise logical: ANDN; shifts & merge: SHR2, SHRU2, SHLMB, SHRMB; compares: CMPEQ2, CMPEQ4, CMPGT2, CMPGT4; branches/PC: BDEC, BPOS, BNOP |
| .D | Dual arithmetic: ADD2, SUB2; bitwise logical: AND, ANDN, OR, XOR; memory access: LDDW, LDNW, LDNDW, STDW, STNW, STNDW; load constant: MVK (5-bit) |
| .L | Dual/quad arithmetic: ABS2, ADD2, ADD4, MAX, MIN, SUB2, SUB4, SUBABS4; bitwise logical: ANDN; data pack/unpack: PACK2, PACKH2, PACKLH2, PACKHL2, PACKH4, PACKL4, UNPKHU4, UNPKLU4, SWAP2/4; shift & merge: SHLMB, SHRMB; load constant: MVK (5-bit) |
| .M | Average: AVG2, AVGU4; shifts: ROTL, SSHVL, SSHVR; multiplies: MPYHI, MPYLI, MPYHIR, MPYLIR, MPY2, SMPY2, DOTP2, DOTPN2, DOTPRSU2, DOTPNRSU2, DOTPU4, DOTPSU4, GMPY4; bit operations: BITC4, BITR, DEAL, SHFL; move: MVD; expand: XPND2/4 |
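The packed multiplies pay off directly for the sum of products from the start of this section. DOTP2 multiplies the two signed 16-bit lanes of its sources pairwise and adds the two products, so one instruction covers two terms of the sum. The C sketch below models that arithmetic (function names are illustrative) and uses it to compute the 40-term sum in 20 iterations. It assumes a and x are packed two 16-bit values per 32-bit word, lower-indexed element in the low half, and it accumulates in 64 bits so the model never overflows, unlike the 32-bit hardware result.

```c
#include <assert.h>
#include <stdint.h>

/* Signed value of the 16-bit lane starting at bit 'shift'. */
static int64_t lane16(uint32_t word, unsigned shift)
{
    uint32_t v = (word >> shift) & 0xFFFFu;
    return (v & 0x8000u) ? (int64_t)v - 0x10000 : (int64_t)v;
}

/* DOTP2-style dot product: hi*hi + lo*lo of the signed 16-bit lanes. */
int64_t dotp2(uint32_t src1, uint32_t src2)
{
    return lane16(src1, 16) * lane16(src2, 16)
         + lane16(src1, 0) * lane16(src2, 0);
}

/* Sum of products over n16 16-bit elements packed two per word (n16 even). */
int64_t dot_packed(const uint32_t *a, const uint32_t *x, int n16)
{
    assert(n16 >= 0 && n16 % 2 == 0);
    int64_t y = 0;
    for (int i = 0; i < n16 / 2; i++) {   /* 40 elements -> 20 iterations */
        y += dotp2(a[i], x[i]);
    }
    return y;
}
```

Halving the iteration count this way is what the "dual 16-bit" MMAC figure refers to: two multiply-accumulates of 16-bit data per packed instruction.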