Arithmetic and Logic Unit (ALU)
Designing an Adder
Traditional circuit design: truth table approach
Cost/Speed tradeoff:
• n-bit adder: truth table with 2n inputs, 2^{2n} rows → fast circuit (theoretically), but very costly
• n 1-bit adders: cheap but slow
• Tradeoff: n 1-bit adders and additional circuitry to speed up computation
1-bit Adder
Half adder (inputs x, y; outputs S = sum, R = carry out):
x y | R S
0 0 | 0 0
0 1 | 0 1
1 0 | 0 1
1 1 | 1 0
Full adder (inputs x, y, and carry in r; outputs S = sum, R = carry out):
x y r | R S
0 0 0 | 0 0
0 0 1 | 0 1
0 1 0 | 0 1
0 1 1 | 1 0
1 0 0 | 0 1
1 0 1 | 1 0
1 1 0 | 1 0
1 1 1 | 1 1
R = x.y + r.(x+y) = x.y + r.(x⊕y)
S = r’.(x’.y + x.y’) + r.(x.y + x’.y’) = r ⊕ x ⊕ y
Exclusive OR:
x y | x⊕y
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 0
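The adder equations above can be checked by brute force. The following is a minimal Python sketch, not part of the original slides; the function names `half_adder`, `full_adder`, and `check_full_adder` are chosen here for illustration.

```python
from itertools import product


def half_adder(x, y):
    """Return (S, R): sum bit and carry-out bit of two input bits."""
    return x ^ y, x & y


def full_adder(x, y, r):
    """Return (S, R) for input bits x, y and incoming carry r."""
    s = r ^ x ^ y                      # S = r ⊕ x ⊕ y
    carry = (x & y) | (r & (x | y))    # R = x.y + r.(x+y)
    return s, carry


def check_full_adder():
    """Compare the equations with integer addition on all 8 input rows."""
    for x, y, r in product((0, 1), repeat=3):
        s, carry = full_adder(x, y, r)
        assert 2 * carry + s == x + y + r
        # The alternative form R = x.y + r.(x ⊕ y) yields the same carry.
        assert carry == (x & y) | (r & (x ^ y))
    return True
```

Calling `check_full_adder()` walks the full truth table, so it also confirms that the two forms of R agree on every row.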
n-bit Addition/Subtraction
Subtraction:
• X-Y = X + (-Y) = X + Y’ + 1
• The +1 is supplied by setting the carry in c to 1 → no additional adder
Bottleneck: carry propagation
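[Figure: 4-bit adder/subtractor built from four chained 1-bit adders. Inputs X = x3 x2 x1 x0 and Y = y3 y2 y1 y0, output Z = s3 s2 s1 s0, carries r0, r1, r2, r3. Each yi is combined with the control bit c (yi ⊕ c), and c is also the carry in of the first adder.]
c=0: addition
c=1: subtraction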
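The adder/subtractor described above can be simulated bit by bit. This is a sketch under stated assumptions: bits are stored least significant first, each yi is XORed with c, and c is the first carry in. The helper names `add_sub`, `to_bits`, and `from_bits` are hypothetical.

```python
def full_adder(x, y, r):
    """1-bit full adder: return (S, R)."""
    return x ^ y ^ r, (x & y) | (r & (x | y))


def add_sub(x_bits, y_bits, c):
    """Ripple-carry add (c=0) or subtract (c=1).

    x_bits, y_bits: lists of bits, least significant bit first.
    Returns (z_bits, carry_out).
    """
    r = c  # c doubles as the carry in, which supplies the +1 of X + Y' + 1
    z_bits = []
    for x, y in zip(x_bits, y_bits):
        s, r = full_adder(x, y ^ c, r)  # y ⊕ c complements y when c=1
        z_bits.append(s)
    return z_bits, r


def to_bits(value, n):
    """n low-order bits of a non-negative integer, least significant first."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Unsigned integer value of a bit list, least significant first."""
    return sum(b << i for i, b in enumerate(bits))
```

For example, `add_sub(to_bits(6, 4), to_bits(5, 4), 1)` gives the bits of 1, while 3 - 5 gives the bit pattern 1110, which is -2 in 4-bit two's complement. Each carry is only available after the previous stage has finished, which is the bottleneck named above.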
A Simple 1-bit ALU
Elementary operations:
• ADD, AND, NOT
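ALU: Arithmetic and Logic Unit
Multiplexor (MUX): select among several inputs
c1 c0 | m
0  0  | m0
0  1  | m1
1  0  | m2
1  1  | m3
[Figure: 1-bit ALU with inputs x, y, carry in r, carry out R, and control bits c1, c0; a MUX selects among the ADD, AND, and NOT results.]
Control bits c1, c0 select the ALU operation among ADD, AND, and NOT (d = don't care).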
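A 1-bit ALU of this kind can be sketched as a multiplexer over the three elementary results. The control encoding used here (00 = ADD, 01 = AND, 1d = NOT) and the choice of x as the NOT operand are assumptions made for illustration, since the slide table did not survive extraction; `mux4` and `alu1` are hypothetical names.

```python
def mux4(m0, m1, m2, m3, c1, c0):
    """4-to-1 multiplexer written as a sum of products of the control bits."""
    n1, n0 = 1 - c1, 1 - c0
    return (m0 & n1 & n0) | (m1 & n1 & c0) | (m2 & c1 & n0) | (m3 & c1 & c0)


def alu1(x, y, r, c1, c0):
    """1-bit ALU returning (z, R).

    Assumed encoding (not from the slides): c1c0 = 00 ADD, 01 AND, 1d NOT x.
    R is the adder carry out and is only meaningful for ADD.
    """
    add = x ^ y ^ r
    carry = (x & y) | (r & (x | y))
    and_ = x & y
    not_ = 1 - x
    # NOT is wired to both m2 and m3, so c0 is a don't care when c1 = 1.
    z = mux4(add, and_, not_, not_, c1, c0)
    return z, carry
```

All three results are computed in parallel and the control bits only choose which one reaches the output, mirroring how the hardware MUX works.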
n-bit ALU
n 1-bit ALUs
Embryo of instruction set: control bits c1, c0 select the operation (ADD, AND, NOT; d = don't care).
[Figure: n-bit ALU with n-bit inputs X and Y and n-bit output Z.]
Overflow
Detect overflow in two's complement?
Carries:    1 0 0
            0 1 0 1       5
          + 0 1 1 0     + 6
          = 1 0 1 1      -5
Carries:  1 0 0 0
            1 0 0 1      -7
          + 1 1 0 0    + -4
      = (1) 0 1 0 1       5
Overflow
Overflow signal S_overflow (1 = overflow)
Overflow in two's complement:
• x ≤ 0, y ≥ 0 → no overflow possible
• x ≥ 0, y ≥ 0:
Overflow: Z = X+Y (two's complement), Z > 2^{n-1}-1, and 0 ≤ X ≤ 2^{n-1}-1, 0 ≤ Y ≤ 2^{n-1}-1
⇒ 2^{n-1}-1 < Z ≤ 2^n-2
⇒ In two's complement, negative numbers are coded in [2^{n-1}; 2^n-1]
⇒ Z is negative
• x ≤ 0, y ≤ 0: same; in case of overflow, Z is positive
→ overflow detection criterion: the operands have the same sign bit and the result has the opposite sign bit
x_{n-1} y_{n-1} z_{n-1} | S_overflow
0 0 0 | 0
0 0 1 | 1
0 1 0 | 0
0 1 1 | 0
1 0 0 | 0
1 0 1 | 0
1 1 0 | 1
1 1 1 | 0
S_overflow = x_{n-1}.y_{n-1}.z_{n-1}’ + x_{n-1}’.y_{n-1}’.z_{n-1}
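The sign-bit criterion can be compared against the arithmetic definition of overflow. The sketch below is illustrative only; `to_signed`, `add_with_overflow`, and `check_all` are names introduced here, and operands are assumed to be Python integers already within the n-bit two's complement range.

```python
def to_signed(value, n):
    """Interpret the low n bits of value as a two's complement number."""
    value &= (1 << n) - 1
    return value - (1 << n) if value >> (n - 1) else value


def add_with_overflow(x, y, n):
    """Add two n-bit two's complement integers; return (z, s_overflow)."""
    mask = (1 << n) - 1
    z_raw = ((x & mask) + (y & mask)) & mask   # n-bit result, carry out dropped
    xs = (x >> (n - 1)) & 1                    # sign bit of x
    ys = (y >> (n - 1)) & 1                    # sign bit of y
    zs = z_raw >> (n - 1)                      # sign bit of z
    # Same sign on both operands, opposite sign on the result.
    s_overflow = (xs & ys & (1 - zs)) | ((1 - xs) & (1 - ys) & zs)
    return to_signed(z_raw, n), s_overflow


def check_all(n):
    """Check the criterion against range arithmetic for every operand pair."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    for x in range(lo, hi + 1):
        for y in range(lo, hi + 1):
            z, ovf = add_with_overflow(x, y, n)
            assert ovf == (0 if lo <= x + y <= hi else 1)
            if not ovf:
                assert z == x + y
    return True
```

With n = 4, `add_with_overflow(5, 6, 4)` returns `(-5, 1)` and `add_with_overflow(-7, -4, 4)` returns `(5, 1)`, matching the two worked additions on the earlier slide.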
Overflow
Action upon overflow? Several solutions:
• Stop program (TRAP)
Example: MIPS R3000;
• Raise a signaling bit such as S_overflow
Example: Intel x86, Sun SPARC;
• Do nothing
Example: using C on a 32-bit Sun SPARC
Range: -2147483648 (-2^31) → 2147483647 (2^31-1)
  01110111001101011001010000000000
+ 01110111001101011001010000000000
= 11101110011010110010100000000000
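The "do nothing" case can be reproduced outside C. Python integers never overflow, so the sketch below emulates a 32-bit register with a hypothetical helper `wrap32`; it illustrates silent wraparound and is not a model of any particular compiler or machine.

```python
def wrap32(value):
    """Reduce value to a signed 32-bit two's complement integer (silent wraparound)."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


# Operand taken from the binary addition above.
a = 0b01110111001101011001010000000000
total = wrap32(a + a)

print(a)                                  # 2000000000
print(total)                              # -294967296
print(format(total & 0xFFFFFFFF, "032b")) # 11101110011010110010100000000000
```

The sum of two positive numbers comes out negative, and nothing signals the error unless the program checks for it.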
Speeding Up Carry Propagation
g0 = x0.y0 (carry generation), p0 = x0 ⊕ y0 (carry propagation)
r0 = x0.y0 + c.(x0 ⊕ y0) = g0 + p0.c
r1 = g1 + p1.r0 = g1 + p1.(g0 + p0.c) = g1 + p1.g0 + p1.p0.c = G1 + P1.c
r2 = G2 + P2.c
r3 = G3 + P3.c
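Timing, with x, y available at t=0:
• 1-bit adder: r available at t=tr → S available at t=tr+2
• Chained 1-bit adders: tr3=8, tr7=16
• Carry Look Ahead (CLA): p, g available at t=2; with r4k-1 available at t=tr4k-1, the carries r4k, r4k+1, r4k+2, r4k+3 are available at t=max(tr4k-1+2, 4), so tr3=4, tr7=6
[Figure: Carry Look Ahead. One CLA block takes p0g0, p1g1, p2g2, p3g3 and c and outputs r0, r1, r2, r3; a second CLA block takes p4g4, p5g5, p6g6, p7g7.]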
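The look-ahead idea is that every carry of a 4-bit block is written directly in terms of p, g, and the block's carry in c, instead of waiting for the previous carry. The sketch below expands the carry equations for one block and compares it with a chained adder; `cla4`, `ripple4`, and `check_cla4` are illustrative names, and bits are assumed to be stored least significant first.

```python
from itertools import product


def cla4(x, y, c):
    """4-bit carry look-ahead. x, y: bit lists, LSB first. Returns (s, r)."""
    p = [a ^ b for a, b in zip(x, y)]   # propagate: pi = xi ⊕ yi
    g = [a & b for a, b in zip(x, y)]   # generate:  gi = xi.yi
    # Each carry depends only on p, g, and c, not on the previous carry.
    r0 = g[0] | (p[0] & c)
    r1 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c)
    r2 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c)
    r3 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c))
    r = [r0, r1, r2, r3]
    carry_in = [c] + r[:3]              # carry entering each bit position
    s = [pi ^ ci for pi, ci in zip(p, carry_in)]
    return s, r


def ripple4(x, y, c):
    """Reference: four chained 1-bit full adders."""
    s, r = [], []
    for a, b in zip(x, y):
        s.append(a ^ b ^ c)
        c = (a & b) | (c & (a | b))
        r.append(c)
    return s, r


def check_cla4():
    """Compare both adders on all 512 combinations of x, y, and c."""
    for bits in product((0, 1), repeat=9):
        x, y, c = list(bits[:4]), list(bits[4:8]), bits[8]
        assert cla4(x, y, c) == ripple4(x, y, c)
    return True
```

The price of the flat equations is wider gates: the r3 term alone needs a 5-input AND and a 5-input OR, which is why look-ahead is usually applied per 4-bit block.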
Speeding Up Carry Propagation
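n = number of bits
CLA: O(n).
Tree structure: O(log2 n).
pi = xi ⊕ yi, gi = xi.yi, si = pi ⊕ ri-1
PG operator: inputs (a2, b2) and (a1, b1) → output (a1.a2, b2 + b1.a2)
Carry operator: inputs (a, b) and r → output b + a.r
Example: P2..1 = p2.p1, G2..1 = g2 + p2.g1
r0 = g0 + p0.c
[Figure: 4-bit tree adder. Inputs x0y0, x1y1, x2y2, x3y3 produce p0g0, p1g1, p2g2, p3g3; PG and carry operators produce r0, r1, r2, r3; outputs s0, s1, s2, s3.]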
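Because the PG operator is associative, group (P, G) pairs can be combined in any bracketing, which is what allows a tree of logarithmic depth. The sketch below builds each group with a balanced recursion and then applies the carry operator; the names `combine`, `group`, `carries`, and `ripple_carries` are illustrative, and the code recomputes groups for clarity rather than sharing them as a real prefix network would.

```python
def combine(hi, lo):
    """PG operator: (a2, b2) over (a1, b1) → (a1.a2, b2 + b1.a2)."""
    a2, b2 = hi
    a1, b1 = lo
    return a1 & a2, b2 | (b1 & a2)


def group(pg, lo, hi):
    """(P, G) of bit positions lo..hi, built as a balanced tree of combine calls."""
    if lo == hi:
        return pg[lo]
    mid = (lo + hi) // 2
    return combine(group(pg, mid + 1, hi), group(pg, lo, mid))


def carries(x, y, c):
    """Carries r0..r(n-1) for bit lists x, y (LSB first) and carry in c."""
    pg = [(a ^ b, a & b) for a, b in zip(x, y)]
    out = []
    for i in range(len(pg)):
        p_grp, g_grp = group(pg, 0, i)
        out.append(g_grp | (p_grp & c))   # carry operator: b + a.r
    return out


def ripple_carries(x, y, c):
    """Reference carries from chained full adders."""
    out = []
    for a, b in zip(x, y):
        c = (a & b) | (c & (a | b))
        out.append(c)
    return out
```

For x = 5 and y = 6 in 4 bits (LSB first `[1, 0, 1, 0]` and `[0, 1, 1, 0]`), `carries` returns `[0, 0, 1, 0]`, the same as the chained adder.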
Speeding Up Carry Propagation
n=16, Brent-Kung.
CLA: 5 steps (pg and 4 CLA) to propagate carry
Tree: 7 steps (pg and 6 PG operators)
Starting with 32 bits, tree (8) faster than CLA (9)
[Figure: Brent-Kung tree over bit positions 0 to 15 with carry in c.]
Speeding Up Carry Propagation
n=16, Han-Carlson.
Cost/Performance tradeoff
Itanium:
• 64 bits,
• 0.18 µm,
• 482 ps for one addition
[Figure: Han-Carlson tree over bit positions 0 to 15 with carry in c.]