Exam Computer Architecture (INF559) Ecole Polytechnique


2008–2009

– The exam lasts 3 hours.

– The text contains 5 pages, including a summary on the LC-2 at the end.

– All documents are authorized.

– The points attributed to each section are only there to help you weigh the questions; they will not necessarily correspond to the final total.

– You must comment your programs: almost every line/instruction must have a comment that explains its role.

As transistors get smaller they also get more prone to errors. As a result, within any gate, a bit may be flipped : it may switch from 0 to 1 or vice-versa. Consequently, processor designers must now take this reliability issue into account. In this text, we investigate such errors, and how they can be coped with at the hardware and the software levels.

This text contains 3 sections which can be treated independently, though we recommend treating them in order.

Exercise 1 - Error detection and correction (7 points)

In this part we build circuits capable of detecting and correcting errors (bit flips).

Question 1.1

We consider a word of 4 bits b₃b₂b₁b₀. We want to build a circuit where the output is 1 if the number of bits equal to 1 is odd, and 0 if the number of bits equal to 1 is even. Present your solution first as a 4-variable Karnaugh map, where the columns correspond to b₃b₂ and the rows correspond to b₁b₀.

Using one of the logical operators seen in the course, the logical expression of this circuit can be made very simple. Indicate which operator this is and give the corresponding logical expression.

Answer

The Karnaugh map is the following (columns b₃b₂, rows b₁b₀, both in Gray-code order):

b₁b₀ \ b₃b₂    00   01   11   10
00              0    1    0    1
01              1    0    1    0
11              0    1    0    1
10              1    0    1    0

The logical operator is XOR, and the corresponding logical expression is b₃ ⊕ b₂ ⊕ b₁ ⊕ b₀.

Question 1.2

We call this bit p (for parity), and now consider that such a bit is added to each 4-bit word b₃b₂b₁b₀, resulting in 5-bit words pb₃b₂b₁b₀. Explain how this bit can be used for error detection purposes. How many simultaneously occurring errors can be detected? Is it possible to correct the error(s) as well?

Answer

In order to detect errors, the parity of the number of bits equal to 1 among b₃b₂b₁b₀ is computed and compared to p. If they differ, then an error has occurred.

Only an odd number of simultaneous bit flips can be detected, so in practice this mechanism is used to detect a single error. It is not possible to correct the error: we only know that an error occurred, not which bit it affected.
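The detection scheme above can be illustrated with a minimal Python sketch. Python is not the language of the exam, and the names parity, encode and error_detected are chosen here for illustration only.

```python
def parity(word: int) -> int:
    """Return 1 if `word` contains an odd number of 1 bits, 0 otherwise."""
    return bin(word).count("1") & 1


def encode(data: int) -> int:
    """Build the 5-bit word p b3 b2 b1 b0 from the 4-bit word b3 b2 b1 b0."""
    return (parity(data) << 4) | (data & 0xF)


def error_detected(word: int) -> bool:
    """Recompute the parity of b3..b0 and compare it with the stored bit p."""
    stored_p = (word >> 4) & 1
    return parity(word & 0xF) != stored_p


word = encode(0b1011)          # three 1s, so p = 1
one_flip = word ^ 0b00100      # flip b2
two_flips = word ^ 0b00110     # flip b2 and b1
print(error_detected(word), error_detected(one_flip), error_detected(two_flips))
# prints: False True False
```

The double flip goes unnoticed because it leaves the number of 1s with the same parity.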

Question 1.3

We now consider a more complex error detection scheme where we add the following three parity bits:

p₀ = parity(b₁b₂b₃)

p₁ = parity(b₀b₂b₃)

p₂ = parity(b₀b₁b₃)

Therefore, we now have a 7-bit word p₂p₁p₀b₃b₂b₁b₀ instead of the 4-bit word b₃b₂b₁b₀. How many errors is it possible to detect with this scheme? Provide a formal justification of your reply.

Answer

Two errors can be detected.

Consider first that one of the two flipped bits is b₃. If the other flipped bit is bᵢ (i ≠ 3), then pᵢ will not match, since pᵢ covers b₃ but not bᵢ. If the other flipped bit is a parity bit pᵢ, then the other two parity bits will not match.

Consider now that the two flipped bits are bᵢ₁ and bᵢ₂, with i₁ ≠ 3 and i₂ ≠ 3. Then pᵢ₁ and pᵢ₂ will both fail to match, since each of them covers only one of the two flipped bits.

Finally, if at least one of the two flipped bits is a parity bit pⱼ, then the error is directly detected when comparing the parity bits: a flipped data bit changes two or three parity checks while a flipped parity bit changes exactly one, so at least one check still fails.
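The case analysis above can also be checked exhaustively. The following Python sketch is only an illustration (the helper names encode7, error_detected and undetected are assumptions made here): it enumerates every data word and every pattern of flipped bits in the 7-bit word p₂p₁p₀b₃b₂b₁b₀.

```python
from itertools import combinations


def bit(word: int, i: int) -> int:
    """Return bit i of `word`."""
    return (word >> i) & 1


def encode7(data: int) -> int:
    """Build p2 p1 p0 b3 b2 b1 b0 (bit 6 down to bit 0) from b3 b2 b1 b0."""
    b0, b1, b2, b3 = (bit(data, i) for i in range(4))
    p0 = b1 ^ b2 ^ b3
    p1 = b0 ^ b2 ^ b3
    p2 = b0 ^ b1 ^ b3
    return (p2 << 6) | (p1 << 5) | (p0 << 4) | (data & 0xF)


def error_detected(word: int) -> bool:
    """True if at least one stored parity bit differs from the recomputed one."""
    return encode7(word & 0xF) != word


def undetected(n_flips: int) -> int:
    """Count (data word, flip pattern) pairs with n_flips flips that go unnoticed."""
    count = 0
    for data in range(16):
        for positions in combinations(range(7), n_flips):
            corrupted = encode7(data)
            for pos in positions:
                corrupted ^= 1 << pos
            if not error_detected(corrupted):
                count += 1
    return count


print(undetected(1), undetected(2), undetected(3) > 0)
```

No pattern of one or two flips goes unnoticed, whereas some patterns of three flips do, which is why the scheme is said to detect two errors.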

Question 1.4

Provide a logical expression (as a function of p₂, p₁, p₀, b₃, b₂, b₁, b₀) which is equal to 1 (true) if there is an error, and 0 otherwise. Do not try to minimize the cost of the corresponding circuit.

Answer

Let us call p() the parity function of Question 1.1. An error (one or two flipped bits) is detected if:

(p(b₀,b₁,b₃) ≠ p₂) + (p(b₀,b₂,b₃) ≠ p₁) + (p(b₁,b₂,b₃) ≠ p₀)

The above expression becomes a combinational logic expression by substituting ≠ with XOR (⊕), since x ≠ y is true exactly when x ⊕ y = 1.

Question 1.5

Formally prove that this error detection scheme can also be used to correct one error. Explain how. Also give a counter-example showing that it cannot always correct two simultaneously occurring errors.

Answer

We assume only one error occurred. In that case, either 1, 2 or 3 parity bits fail to match.

If only one pᵢ does not match, then that parity bit itself has been flipped by the error; no correction is necessary on the data bits bⱼ.

If exactly two parity bits pᵢ₁ and pᵢ₂ do not match, then the data bit bᵢ₃ with i₃ ≠ i₁ ∧ i₃ ≠ i₂ ∧ i₃ ≠ 3 has been flipped.

If p₀, p₁ and p₂ all fail to match, then b₃ has been flipped.

Each of the 7 possible single-bit errors therefore produces a distinct pattern of mismatches, so all cases where a single bit has been flipped can be corrected.

However, two errors cannot always be corrected. For instance, if b₀ and b₁ have been flipped, then p₁ and p₀ do not match; p₂ still matches because two of its data bits have been flipped. As a result, this is wrongly interpreted as the case where a single error occurred on bit b₂.

Question 1.6

Based on your answer to the previous question, provide the logical expression of an error correction circuit.

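The inputs of this circuit are the 7 bits, and the outputs are c₃c₂c₁c₀, the corrected versions of b₃b₂b₁b₀. Do not try to minimize the cost of this circuit.

Answer

Let e₀ = p₀ ⊕ p(b₁,b₂,b₃), e₁ = p₁ ⊕ p(b₀,b₂,b₃) and e₂ = p₂ ⊕ p(b₀,b₁,b₃) be the mismatch signals of the three parity bits (1 if the parity bit does not match). Each data bit is inverted when its own pattern of mismatches occurs, and passed unchanged otherwise:

c₃ = ¬b₃×e₀×e₁×e₂ + b₃×¬(e₀×e₁×e₂)

c₂ = ¬b₂×e₀×e₁×¬e₂ + b₂×¬(e₀×e₁×¬e₂)

c₁ = ¬b₁×e₀×¬e₁×e₂ + b₁×¬(e₀×¬e₁×e₂)

c₀ = ¬b₀×¬e₀×e₁×e₂ + b₀×¬(¬e₀×e₁×e₂)

The third mismatch signal must appear in c₂, c₁ and c₀: when all three parity bits fail to match, the error is on b₃ and the other data bits must be left unchanged.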
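As an illustration, the correction rule of Question 1.5 can be written as a short Python sketch. The mismatch signals e0, e1, e2 and the function names are notation introduced here, not taken from the exam; each corrected bit is the data bit XORed with the condition "this bit is the one designated by the mismatch pattern".

```python
def bit(word: int, i: int) -> int:
    """Return bit i of `word`."""
    return (word >> i) & 1


def encode7(data: int) -> int:
    """Build p2 p1 p0 b3 b2 b1 b0 (bit 6 down to bit 0) from b3 b2 b1 b0."""
    b0, b1, b2, b3 = (bit(data, i) for i in range(4))
    return ((b0 ^ b1 ^ b3) << 6) | ((b0 ^ b2 ^ b3) << 5) | ((b1 ^ b2 ^ b3) << 4) | (data & 0xF)


def correct(word: int) -> int:
    """Return the corrected data bits c3 c2 c1 c0 of the word p2 p1 p0 b3 b2 b1 b0."""
    b0, b1, b2, b3 = (bit(word, i) for i in range(4))
    p0, p1, p2 = (bit(word, i) for i in range(4, 7))
    # e_i = 1 when the stored parity bit p_i does not match the recomputed one.
    e0 = p0 ^ b1 ^ b2 ^ b3
    e1 = p1 ^ b0 ^ b2 ^ b3
    e2 = p2 ^ b0 ^ b1 ^ b3
    c3 = b3 ^ (e0 & e1 & e2)            # all three checks fail: b3 flipped
    c2 = b2 ^ (e0 & e1 & (e2 ^ 1))      # only p0 and p1 fail: b2 flipped
    c1 = b1 ^ (e0 & (e1 ^ 1) & e2)      # only p0 and p2 fail: b1 flipped
    c0 = b0 ^ ((e0 ^ 1) & e1 & e2)      # only p1 and p2 fail: b0 flipped
    return (c3 << 3) | (c2 << 2) | (c1 << 1) | c0


# Counter-example of Question 1.5: b0 and b1 flipped in the all-zero word.
print(format(correct(encode7(0b0000) ^ 0b0000011), "04b"))  # prints: 0111
```

Every single flipped bit is repaired, but the double flip on b₀ and b₁ is "corrected" by also flipping b₂, which makes the data word worse.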

Exercise 2 - A fault-tolerant processor (7 points)

In this part, we try to adapt the LC-2 so that it can keep functioning even when errors occur. We use the single-bit (parity) error detection scheme of Question 1.1.¹ We assume all registers in the circuit (IR, PC, MDR, MAR, NZP, BEN, R₀, ..., R₇) have been augmented with a parity bit, and that they are immediately followed by the error detection circuit of Question 1.1. Therefore, there are now error signals E_IR, E_PC, ... which indicate if an error was detected after each of these registers.

Question 2.1

The control circuit detects that an error occurs by doing an OR of all the error signals. When an error occurs, the instruction is “squashed”, i.e., it is stopped and restarted. During the execution of an instruction, when is it no longer possible to squash it ? You may want to detail your reply per instruction, or at least per instruction category.

We will assume that errors which occur after an instruction is no longer squashable are simply not recoverable.

Answer

An instruction can be squashed as long as it has not changed the state of the processor. That means:

– PC has not been modified (all instructions, and branch instructions during their execution).

– The instruction has not modified a register Rᵢ (for the instructions which write a result).

– The instruction has not written to memory (for store instructions).

¹ You can do this exercise without having done that circuit; you only need to read the text of that question.

Question 2.2

What modification of the following control sequence of instruction ADD do you suggest so that it remains squashable for the longest possible time?

MAR←PC, PC←PC+1

MDR←MEM(MAR)

IR←MDR

NZP,DR←SR1+SR2

Answer

The PC must now be updated at the very end of the instruction control sequence, instead of at the beginning. For instance, PC←PC+1 can be moved to the last step, together with NZP,DR←SR1+SR2, so that no architectural state changes before that step.

Question 2.3

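The two inputs SR1 and SR2 of the above ADD instruction correspond to registers which contain 16 data bits and 1 parity bit. An error signal E_ADD indicates if an error occurred during an addition. What check should be done right after the ALU to detect if an error has occurred during the addition? Clearly justify your reply and deduce the logical expression of signal E_ADD.

Answer

Each bit of the sum is sᵢ = aᵢ ⊕ bᵢ ⊕ cᵢ, where aᵢ and bᵢ are the bits of SR1 and SR2 and cᵢ is the carry entering position i (c₀ = 0). XORing these 16 equations gives p(SR1+SR2) = p(SR1) ⊕ p(SR2) ⊕ p(C), where C is the 16-bit word of carries c₁₅...c₀. The simpler rule "the parity of the result is the XOR of the parity of each number" (even plus even or odd plus odd is even, even plus odd is odd) describes the least significant bit of the result; for the parity bit of Question 1.1, which counts the 1s, it holds only when p(C) = 0. For instance, 1 + 1 = 2 gives p(1) ⊕ p(1) = 0 but p(2) = 1.

Therefore, if E_ADD = (p(SR1) ⊕ p(SR2) ⊕ p(C)) ≠ p(SR1+SR2) = p(SR1) ⊕ p(SR2) ⊕ p(C) ⊕ p(SR1+SR2) is equal to 1 (true), then an error occurred during the computation.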
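The parity check on an addition can be tried out with the following Python sketch. It is an illustration under one stated assumption: p() is the parity of the number of 1s (Question 1.1), in which case the parity of the sum also depends on the parity of the internal carries, so the sketch includes a carry term p(C) computed by a hypothetical helper carry_ins. The names e_add and carry_ins are introduced here.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1


def parity(word: int) -> int:
    """Parity of the number of 1 bits in the 16-bit word."""
    return bin(word & MASK).count("1") & 1


def carry_ins(a: int, b: int) -> int:
    """Return the word whose bit i is the carry entering adder i (bit 0 is 0)."""
    carries, carry = 0, 0
    for i in range(WIDTH):
        carries |= carry << i
        ai, bi = (a >> i) & 1, (b >> i) & 1
        carry = (ai & bi) | (carry & (ai ^ bi))
    return carries


def e_add(a: int, b: int, result: int) -> int:
    """1 if the parity of `result` is inconsistent with a, b and the carries."""
    return parity(a) ^ parity(b) ^ parity(carry_ins(a, b)) ^ parity(result)


good = (0x1234 + 0x0FF1) & MASK
print(e_add(0x1234, 0x0FF1, good), e_add(0x1234, 0x0FF1, good ^ 0x0040))
# prints: 0 1
```

Without the carry term the check would raise false alarms: 1 + 1 = 2 has p(1) ⊕ p(1) = 0 while p(2) = 1.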

Question 2.4

Same question for instruction NOT, with signal E_NOT.

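Answer

NOT inverts every bit, so a word with k bits equal to 1 becomes a word with 16 − k bits equal to 1. When the word size is an even number of bits (like in the LC-2, i.e., 16 bits), k and 16 − k have the same parity, so the parity of the result should be the same as the parity of the input.

E_NOT = p(NOT(SR1)) ≠ p(SR1) = p(NOT(SR1)) ⊕ p(SR1).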
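A small Python sketch (illustrative names, not part of the exam) shows the check for NOT and why the even word size matters: with a hypothetical 15-bit word the parity of the output would always differ from the parity of the input.

```python
def parity(word: int, width: int) -> int:
    """Parity of the number of 1 bits among the `width` low bits of `word`."""
    return bin(word & ((1 << width) - 1)).count("1") & 1


def bitwise_not(word: int, width: int = 16) -> int:
    """NOT restricted to `width` bits."""
    return ~word & ((1 << width) - 1)


def e_not(source: int, result: int, width: int = 16) -> int:
    """1 if input and output parities differ (an error when the width is even)."""
    return parity(source, width) ^ parity(result, width)


x = 0x00F3
print(e_not(x, bitwise_not(x)), e_not(x, bitwise_not(x) ^ 0x0100))  # prints: 0 1
```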

Question 2.5

Same question for instruction AND, with signal E_AND.

Hint: it is advisable to first consider the number of 1s rather than the parity bit directly.

Answer

This question is significantly more difficult than for ADD and NOT. Recall that the parity bit is related to the number of 1s. So one solution is to find how the number of 1s evolves when doing the AND of two numbers.

Let us call one(X) the number of 1s in binary number X. We are going to show that one(AND(SR1,SR2)) + one(OR(SR1,SR2)) = one(SR1) + one(SR2), by checking that each bit position contributes the same amount to both sides:

– Bit positions where exactly one of the SR1 and SR2 bits is equal to 1 are counted once in one(OR(SR1,SR2)) and not at all in one(AND(SR1,SR2)); they are also counted once in one(SR1) + one(SR2).

– Bit positions where both bits are equal to 0 are counted neither in one(AND(SR1,SR2)) nor in one(OR(SR1,SR2)), nor in one(SR1) + one(SR2).

– Bit positions where both bits are equal to 1 are counted in both one(AND(SR1,SR2)) and one(OR(SR1,SR2)), i.e., twice on each side.

As a result, we indeed have one(AND(SR1,SR2)) + one(OR(SR1,SR2)) = one(SR1) + one(SR2).

Now, one(AND(SR1,SR2)) = one(SR1) + one(SR2) − one(OR(SR1,SR2)). Taking this equality modulo 2, where addition and subtraction both become XOR, gives p(AND(SR1,SR2)) = p(SR1) ⊕ p(SR2) ⊕ p(OR(SR1,SR2)). Therefore, E_AND = p(AND(SR1,SR2)) ⊕ p(SR1) ⊕ p(SR2) ⊕ p(OR(SR1,SR2)).
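The counting identity and the resulting check can be verified numerically. This Python sketch is only an illustration: the names ones and e_and are introduced here, and it assumes the OR of the two operands is computed by a separate path that is itself trusted.

```python
MASK = 0xFFFF


def ones(word: int) -> int:
    """Number of 1 bits in the 16-bit word."""
    return bin(word & MASK).count("1")


def parity(word: int) -> int:
    """Parity of the number of 1 bits."""
    return ones(word) & 1


def e_and(a: int, b: int, result: int) -> int:
    """1 if p(result) differs from p(a) XOR p(b) XOR p(a OR b)."""
    return parity(result) ^ parity(a) ^ parity(b) ^ parity(a | b)


a, b = 0x3C5A, 0x0FF0
print(ones(a & b) + ones(a | b) == ones(a) + ones(b))   # prints: True
print(e_and(a, b, a & b), e_and(a, b, (a & b) ^ 0x0001))  # prints: 0 1
```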

Question 2.6

In some cases, the errors are not occasional, they are permanent because a given wire, or a given transistor has a defect. As a result, one of the bits (i) in a register, or one of the bits in the ALU operators, is always erroneous.

Instead of considering the processor has become useless, it is possible to deactivate bit i throughout the processor, and instead of having a 16-bit processor, we would now have a degraded, but still useful, 15-bit processor.

For the remaining questions, we assume there is no parity bit, but we are going to modify the processor architecture so that it can cope with such permanent defects. Let us assume that, somewhere in the processor (either in a register or in an operator), a single bit i, 0 ≤ i ≤ 15, is no longer usable.

We first focus on the register bank. How should the register bank be modified to accommodate such a faulty bit?

Answer

There is no modification required in the register bank itself, since there is no issue with storing and propagating a faulty bit.

Question 2.7

Same question for the NOT operator of the ALU ?

Answer

No modification required, except for the NZP registers. The logic circuit which computes the NZP conditions should not factor in the faulty bit. We introduce 16 individual deactivation signals DNZPᵢ (1 if bit i is deactivated).

For the N circuit, if the 16th bit is the faulty one, it should be replaced by the 15th bit. So N_new = ¬DNZP₁₆ × N(ALU₁₆) + DNZP₁₆ × N(ALU₁₅).

For the Z circuit, we need to AND each of the 16 input bits coming from the bus with the complement of the corresponding deactivation bit, so that a deactivated bit is always seen as 0. So Z_new = NOR over all i of AND(¬DNZPᵢ, ALUᵢ).

The P circuit remains unchanged.
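The modified condition-code logic can be modelled in a few lines of Python. This is a sketch under assumptions: bits are numbered 0 to 15 here (the text above counts them from 1 to 16), the function name nzp is hypothetical, and P is assumed to be derived as "neither N nor Z".

```python
def nzp(value: int, faulty=None, width: int = 16):
    """Return (N, Z, P) for `value`, ignoring bit `faulty` if one is given."""
    mask = (1 << width) - 1
    if faulty is not None:
        mask &= ~(1 << faulty)              # a deactivated bit is seen as 0 by Z
    # If the top bit is the faulty one, the bit just below acts as sign bit.
    sign_bit = width - 2 if faulty == width - 1 else width - 1
    n = (value >> sign_bit) & 1
    z = int((value & mask) == 0)
    p = int(not n and not z)                # assumption: P = NOT N AND NOT Z
    return n, z, p


print(nzp(0x8000), nzp(0x8000, faulty=15), nzp(0x4000, faulty=15))
# prints: (1, 0, 0) (0, 1, 0) (1, 0, 0)
```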

Question 2.8

Same question for the AND operator of the ALU ?

Answer

No modification required beyond the NZP modification.

Question 2.9

Same question for the ADD operator of the ALU ?

Answer

The carry propagation must become defect tolerant. If bit i has a defect, then adder i should be skipped and the carry out of adder i−1 should be directly propagated to the carry in of adder i+1. For that purpose, instead of having cin_i = cout_(i−1), we now have cin_i = ¬DADD_(i−1) × cout_(i−1) + DADD_(i−1) × cout_(i−2), where DADD_i is the deactivation signal for adder i.
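The carry bypass can be modelled as a ripple-carry addition that skips one position. The Python sketch below is an illustration with hypothetical helper names (add_skipping_bit, drop_bit); it checks that the 16-bit adder with one deactivated bit behaves like a 15-bit adder on the remaining bits.

```python
def add_skipping_bit(a: int, b: int, faulty: int, width: int = 16) -> int:
    """Ripple-carry addition in which full adder `faulty` is bypassed."""
    carry = 0
    result = 0
    for i in range(width):
        if i == faulty:
            continue                      # carry of adder i-1 goes straight to adder i+1
        ai, bi = (a >> i) & 1, (b >> i) & 1
        result |= (ai ^ bi ^ carry) << i  # sum bit of adder i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return result


def drop_bit(word: int, faulty: int) -> int:
    """Remove bit `faulty` from `word`, closing the gap (16 bits become 15)."""
    low = word & ((1 << faulty) - 1)
    high = word >> (faulty + 1)
    return (high << faulty) | low


a, b, faulty = 0x00FF, 0x0001, 3
total = add_skipping_bit(a, b, faulty)
print(drop_bit(total, faulty) == (drop_bit(a, faulty) + drop_bit(b, faulty)) & 0x7FFF)
# prints: True
```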

Exercise 3 - A software approach to fault-tolerance (7 points)

In this section, we make no modification of the architecture, and we attempt to implement fault tolerance at the software level only.

Question 3.1

Write an LC-2 assembly program which computes the number of 1s in a word, and uses it to find the parity bit of a word, see Question 1.1.² The input word is stored in register R0 and the parity bit will be returned in R1.

Answer

Question 3.2

We now assume that the NOT operator of the ALU has been found to be faulty (it sometimes provides an erroneous result). Consequently, for each invocation of the NOT operator, we want in fact to compare the parity of the input against the parity of the output to make sure they match, and that no fault occurred.

Write a function which implements this secure NOT by using the program of the previous question. The program of the previous question will itself be modified so as to be used as a function.

IMPORTANT: Do not use the stack conventions explained in the course. Use the following simplified conventions:

– Register R₆ always contains the address of the last used element of the stack.

– Before a function calls another function, it must first back up on the stack all registers it wants to preserve. It must always assume that any register it needs may be modified.

– A function passes parameters using the stack only.

– At the end of a function call, the called function stores the result to be returned in the last element of the stack.

Answer

See program at the end of the text.

² You can do this exercise without having done that circuit; you only need to read the text of that question.

        .ORIG x3000
        LDR R0, R6, #0    ; get number (parameter) from stack
        STR R7, R6, #0    ; save return address R7 in place of the parameter
        ADD R6, R6, #1    ; allocate one stack entry
        STR R0, R6, #0    ; backup number on stack
        ADD R6, R6, #1    ; allocate one stack entry
        STR R0, R6, #0    ; push number on stack for call
        JSR PARITY
        LDR R1, R6, #0    ; get parity from stack
        ADD R6, R6, #-1   ; deallocate one stack entry
        LDR R0, R6, #0    ; get number from stack
        ADD R6, R6, #-1   ; deallocate one stack entry
        NOT R0, R0        ; ALU computation
        ADD R6, R6, #1    ; allocate one stack entry
        STR R1, R6, #0    ; backup parity on stack
        ADD R6, R6, #1    ; allocate one stack entry
        STR R0, R6, #0    ; push NOT(number) on stack
        JSR PARITY
        LDR R2, R6, #0    ; get parity of NOT(number) from stack
        ADD R6, R6, #-1   ; deallocate one stack entry
        LDR R1, R6, #0    ; get parity of number from stack
        ADD R6, R6, #-1   ; deallocate one stack entry
        AND R3, R3, #0    ; initialize result (0 = parities match)
        ADD R1, R1, #0    ; set condition codes on R1
        BRz ZERO          ; R1 == 0
        ADD R2, R2, #0    ; R1 == 1, set condition codes on R2
        BRz DIFFER        ; R1 != R2
        JMP END           ; R1 == R2
ZERO    ADD R2, R2, #0    ; R1 == 0, set condition codes on R2
        BRz END           ; R1 == R2
DIFFER  ADD R3, R3, #1    ; result = 1 (parities differ)
END     LDR R7, R6, #0    ; restore return address (R7 was overwritten by JSR)
        STR R3, R6, #0    ; store result where R7 was stored
        RET

PARITY  LDR R0, R6, #0    ; get number from stack
        ADD R6, R6, #-1   ; deallocate number from stack
        AND R1, R1, #0    ; R1 stores the number of 1s
        AND R2, R2, #0
        ADD R2, R2, #15   ; counter = 15
LOOP    ADD R0, R0, #0    ; most significant bit = 0 ?
        BRzp POS
        ADD R1, R1, #1    ; most significant bit = 1
POS     ADD R0, R0, R0    ; shift left
        ADD R2, R2, #-1   ; decrement counter
        BRzp LOOP         ; loop if not end
        AND R1, R1, #1    ; get parity of result
        ADD R6, R6, #1    ; allocate one entry for result
        STR R1, R6, #0    ; store result on stack
        RET
STOP    HALT
        .END
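The counting loop of the PARITY routine can be mirrored in Python to check the algorithm independently of the LC-2. This is a sketch, not a simulator: the variable names r0 and r1 only echo the registers of the listing, and the function name lc2_parity is an assumption.

```python
def lc2_parity(word: int) -> int:
    """Mirror the PARITY loop: test the sign bit, then shift left, 16 times."""
    r0 = word & 0xFFFF
    r1 = 0                            # number of 1s seen so far
    for _ in range(16):               # counter goes from 15 down to 0
        if r0 & 0x8000:               # most significant bit = 1 (negative value)
            r1 += 1
        r0 = (r0 + r0) & 0xFFFF       # ADD R0, R0, R0 shifts left by one
    return r1 & 1                     # AND R1, R1, #1 keeps the parity


print(lc2_parity(0b1011), lc2_parity(0xFFFF))  # prints: 1 0
```

The LC-2 has no shift instruction, which is why the listing tests the sign of R0 and doubles it instead of shifting right and testing bit 0.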


LC-2

We provide a summary of the LC-2 instruction set in Figure 2 and of the LC-2 architecture in Figure 1.

[Figure: block diagram of the LC-2 datapath, showing the PC, IR, MAR, MDR, the register file R0 to R7, the ALU, the N, Z, P and BEN registers, the 16-bit bus, the 16-bit addressable memory, the keyboard and display registers (KBDR, KBSR, CRTDR, CRTSR) and the microprogrammed control.]

FIG. 1 – LC-2 architecture. Note: PC, IR, MAR, MDR, BEN, N, Z and P are registers.

– LD.MAR/1, LD.MDR/1, LD.IR/1, LD.REG/1, LD.CC/1 and LD.PC/1 are write controls for LC-2 registers.

– LD.BEN/1 is a write control for the BEN register (branch enable); 1 if branch is taken.

– GatePC/1, GateMDR/1, GateALU/1 and GateMARMX/1 control bus access.

– MIO.EN/1: set to 1 for memory or I/O access.

– R.W/1: 0 for read, 1 for write.

– ALUK/2: 00 for ADD, 01 for AND, 10 for NOT, 11 for passing through input 1.

– PCMX/2: from right to left: 00, 01, 10 and 11.

– MARMX/2: from left to right: 00, 01 and 10 (11 unused).

– SR1MX/2: from top to bottom: 00 and 01 (10 and 11 unused).

– DRMX/2: destination register (signal DR): IR[11:9] for 00, IR[8:6] for 01 (10 and 11 unused).

– SR2MX/1: directly set by bit 5 of the instruction, does not need to be controlled; used to differentiate between immediate and register mode for ALU operations.


Instruction                Operation                                          Encoding (bits 15 to 0)
ADD DR, SR1, SR2           DR ← SR1 + SR2                                     0001 DR SR1 0 00 SR2
ADD DR, SR1, imm5          DR ← SR1 + SEXT(imm5)                              0001 DR SR1 1 imm5
AND DR, SR1, SR2           DR ← AND(SR1, SR2)                                 0101 DR SR1 0 00 SR2
AND DR, SR1, imm5          DR ← AND(SR1, SEXT(imm5))                          0101 DR SR1 1 imm5
NOT DR, SR                 DR ← NOT(SR)                                       1001 DR SR 111111
BRnzp label                PC = PC[15:9]@offset9 if n.N+z.Z+p.P               0000 n z p offset9
JMP label                  PC = PC[15:9]@offset9                              0100 0 00 offset9
JSR label                  R7 ← PC and PC = PC[15:9]@offset9                  0100 1 00 offset9
JMPR indexed address       PC = BaseR + ZEXT(offset6)                         1100 0 00 BaseR index6
JSR indexed address        R7 ← PC and PC = BaseR + ZEXT(offset6)             1100 1 00 BaseR index6
LEA DR, label              DR ← PC[15:9]@offset9                              1110 DR offset9
LD DR, label               DR ← MEM(PC[15:9]@offset9)                         0010 DR offset9
LDR DR, indexed address    DR ← MEM(BaseR + ZEXT(offset6))                    0110 DR BaseR index6
ST SR, label               SR → MEM(PC[15:9]@offset9)                         0011 SR offset9
STR SR, indexed address    SR → MEM(BaseR + ZEXT(offset6))                    0111 SR BaseR index6
RET                        PC ← R7                                            1101 000000000000

DR, SR, SR1, SR2 and BaseR are 3-bit register numbers. imm5: signed 5-bit immediate. offset9: unsigned 9-bit offset in the current page. index6: unsigned 6-bit index.

FIG. 2 – LC-2 instructions