Information, calcul, communication Chapter 8

Paolo Ienne, EPFL 20th November 2014

Revision 1.21


Chapter 8

How Computers Work

In the previous chapters of this book we have studied abstractly how information can be collected and efficiently processed. In this and the following chapters, we will turn our attention to the possibility of creating machines to execute the abstract algorithms we have designed so far.

Our work in this chapter will revolve around four steps. (1) We will write the algorithms developed in the past chapters in a more formalized way and thus transform them into programs. (2) We will then conceive an abstract machine capable of mechanically following the programs thus elaborated. (3) We will make sure that such programs can be stored in binary format, as we have already done in previous chapters for numbers and letters. Then, finally, (4) we will see how one could implement the machine with real electronic components.


The focus of this chapter is on the crucial element that is at the heart of any computer: the processor which executes the succession of instructions composing an algorithm. This is only a piece of what is needed in practice to have a useful system and the following chapters will address some of the other important components:


Chapter 9 will describe the memory system where the processor retrieves data and stores results; we will discuss some fundamental challenges of building such a system.


Chapter 10 will address the problem of storage, such as the hard disks where all the information of our computers is kept or the flash cards where smartphones or digital cameras save permanent information, and how to organize the stored data in a way that makes it easy to retrieve what is needed. Finally, Chapter 11 will move beyond the boundaries of a single computer and will describe how computers are interconnected and how they exchange data.

The area of Computer Science that we are about to explore in this last part of the book is the one which addresses the construction of real systems using a variety of physical devices, all coming with sets of appealing features and subject to many practical limitations. This is the art of creatively combining components to achieve useful systems displaying the best blend of high performance, low cost, and moderate energy consumption. It is the branch of engineering responsible for the amazing progress in computer systems of the last half-century. It is the domain of Computer Engineering.

8.1 A Simple Language

Let us go back to the notion of algorithm, introduced in Chapter 1: algorithms are a precise and nonambiguous description of the succession of operations which are needed to compute a particular result or find a specific solution to a problem. On one hand, their lack of ambiguity makes them perfectly suitable to instruct a computer about a particular computation that the user desires to perform: computers are just machines, mechanically repeating simple steps, and algorithms naturally include such successions of steps. On the other hand, the algorithms in this book are designed to explain specific computational methods to human readers: thus, although unambiguous, the algorithms of the past chapters contain a variety of notations and include many explanations in the form of natural language expressions. To use our algorithms to drive computation in a machine, we will need to greatly simplify the variety of the language down to something extremely elementary.


8.1.1 An Elementary Algorithm

Throughout this chapter, we will use this algorithm as an example:
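(The book shows the algorithm as a figure, which is not reproduced here; the following is a plausible reconstruction based on the fragments quoted later in this chapter, with variable names assumed from the surrounding text.)

    sum ← 0
    while num > 0:
        sum ← sum + num
        num ← num − 1
    result ← sum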

Although extremely simple, it is conceptually similar to those introduced in Chapter 1 and it stipulates the operations that one needs to perform to compute the sum of the first num integers. This example will help us decide what elements we must add to our new language and we will progressively translate it from its current form into a new language that we will call assembly.

8.1.2 Storing values

The first, elementary observation is that all our algorithms, in processing the objects they manipulate (numbers or characters, for instance), need to identify some intermediate values. In our example algorithm, for instance, the largest desired integer to be added is identified through the symbol num. Similarly, the sum which develops while the algorithm is executed is called sum. Instead of using arbitrary names in natural language, we will only use a limited number of regular names. For instance, we will use r1, r2, r3, r4, etc. and we will call them registers.

Here is the result of our first change in the way we write algorithms:
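(Again a reconstruction, assuming r1 stands for num, r3 for sum, and r2 for the final result:)

    r3 ← 0
    while r1 > 0:
        r3 ← r3 + r1
        r1 ← r1 − 1
    r2 ← r3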

One can observe that it is not such a big change, but at least we now know exactly which names are used to identify the various values computed during the execution of any of our algorithms: in all algorithms we will use the very same r1, r2, r3, r4, etc. We will see later why this is useful to build our computing machine, but we can already notice a fact that will be essentially true for all our transformations: the algorithm is exactly as rigorous as before, but our changes made it slightly less readable to a human (the name sum is certainly much more telling than r3). Indeed, we are not doing this for the human readers but to make it possible for a simple machine to understand what we want it to perform.

8.1.3 Computing

The second transformation concerns most of the lines of our algorithms. Many lines are similar to this one: sum ← sum + new (or r3 ← r3 + r1, as we now write). These lines express (1) what operation is to be performed, (2) which operands we need, and (3) which register should now be assigned the result. Without loss of generality, taking advantage of the fact that all typical operations are either unary (complementing 5 into −5, for instance) or binary (such as addition, subtraction, etc.), we will write each operation as a tuple of three or four elements in the following order: (i) a conventional name for the operation (e.g., add for +, subtract for −, etc.), (ii) the name of the register which should be assigned the result, and (iii) the names of the one or two registers representing the operands, or the numerical operands themselves. Then, for example, r3 ← r3 + r1 will be rewritten as add r3, r3, r1.

If we apply the transformation to our algorithm, we obtain this:
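(A plausible rendering of this step, under the same assumed register assignment:)

    copy r3, 0
    while r1 > 0:
        add r3, r3, r1
        add r1, r1, -1
    copy r2, r3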

Again, the transformation does not change things dramatically and certainly does not make the algorithm easier to read. Yet, it makes things much more regular, with the transformed lines always starting with the name of an operation (add to add two numbers, copy to copy a number, and many others we have not used here) followed by some names of registers or numerical values according to a very precise plan. The list of possible operations may be relatively rich but it is certainly limited: this list will help us define the capabilities of the machine we need to build.

The regularity of the result of this transformation is all the more evident if we try to transform less simple assignments. Consider, for instance, if our algorithm had contained the following line: r1 ← r1 + r2 × r3. The assignment is not a mathematical function of only two values anymore but, individually, each operation is. We then need to proceed step by step and remember the conventions of arithmetic: first compute the product and then the addition. Here is a possible translation in our new format:

multiply r14, r2, r3
add r1, r1, r14

Note that, since we had to split the original assignment (containing two operations) into two lines (each performing exclusively one operation), we had to assign the result of the multiplication to a temporary register r14—that is, to a register that was not used elsewhere in our algorithm and whose value will be completely irrelevant once the addition is performed. Of course, if the algorithm contained instead r1 ← (r1 + r2) × r3, we would have translated it as follows:

add r14, r1, r2
multiply r1, r14, r3

Clearly, our new language is much less pleasant for us to read, but it is significantly terser: it does not require any knowledge of the rules of precedence in arithmetic, of the meaning of parentheses, etc. Simply, if one performs the individual operations line by line, one gets the desired result. In other words, we have split the burden between a translator, who needs to specify the exact set of equivalent steps, and a dumb executor, who knows nothing of such essential details. This economy of expression will prove extremely useful later, when we try to create our computer.

8.1.4 Taking decisions

Our algorithm is mostly rewritten except for an important part: the while construct which specifies that the following indented lines (the two add lines) should be repeated over and over until a condition is no longer true. This while is an example of a broad class of algorithmic specifications which essentially all say, in one way or another, “do not move on directly with the next step but, possibly, continue the algorithm at a different point.” We can easily see that we can rewrite any such specification by saying explicitly where to continue, without depending on indentation or other forms of grouping. In our case, the result might look like this:
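(A reconstruction of the corresponding figure; the book's exact drawing may differ.)

       copy r3, 0
    ┌─►if r1 ≤ 0, continue ───────┐
    │  add r3, r3, r1             │
    │  add r1, r1, -1             │
    └──continue                   │
       copy r2, r3 ◄──────────────┘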


where we have used the specification continue followed by an arrow to indicate when we should continue execution elsewhere than with the immediately following step. Where appropriate, we have also prefixed it with the specification if to indicate when such continuation elsewhere is subject to an algorithmic decision.

Now, it is clear that in our quest for a terse and simple notation there is no place for these arrows. More reasonably, we can decide to number each step or line of the algorithm so as to have a unique identifier for each and every one of its steps. This helps, because then we can use these line numbers to express where to continue and get rid of the arrows:
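(Our reconstructed example, now with numbered lines:)

    1: copy r3, 0
    2: if r1 ≤ 0, continue at line 6
    3: add r3, r3, r1
    4: add r1, r1, -1
    5: continue at line 2
    6: copy r2, r3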

We are now almost done: we simply need to make this a tad more regular and more closely resembling the way we have transformed computation. For this, we can transform each of the remaining lines into a tuple, once again starting with the name of the operation to perform. For instance, in the simplest case, if the operation is simply to continue at line 2, we could simply say jump 2. If the operation is not only to jump to a different line of the algorithm, but to take a decision based on some comparison, we use more complex operations, such as jump_negz to jump only if a particular value is smaller than or equal to zero. In this case, the tuple describing the operation would also contain the name of the register(s) to test: jump_negz will need one register name to test against zero but, for instance, jump_equal would need two registers to compare for equality (e.g., jump_equal r10, r11, 356). If we apply this last transformation, our algorithm finally becomes
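(a reconstruction, consistent with the six instruction patterns listed in Section 8.2.3)

    1: copy r3, 0
    2: jump_negz r1, 6
    3: add r3, r3, r1
    4: add r1, r1, -1
    5: jump 2
    6: copy r2, r3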

where now all lines are translated into our new format.


8.1.5 Assembly language

The first part of our task to create a computing machine is completed: we have transformed an abstract algorithm, whose description was formal but complex and targeted at human readers, into a program in a new language that we call assembly language or simply assembly. The successive rewritings above trace the overall transformation that our algorithm has undergone.

This terse and regular language has the following features:

• A program is a sequence of numbered lines, each expressing a single operation on a precise number of operands. These lines compose our program and are called instructions.

• The exact operation performed by each instruction is described by a name, such as copy or jump, which is always the first component of an assembly instruction.

• For most instructions, once they are executed, the algorithm should continue by executing the instruction on the next line. A few instructions break this rule, such as jump and jump_negz in our example, and instead specify explicitly which instruction should be executed next. They are called jumps or branches.

• The values and the data objects that our algorithm manipulates are identified through a set of standard names (r1, r2, r3) which we call registers.

• The operands of an instruction follow the name of the instruction in a predefined order, typical of each operation. For instance, copy always has two operands after it, a destination register and a source (a register or a numerical value), in this precise order; jump_negz has two operands, where the first is the register to compare with zero and the second is the line number of the destination of the jump when the comparison is true.

We will now try to conceive a simple, abstract machine that can mechanically follow these instructions and execute our programs.


8.2 An Abstract Processor Executing Assembly

Now that the first step of our roadmap is completed and the program finally has a very regular form, it is time to think of how we can give it to a machine to execute. We will develop in this section an abstract machine; that is, we will be concerned with the architecture of the computer: our focus will be on interconnecting components having a very precisely defined functionality so that the whole acts as a dumb executor of our program, simply and reliably obeying what the program says. At this point, we will not care whether and how such components can be realized in practice (for instance, with electronic transistors): here we will content ourselves with knowing exactly what each of these components needs to do.

8.2.1 Computing what assembly dictates

So, what are these components that we want to interconnect? Well, for one, we need elements to determine the result of whatever computation our language can express. For instance, since one of the instructions we can write is add, we need to have a component like this one:

This component takes two numbers and produces a third one. The value of the third one depends on which arithmetic operation is requested, among a limited number of possibilities (e.g., addition, subtraction, multiplication, division, exponentiation).

We see here some of the general characteristics of the components we are going to use: they have a number of inputs (here the two numbers and the selection of the desired operation) and a number of outputs (here a single numeric result); the value(s) produced at the output(s) should be perfectly and unambiguously clear from the definition of the component.

To execute the instructions of our assembly language it is also clear that we need to remember the values we have assigned to each register. A component such as this one would fit the bill:

We call this a register file. This one has three ports operating somewhat independently: two identical ports are used to read values out of the register file and the third one to write. We will soon see why we have chosen to have two read ports and one write port, but intuitively it is immediately related to the fact that our instructions have at most two operands to read and produce at most one result. The functionality of this component is relatively simple to describe: if we present the name of a register (one of r1, r2, r3, etc.) on the write port together with a number, the register file must remember that we have assigned that number to that register (1972 to r2, in the example). If we present the name of a register to one of the read ports, the register file must “answer” by showing at the corresponding output the last value that has been assigned to that register (in the example of the figure, r1 has last been assigned 372 and r4 the value 47). Again, the component is clearly and precisely defined.

Often we call the values entering and exiting the components signals, and it is sometimes useful to distinguish between data signals, when they represent numeric values or other data that our program manipulates, and control signals, when they represent something else (such as the names of registers we wish to access or the operations we want to compute, in our example).

It should be clear that there is a natural way to interconnect these two units for our purpose of executing programs:

If we follow this small circuit from the register file through the arithmetic unit and back to the register file, we see that it looks for the values last assigned to r3 and r1, adds them up, and assigns the result to register r3. It is exactly what we expect our machine to do when it receives the instruction

add r3, r3, r1.

Thus, this circuit is certainly going to be part of our machine and we will call this part of our processor the datapath, because the main signal loop that we see in the circuit carries the main data signals of our machine. But where do the control signals arriving from the bottom come from?


8.2.2 Controlling the computing units

Clearly, our machine will need some component to know what the program is. This may take the following form:

This component takes a signal representing the number of a line of the program that we want to read and responds by presenting at the output the instruction itself. It is therefore a form of memory which can only be read: once someone has “put our program inside”, we can read what the program is by asking for each line of it. Since this memory contains only the instructions of our program, we can call it instruction memory or program memory.

Of course, to read the instructions from memory, we need to be able to say which line we want to read. A natural way to do that is to have a component to remember a single numerical value. This will be another register (similar to the registers in the register file) and will be dedicated to memorizing the next line of the program that we want to execute:

Contrary to the register file, here we do not need to tell which register we want to read or write since there is only one. The output indicates the last value assigned to the register, and the input is the value that should be assigned next to the register. This register goes by many different names: line pointer, instruction pointer, or program counter. In the example above, the line pointer tells us that the next instruction we want to execute is at line 3 and that the instruction we will want to execute right after is at line 4.

It is natural to connect instruction memory and line pointer in this way:

Here we have added a new component that does a fairly simple job: it splits the instruction into its parts (remember that we have made sure that each and every instruction is a tuple made of a precise number of elements) and gives each part to whichever component needs it. Clearly, the signals that come out of this new component and go towards the top connect to the datapath seen in the previous section: in the example, they tell the arithmetic unit to perform an addition, and tell the register file which registers to read and write. This component in charge of “dispatching” parts of the instruction is often called the decoder, because it knows how the instruction is built (“coded”) and breaks it apart.

What we have so far is good enough for executing one instruction: (i) the line pointer says which line to execute, (ii) the instruction memory provides the instruction, (iii) the decoder distributes to the register file the information on the registers to read, (iv) again the decoder communicates to the arithmetic unit what to do with the values coming from the register file, and finally (v) the decoder instructs the register file which register should receive the result. All is well, but once this is done we need to move to the next line or instruction. In principle this is quite simple:

With this circuit, we always increment the line pointer, thus reading every instruction sequentially. The component marked with a + is an adder and is similar to the arithmetic unit we have used already—except for the fact that this one can only perform a single type of operation (additions) and therefore there is no need for a control signal to tell it what to do. Yet, this circuit to continuously increment the line pointer is not enough: remember that our language also contains instructions such as jump which instruct the executor of the program not to execute the next instruction but instead continue reading at a completely different line in the program.

We need something like this:

That is, we need to add some switch which decides whether the next value for the line pointer (that is, the next instruction executed) is either (a) the one following the instruction we are executing or (b) some other instruction. What controls the position of the switch? The switch must select the nonconsecutive value if the operation is some type of jump or branch (as opposed to an instruction like add or multiply) and if the condition (in the case of an operation such as jump_negz) is true.

Therefore, this is the way we want to assemble all these pieces:

Note that the decoder is in charge of distributing a few more elements than in the last version: Firstly, the nonconsecutive next line to read is another piece that might come from the instruction, when it is present. Secondly, the decoder must control the switch depending on the instruction (“is it a type of jump?”) and on the result of the condition (“is it true?”). The evaluation of the condition is usually the job of the arithmetic unit which considers such tests as special operations (“is the value negative or null?”) and sends the result to the decoder instead of saving it in the register file. In the example, the instruction

jump_negz r1, 6

is being executed and the decoder asks the register file to read only register r1 and the arithmetic unit to check whether the value is negative or null. If the arithmetic unit answers that indeed the value is negative or null, the decoder instructs the switch to assign the line pointer the value 6 instead of 3. Note that the decoder also instructs the register file not to assign any result to any register (represented by the leftmost control signal sent to the register file assuming the value -). The decoder gets a little more complex but its function is still pretty straightforward. We now have all we need to read a program and execute it, one instruction at a time.

This is probably a good time to pause and think again about why, in Section 8.1, we bothered to rewrite the algorithm in our not-so-friendly assembly language.


A 3 GHz Processor?

Our processor repeats the simple operations we have described over and over in time: it reads an instruction, “understands” it, executes it, and moves to the next. The ticks of an internal clock are used to mark the progress of this sequence: for instance, at every tick of the clock a new instruction is read from the program memory and, before the next tick, the result of the instruction is computed and stored in the register file while the line pointer is updated; on the next tick, the process repeats. Real processors have this clock and it ticks extremely fast: typical processors for desktop PCs and laptops have a clock that ticks up to 3 billion times per second (or 3 GHz, as one says) and fetch instructions from memory at this rate.

It is because we absolutely needed to make sure that the decoder would have a simple and clear definition! The couple of examples in this section show that this is indeed the case if we use our assembly language. But imagine if we had left something like

circumference ← 2 × pi × radius

coming out of the instruction memory: What is pi? Where is it in the register file? r3 is simply the third register in the register file, but pi?! And which of the two multiplications should be done first? Does it matter? Similar and perhaps worse problems are created by a line such as

while num > 0,

as we had in the original algorithm: What should a machine do if the condition is not true? What line should we continue with, since it is not written explicitly anywhere? For a human reader, with a complete view of the algorithm (and not simply reading it line by line) and with a huge amount of prior knowledge (for instance, awareness of the arithmetic rules of precedence), none of these questions is really a problem and all answers are unambiguous; but for our “mechanical decoder” things must be extremely simple and predefined. It is for this reason that we restricted ourselves in Section 8.1 to the ugly but extremely terse and practical assembly language.

8.2.3 Encoding instructions in binary: machine code

In Chapter 4 we have seen how to represent every type of information in a binary system, that is, using exclusively two symbols (usually 0 and 1). We will come back in Section 8.3 to discuss why it is a good idea to represent everything with only two symbols, but for now we can stick to the approach we adopted in Chapter 4 and apply the principle also to our instructions.


Since we want to store the program in the instruction memory and we assume that physical computers can natively represent only two symbols (again, we will discuss this in Section 8.3), we want to rewrite all our assembly language instructions as sequences of 0s and 1s. We will not develop a complete solution here but will limit ourselves to suggesting that it is a fairly easy problem if the assembly language is sufficiently restricted.

Firstly, we can notice that all the elements we need to express belong to finite sets and thus can be represented through integer numbers: We can limit registers to a dozen or two and, if our machine has only registers r1 through r16, we can identify them with four bits (0000 is r1, 0001 is r2, and so on until 1111, which represents r16). Similarly, we use operation names in our assembly language but there is only a small finite number of possible operations (add, subtract, jump, jump_negz, jump_equal, and so on). Supposing that we allow only 256 distinct operations, we can use 00000000 to mean add, 00000001 for subtract, 00000010 for copy, 10000000 for jump, etc. Also, in our assembly we have line numbers (e.g., in jump 6) or constants, but these are both easy to encode, for instance as a 16-bit unsigned or signed integer as explained in Chapter 4.

Secondly, we have purposely designed our instructions as tuples and they necessarily follow a limited number of patterns, each having a different number or type of parameters in the tuple after the name of the operation. For instance, we have six patterns in our program:

Pattern 1: operation register, constant (copy r3, 0)
Pattern 2: operation register, register (copy r2, r3)
Pattern 3: operation register, register, constant (add r1, r1, -1)
Pattern 4: operation register, register, register (add r3, r3, r1)
Pattern 5: operation line (jump 2)
Pattern 6: operation register, line (jump_negz r1, 6)

where on the right we see examples of each pattern taken from our example program.

Assuming that real programs would not need more than sixteen distinct patterns, we could identify the first pattern as 0000, the second as 0001, the one of jump as 0100, etc.

Now, finally, we could encode each instruction by assembling these pieces: we could always use four bits to identify the pattern, eight bits to identify the operation, and then as many bits as needed to put the identifiers of all elements required by the pattern one after the other in binary, in the same order as they are defined in the pattern itself. For instance, to say copy r3, 0 we would write

0000       00000010   0010   0000000000000000
Pattern 1  copy       r3     0

and for jump 2

0100       10000000   0000000000000010
Pattern 5  jump       2

If we find it convenient to have all instructions encoded with the same number of bits, we can pad the short instructions with 0s to equalize them. For instance, if all instruction patterns result in fewer than 32 binary digits, we can pad all instructions with enough 0s to be 32 bits long. Our examples above become

0000       00000010   0010   0000000000000000
Pattern 1  copy       r3     0
(4 digits + 8 digits + 4 digits + 16 digits + 0 digits of padding)

and

0100       10000000   0000000000000010   0000
Pattern 5  jump       2                  (padding)
(4 digits + 8 digits + 16 digits + 4 digits of padding)

We have thus made one more transformation to our program. The resulting form of the program is called machine code: the name reflects the fact that it is perfectly obscure to a human (unless he or she spends the time to decode it back to assembly language, which is always possible and certainly quite boring, but conceptually trivial). We have now completed step 3 of the roadmap presented at the beginning of the chapter and we are almost ready to load the program into the instruction memory.


8.2.4 Completing our processor

Step 2 of the roadmap is almost complete too, since we now have an abstract machine composed of simple and well-defined components which can execute all our instructions.

We may make a conceptually small but practically significant improvement. Our machine has a very limited number of registers. This is so for a number of reasons; one of them, for instance, is that the number of available registers determines how many bits we need in the instruction encoding to select the one we wish to read—and we may want to keep the number of bits relatively small. In practice, we may want to manipulate data objects which are immensely larger than the number of registers we can afford: images made of millions of pixels, databases of billions of entries, etc. For this, we can simply enrich our datapath with one more component, consisting of a second memory devoted to storing large quantities of data:

To make use of this memory, we will need new instructions, which are often called loads and stores. Examples could be

load r5, 2763

and

store 39287, r7.

The first instruction means that register r5 must be assigned the value last saved in the data memory at location 2,763. The second means that memory location 39,287 must be assigned the value last placed in r7. It is pretty clear that these two instructions are sufficient to process large data sets by bringing them into registers a few pieces at a time, computing some partial result, and possibly saving the data back into memory. We will discuss further in Chapter 9 the challenges of creating an efficient data memory capable of storing very large datasets.
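As a small illustration (the memory locations here are made up for the example), a snippet that adds two values kept in data memory and saves the result back would read:

    load r1, 2763
    load r2, 2764
    add r3, r1, r2
    store 2765, r3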

Finally, our elementary abstract processor is now complete:


Intel and ARM

In Section 8.2 we have gone through the process of defining a machine that can execute a particular assembly language that we had developed in Section 8.1. This is essentially what people call an Instruction Set Architecture or ISA. Clearly, there is not a single way to do this and one could conceive plenty of ISAs that are basically equivalent but have subtle advantages or disadvantages. Indeed, over the last half-century people have designed many thousands of different ISAs and tens or hundreds remain of industrial relevance today. Although, as users of electronic devices, we are usually unaware of this, there are a couple of cases where most of us have heard of the ISA we use: it is the case of processors designed or produced by Intel Corporation and ARM Limited. Over the years, practically all consumer computers (desktops, laptops, some tablets) and very many large professional computers have evolved to use essentially a single ISA developed in the late ’70s by the company Intel and many times revised and improved ever since.

It is often called x86 or simply Intel Architecture. More recently, and since the introduction of the first widespread cell phones in the ’90s, another set of closely related ISAs has become hugely popular, mostly in handheld or battery-powered devices, and this is the ARM Architecture. It is estimated that at the beginning of this decade (the ’10s) the number of shipments of ARM processors was about 25 times larger than the number of shipments of Intel processors. Yet, ARM processors usually come in products with a much lower price tag and with tight economic margins, and the revenue of Intel was about 70 times that of ARM. ISAs have come and gone from the computing scene, and it is very hard to tell which ISAs will be most popular in the future, but Intel's and ARM's are certainly here to stay for quite some time.

The red dashed line delimits the processor proper, to which are always connected some memories. Unfortunately, it is still only a paper tiger: while conceptually a powerful mechanism to execute our algorithms, this is all a perfectly pointless exercise until we can prove that we can create a real, physical machine acting exactly as our abstract model. This is our next and last challenge in this chapter.

8.3 Building a Real Processor with Transistors

As already mentioned, what we have created and called a processor is per se a simple abstract machine made of components that are fairly simple to describe. Nothing yet points in the direction, for instance, of electronics: we are interested in any possible way to create the necessary components and ultimately a real machine to perform the required functionality specified in the last section. People have used mechanical and electromechanical devices in the past, and researchers still actively try to use optics and other physical phenomena to implement efficient computers. The main point is that whatever we have done so far is completely independent of a specific implementation technology, and it is an accident of history that today humanity finds electronics to be by far the most convenient technology to implement computers—it is perfectly possible that in 20–30 years other technologies will prove more interesting. But, since today essentially each and every computer is an electronic computer, let us focus on this way to build our processor with real components.

Using electronics to build computers depends on an important device developed at the beginning of the 19th century: the electrical battery. Due to some electrochemical processes (or due to other natural phenomena for other sources of electricity), it presents two terminals at different electric potentials. Typical inexpensive batteries we buy in shops have one terminal at a potential 1.5 Volts higher than the other—we express this by saying that one terminal has the potential of 0 Volt and the other one +1.5 Volt. Here is where we first start to look at the natural world in binary terms: we do not quite care what the different levels of potential are, in fact; we care only about the fact that there are two of them and they are distinct, and we assign them the two symbols we use in the binary system, 0 and 1.

The other important components of any electrical circuit are wires (made of conductive materials such as metals) and, quite naturally, switches which, in their simplest form, are simply wires that can be interrupted:

The behaviour of a closed switch is pretty trivial: it behaves like conductive material and propagates the potential which appears on one side to the other side: so, if we apply the potential corresponding to 0 on the left side, we will “have a 0” on the right side. With a 1 on the left, we will have a 1 on the right. The behaviour of an open switch is perhaps less self-evident: if a switch is open, there is nothing to conduct electricity and the isolated end of the switch simply has an undetermined potential; it is definitely not the case that, for instance, if the left side is connected to the potential corresponding to 0 and the switch is open, then we will have a 1 on the right side! This is kind of tricky: if we want to create a world of binary circuits (that is, of circuits where every wire has either the potential representing 0 or the potential representing 1), simply opening switches will create trouble, because we will have wires in a situation which cannot belong to our system. We will see how to handle this and avoid problems.

8.3.1 Controllable switches: transistors

In the ’40s, a new device was invented based on some peculiar properties of materials called semiconductors, such as silicon: the transistor. One can create transistors of two types, like these:

They are often simply called N and P transistors but, more precisely, the transistors most used these days have three terminals and are called n-mos and p-mos transistors, with reference to their full name: metal–oxide–semiconductor field-effect transistors.

We do not care much for the exact type of transistors we use, nor for their exact names. What we definitely care for is their behaviour, which is amazingly simple and, we will see, powerful. N-transistors have this behaviour:

They are essentially like a switch placed between two of the terminals. The peculiarity is that the potential applied to the third terminal controls the position of the switch: if the potential on the controlling terminal is what we have called 1, the switch is closed; it is open otherwise. Note that the controlling terminal must always be connected to a 0 or a 1, or we will not know whether the switch is open or closed.


Binary or Not Binary?

We often associate computing with the binary representation, that is, with the use of only two digits to represent everything. We have done it in Chapter 4 and we have done it in Section 8.2.3 for converting assembly language into binary numbers. In fact, everything we have described in this book until the present section is independent of the number of digits one wants to employ to represent the information: with a few simple modifications we could instead encode everything in a ternary, quaternary, or any other representation and nothing would change significantly. The only reason for a binary representation is technological and related to the electronic implementation of our computers: a battery has two terminals at two different potentials, and in circuits made of controllable switches (transistors) one easily connects a wire to either one or the other potential. Also, the controllable switches themselves can only be either open or closed. The choice of electronics to implement computers and the use of transistors as building blocks justify the use of two states or symbols to represent everything in a computer. In some future, perhaps not very near to us, it is perfectly conceivable that other technologies will be developed that naturally use more symbols—if that happens, the basic ideas developed so far in this book will still be valid.

P-transistors behave practically in the same way, but the state of the switch is inverted:

With a 1 on the input, this switch is open instead, and closed with a 0. N- and P-transistors are said to be complementary.

8.3.2 The simplest logic circuit: an inverter

We are ready to create our first circuit with a pair of transistors:

Note that one terminal of the top P-transistor is connected to 1 and one terminal of the bottom N-transistor is connected to 0—in other words, the top and bottom terminals of the circuit are connected to the battery, which is the prime source of 0s and 1s. If we connect the input to 0 or 1, alternately, we can see what this circuit does:

Because the two transistors are complementary and share the same input, one of the two is always closed and the other one is always open, but which one depends on the input. As the figure shows, if the input is 0, the output is connected to the potential representing 1 and, vice versa, if the input is 1, the output is connected to the potential representing 0: it is an inverter, and it is a sufficiently important and fundamental circuit that we have a special symbol for it:

Note a very peculiar habit that we will adopt in many cases from now on: although our transistors are clearly connected to the battery through the top and bottom terminals, we will not represent that in our symbol, which has only the terminals we really care about, namely the input and the output. All our components made of transistors will need to be connected to the battery, but we will not show that in most cases.

8.3.3 More complex logic circuits

Our inverter is a very simple circuit: perhaps not immediately very useful, but with some important characteristics. One is the fact that the top and bottom transistors act complementarily: when one opens, the other closes, and vice versa. This solves the problem we had when we were reasoning about switches: the fact that if we simply leave a switch open, we get something which is neither a 0 nor a 1—and we are in trouble. In the inverter, the input wire connected to both controlling inputs and the use of two transistors of opposite types make sure that the two switches are never open at the same time—if they were, we would have an undetermined output.

By the same token, the two switches are never both closed, which would imply a direct connection of the two poles of the battery—what we call a short circuit. A short circuit is another form of indeterminate potential (which potential is “stronger”, 0 or 1?! which one wins if they are connected together?!); this situation, if it were allowed to happen, would be very bad because it could destroy our circuit (physics tells us that a very large current would pass through the transistors and destroy them).

If we understand the nice property of complementarity of the top and bottom halves of our inverter, we can try to extend the idea to more complex circuits:

This circuit has two inputs A and B and one output (the two terminals called “input A” are electrically connected and always at the same potential, and the same happens for “input B”; the only reason why the wire connecting them is not shown is to improve readability). As before, we can try to “apply” some inputs and see what we get at the output:

Clearly, only when both inputs A and B are 1, as on the left of the figure, are the two top transistors open, so that neither can impose the potential 1 on the output. At the same time, when both inputs A and B are 1, the two bottom transistors are both closed and this creates a path between 0 and the output. Again, as with our inverter, the top and the bottom part of the circuit are complementary: if one closes a path to 0, the other is open, and vice versa. The case when both inputs are 1 is the only one where the output is 0; in all other situations, such as the one represented above on the right, one or both of the top transistors will connect the output to 1 while one or both of the bottom transistors will disconnect the 0 from the output, avoiding a short circuit—that is, avoiding the connection of 1 to 0. If we try all combinations and observe the output, we obtain this table:
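(The table appears as a figure in the book; its content, reconstructed here, is the following.)

    A B | output
    0 0 |   1
    0 1 |   1
    1 0 |   1
    1 1 |   0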


This type of table is a very handy description of the circuit at hand and is called a truth table: it is the table which specifies when the output is true (which is a synonym of 1, just as false is the same as 0). This particular circuit is an AND circuit with inversion, or a NOT AND or, even simpler, a NAND: the output is NOT true (that is, it is 0) if and only if the first input is true (equal to 1) AND the second input is also true (equal to 1, too). It is not too difficult to invent other circuits now, such as a NOT OR or NOR. They are called logic circuits because these circuits directly implement the operations of propositional logic. And they are easy to compose: if we put an inverter (also called a NOT) after a NAND, we obtain an AND.

Clearly, once we have understood the “trick”, we can generate any sort of circuit.

For instance this one, with three inputs A, B and C, and two outputs X and Y:

It is pretty hard to figure out what this circuit does and whether it has the same complementarity features that we have observed in the previous circuits between the top and the bottom part. But, with patience, one can verify the latter and capture the functionality in the truth table:
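(The table is again a figure in the book; reconstructed from the description that follows, and assuming X is the more significant of the two output bits, it reads as follows.)

    A B C | X Y
    0 0 0 | 0 0
    0 0 1 | 0 1
    0 1 0 | 0 1
    0 1 1 | 1 0
    1 0 0 | 0 1
    1 0 1 | 1 0
    1 1 0 | 1 0
    1 1 1 | 1 1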


If one looks at this carefully, one can notice that the two bits of output represent, as a two-bit binary number, the number of ones at the input. In other words, if one adds the three one-bit numbers at the inputs, the output represents the result as a two-bit number. For instance, 1 + 1 + 0 = 2 in decimal, which is 10 in binary. This circuit is called a full adder and corresponds to the elementary operation which we perform for every digit when we do a paper-and-pencil addition in binary: we add two digits and the carry from the previous position (so, in total, three digits) and produce two digits, one of result and one of carry for the next position if the result is larger than 1. Composing many circuits like this one, we can create a larger circuit that adds 32-bit numbers, for instance.

8.3.4 The combinational part of our processor

The previous section suggests that a huge variety of complex circuits can be created by assembling smaller, elementary circuits, always connecting the output of one subcircuit to the input of another. Indeed, respecting some simple rules, one can create any combinational circuit:

Combinational circuits are an extremely important class of useful circuits. They are defined as those circuits whose outputs depend solely on the inputs: nothing else is needed to compute the output. Our arithmetic unit is a perfect example of a combinational circuit:

It is sufficient to know that the numbers 32 and 24 must be added (the three inputs) to determine unequivocally that the output must be 56.

All the components shown in light green in our processor are combinational: the arithmetic unit, the line incrementer, and the decoder distributing signals and information around.

Hopefully, the discussion so far has convinced us that, even if we cannot quite do it ourselves with the knowledge we have, by composing transistors as done in the last section we can create any arbitrary combinational circuit. This is great, but still not enough to completely build our processor with a battery and a bunch of transistors.

8.3.5 The sequential part of our processor

The problem is that useful components are not only combinational. There is another fundamental class of components which are sequential circuits or, equivalently but perhaps more expressively, stateful circuits:

The definition is exactly the opposite of combinational circuits: the knowledge of all the inputs at a particular point in time is simply not enough to determine the outputs. A perfect example is our register file:

It is not sufficient to know that we want to read r1: to figure out that the correct output is 372, we need to know something of what happened in the past and, more precisely, what value was last assigned to r1. We say that the output of the circuit is not only determined by the inputs but also by some internal state (hence “stateful” circuits) or, equivalently, by some elements of memory which remember the relevant facts of the past. Hence, an adder has no memory: three plus five is eight, no matter what the circuit has computed in the past. On the other hand, a register is almost pure memory: if the output is fourteen, this implies that (1) sometime in the past fourteen was written in the register and (2) since then nobody has written anything else, different from fourteen, into the register. The bottom line is that we now need to use our transistors to create some qualitatively new circuit which memorizes information.

8.3.6 A strange logic circuit: a memory cell

Let us consider this simple circuit:

One of the rules that we follow when composing components to create complex combinational circuits is never to create loops. This circuit is a perfect violation of that rule, with two inverters connected back to back. Is this circuit meaningless? Is it creating electrical problems? No, in fact:

It turns out that if we make the hypothesis that the leftmost wire is at 1, the first inverter produces a 0 on the right wire and the second inverter imposes a 1 on the left wire—which is perfectly compatible with the initial hypothesis. We can also make the opposite hypothesis, that the leftmost wire is at 0, and the result is that this situation is also self-consistent. The truly interesting point is that these two possible situations exist and both are stable—one says that this is a bistable element. It is qualitatively the very same situation as that of an inverted pendulum in mechanics (a pendulum fixed with a hinge at the bottom instead of at the top): the pendulum can fall right or left, and both extreme points are stable equilibrium points. What makes this situation interesting for us is that we can label these two equilibrium points with our two symbols: one position represents 0 and the other 1; when our new circuit is in one equilibrium, it stays there and remembers one bit of information. This fundamental circuit is called a latch.

It is very good to have a circuit that, once in a specific state, stays in that state. Yet, for it to be useful, we need to be able to put it in either of the two states as we see fit. For this we need to make the circuit a little more complex:

Now there is a signal to indicate that we want to write a new value in the latch: it is connected to the control terminal of an N-transistor which, as we remember, is a switch that closes when the control is 1. This means that in the situation above, nothing happens and the latch can be in any of the two states (reading a 1 in this case). Now, let us consider that we want to write a 0: the first inverter converts the 0 into a 1, the write signal at 1 closes the switch and brings the 1 from the inverter to the loop. What makes this circuit quite particular is the connection of the left inverter with the bottom right one:

When the leftmost inverter tries to set its output to 1, the bottom right inverter tries to set the same wire to 0. This is indeed one of those short circuits which we wanted to avoid at all costs: the potential might be undetermined—a meaningless situation in our binary world where everything must be either a 0 or a 1. Yet, if we think about it, this is a pretty peculiar form of short circuit: suppose that the leftmost inverter is stronger, that is, it manages by brute force to impose a 1 on the wire for a brief fraction of time; then, the top right inverter is going to switch its output to the complement of 1, which is 0. In turn, the bottom right inverter takes this 0 and produces a 1. Ah! This is the value that the left inverter just imposed by brute force, and now, very soon after, the bottom right inverter in fact agrees—and the output has changed. Therefore, this circuit is what we need to implement a 1-bit memory: it is a bistable element and we can force it into either of the two states at will by creating an extremely brief short circuit.¹

Whatever part of the processor we could not build with combinational circuits, we can build now:

¹ In fact, more complex latches do not even need to create this brief short circuit.


The register file and the line pointer are clearly memories of a few tens or a few hundreds of bits, which we can build out of the latches we have just discovered. Not to speak of our two data and instruction memories, which will need thousands or millions of latches and little else. We have our electronic processor.

8.4 A Full Processor

The fourth and last step of our roadmap is now completed and we have achieved our goal: we have transformed the original algorithm into a slightly contrived language that we readily converted into sequences of 0s and 1s. In parallel, we have devised an abstract machine that can execute this language; the machine is composed of simple and perfectly specified building blocks. Finally, we have used transistors to implement these building blocks. If we now place the program in the instruction memory of our physical device, our processor will execute the program step by step and compute whatever result the original algorithm specified. We have built an electronic computer.

Figure 8.1: A Modern Processor Die. The photo shows the silicon die of a recent Intel processor called Xeon E7-8895 V2, containing around 4.3 billion transistors. The fifteen identical parts are the fifteen processor cores and the more regular half of each of them is memory.

What makes transistors so appealing for building computers is that they have become, over the years, ridiculously inexpensive: these days, billions of transistors cost only a few tens of Swiss Francs or Euros or US Dollars. Billions of them! If the first simple microprocessor of our history (the Intel 4004 in 1971) had around 2,300 transistors, modern processors such as the one in Figure 8.1 sport more than 4.3 billion transistors. This graph shows the progress of the number of transistors contained in the most complex integrated circuits since the first Intel processor:

[Graph: transistor counts of the most complex integrated circuits over time, on a logarithmic vertical axis; adapted from Wikimedia Commons]

Note that the vertical axis is logarithmic and the graph is a straight line—this means that the transistors in a typical circuit double at a fixed rate (see the box on Moore’s Law).

Does that abundance of transistors imply that modern processors are fundamentally different from what we have achieved in this chapter? Not at all. The differences are more quantitative than qualitative.


Moore’s Law

In 1965, Gordon E. Moore, one of the future founders of Intel Corporation, made the observation that the number of transistors in high-end digital integrated circuits (such as processors) appeared to double approximately every two years. This has remained remarkably true since then.

Moore's Law is nothing like a natural law (like Newton's Laws of motion, for instance); it is simply an observation of the effects of the economic forces and of the engineering cleverness at work in the semiconductor business over the last decades. Very roughly, this is what happens: In this extremely competitive market, companies design new integrated circuits banking on very low economic margins. The cost of the circuits they sell is determined by the engineering cost of designing the new product (fixed, relatively high, and shared among all circuits sold) and by the manufacturing cost of each individual circuit (roughly corresponding to the area of the circuits they sell). Margins being low, the natural way to improve them, once the product is being sold and is successful, is to push technologists to create as soon as possible smaller transistors (traditionally, whose area is half that of the previous generation) and shrink the circuit to the new size: half the amount of silicon for the same product (and price) gives the semiconductor companies a much more comfortable income. Yet, when designing the next product with the new smaller transistors, to avoid having the fixed costs dominate its expected price, companies try to make it larger and more costly by investing more transistors to increase the performance and by cramming more functionality into it. The result is a new, significantly improved, and commercially appealing product at a price tag similar to the first one and with similarly low margins (to try not to lose market to the competition). And the history repeats, with loops of shrinking products down and making them better and larger. This unique phenomenon in human history (that is, an extremely fast exponential growth sustained for several decades) is behind the advances of computing and its increased pervasiveness in human society. Will it last? Certainly not for long, or at least not in the form that we know today, for there are fundamental physical limits to how small a transistor can become (some dimensions of current transistors are already in the order of a few atoms and, simply, cannot get any smaller). How long will it last? Nobody quite knows: predictions of a slowdown have appeared repeatedly since the 1970s, but engineering ingenuity has so far averted the danger every single time major engineering difficulties came up. Current predictions say that progress can continue until the 2020s at most. Will the showstopper really happen then? What exactly will happen then? Hard to tell....

(1) A processor like the one in the figure actually contains 15 processor cores, each corresponding to our single processor. Then, (2) these processors are much more complex than ours, in that they can manipulate and perform complex operations on a variety of data types (e.g., on floating point numbers, on vectors, on pixels of images); operations that would take tens or hundreds of instructions on our processor are performed in one or a few instructions using much more complex circuits. Also, (3) the processor manufacturers have made incredible progress in using the abundance of transistors to speed up execution in weird ways—for instance, by allowing the execution of multiple instructions at once instead of waiting, as we do, for the previous instruction to have completed before ever moving on. And, finally, (4) they have added a lot of memory around the processor, so that large problems can be solved efficiently. In fact, the organization of memory around a processor is the next critical issue, which we will tackle in the next chapter.


Exercises

Equivalent Instructions

Consider this sequence of instructions:

1: jump_smallerequal 0, r1, 3

2: subtract r1, 0, r1

The instruction jump_smallerequal compares the first operand to the second, with each operand being either a register or a value. If the first operand is smaller than or equal to the second, it continues the execution at the line indicated by the third operand. Of course, subtract subtracts the third operand from the second and assigns the result to the first one (very much as add, introduced in Section 8.1).

Which of the following sequences are exactly equivalent, functionally, to the one above? Note that jump_smaller has the same function as jump_smallerequal but does not jump in case of equality.

a.
1: subtract r1, 0, r1
2: jump_smallerequal r1, 0, 4
3: subtract r1, 0, r1

b.
1: jump_smaller -1, r1, 3
2: subtract r1, 0, r1

c.
1: jump_smaller 0, r1, 3
2: subtract r1, 0, r1

d.
1: jump_smallerequal 0, r1, 2
2: subtract r1, 0, r1

e.
1: subtract r1, 0, r1
2: jump_smallerequal 0, r1, 4
3: subtract r1, 0, r1

(Solution: b, c, and e.)


A Program Snippet

Consider this sequence of instructions:

1: copy r3, r1

2: jump_smallerequal r2, r1, 4

3: copy r3, r2

The instruction jump_smallerequal compares the first operand to the second, with each operand being either a register or a value. If the first operand is smaller than or equal to the second, it continues the execution at the line indicated by the third operand.

What does this program do?

a. Computes the absolute value of a number.

b. Exchanges the values stored in two registers.

c. Finds out the minimum between two values.

d. Finds out the minimum between two values, but it has a problem if the two values are negative.

e. Finds out the maximum between two values.

f. Finds out the maximum between two values, but it has a problem if the two values are identical.

g. Nothing of practical use.

(Solution: e.)


Studying a Program

Consider the following program:

1: copy r5, 0
2: jump_smallerequal r1, r2, 7
3: copy r6, r1
4: copy r1, r2
5: copy r2, r6
6: copy r5, 1
7: jump_smallerequal r2, r3, 12
8: copy r6, r2
9: copy r2, r3
10: copy r3, r6
11: copy r5, 1
12: jump_smallerequal r3, r4, 17
13: copy r6, r3
14: copy r3, r4
15: copy r4, r6
16: copy r5, 1
17: jump_equal r5, 1, 1
18: stop

The instruction jump_smallerequal compares the first operand to the second, with each operand being either a register or a value. If the first operand is smaller than or equal to the second, it continues the execution at the line indicated by the third operand. The instruction jump_equal does the same but jumps only if the first two operands are equal.

1. Suppose that, before executing this program, the registers have the following content:

r1  123
r2  473
r3  17
r4  365

Simulate the program execution until the instruction stop. What is the value of these four registers at the end?

2. What is the function of this program? Explain your answer.


3. What would happen if one were to replace all jump_smallerequal instructions with jump_smaller instructions (which, of course, jump only when the first operand is strictly smaller than the second)?


Computing the Sum of Absolute Values

Write an assembly program to compute the sum of the absolute values of two numbers. Use the types of instructions seen in the chapter and in the previous exercises (add, subtract, jump_smallerequal, etc.). Use registers r1 to r9. The two values to add are stored in r1 and r2 at the beginning and, at the end, the resulting sum should be stored in r3.
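(One possible solution, sketched with the instruction semantics recalled in the previous exercises; many variants are equally valid.)

1: jump_smallerequal 0, r1, 3
2: subtract r1, 0, r1
3: jump_smallerequal 0, r2, 5
4: subtract r2, 0, r2
5: add r3, r1, r2

Lines 1–2 replace r1 with its absolute value (negating it only when it is negative), lines 3–4 do the same for r2, and line 5 stores their sum in r3.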


Circuits and Truth Tables

Consider this circuit:

What is the truth table of this circuit?

a.

b.

c.

d.

(Solution: b.)
