Processor Performance - Why Talk About the von Neumann Model?

3.4 Processor Performance

There are several measures of processor performance, but all are based on the processor’s behavior over a given length of time. One of the most common definitions of processor performance is a processor’s throughput, the amount of work the CPU completes in a given period of time.

A processor’s execution is ultimately synchronized by an external system or master clock, located on the board. The master clock is simply an oscillator producing a fixed-frequency sequence of regular on/off pulse signals. This signal is usually divided or multiplied within the CPU’s CU (control unit) to generate at least one internal clock signal running at a constant number of clock cycles per second, or clock rate, which controls and coordinates the fetching, decoding, and execution of instructions. The CPU’s clock rate is expressed in MHz (megahertz).

Using the clock rate, the CPU’s execution time can be calculated. This is the total time the processor takes to process some program, in seconds per program (total number of bytes).

From the clock rate, the length of time a CPU takes to complete a clock cycle can be derived: it is the inverse of the clock rate (1/clock rate), called the clock period or cycle time, and is expressed in seconds per cycle. The processor’s clock rate or clock period is usually listed in the processor’s specification documentation.

Looking at the instruction set, the CPI (average number of clock cycles per instruction) can be determined in several ways. One way is to obtain the CPI for each instruction (from the processor’s instruction set manual), multiply it by the frequency of that instruction, and then add up the results for the total CPI.

$$\text{CPI} = \sum_{i} \left(\text{CPI}_i \times \text{instruction frequency}_i\right)$$
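The weighted sum above can be sketched in a few lines of Python. The instruction classes, cycle counts, and frequencies below are hypothetical values chosen only for illustration; real figures would come from a processor's instruction set manual and from profiling a specific program.

```python
# Hypothetical instruction mix: class -> (cycles per instruction, fraction of
# executed instructions). Illustrative numbers only, not from any real manual.
instruction_mix = {
    "alu": (1, 0.50),
    "load_store": (2, 0.30),
    "branch": (3, 0.20),
}


def average_cpi(mix):
    """Return the weighted-average CPI: sum of (CPI per class * frequency)."""
    total_frequency = sum(freq for _, freq in mix.values())
    if abs(total_frequency - 1.0) > 1e-9:
        raise ValueError("instruction frequencies must sum to 1")
    return sum(cycles * freq for cycles, freq in mix.values())


cpi = average_cpi(instruction_mix)
print(f"average CPI = {cpi:.2f}")
```

With this mix, the result works out to 1 * 0.50 + 2 * 0.30 + 3 * 0.20 = 1.7 cycles per instruction.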

At this point the CPU’s total execution time can be determined by:

$$\text{CPU execution time (seconds per program)} = \text{instruction count} \times \text{CPI (cycles per instruction)} \times \text{clock period (seconds per cycle)} = \frac{\text{instruction count} \times \text{CPI}}{\text{clock rate (cycles per second)}}$$

Here the instruction count is the total number of instructions per program. A clock rate quoted in MHz must first be converted to cycles per second (multiplied by 10⁶) for the result to come out in seconds.
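As a rough worked example of the execution time formula, the following Python sketch multiplies instruction count, CPI, and clock period. The function name and all input values (a 100 MHz clock, two million instructions, a CPI of 1.7) are assumptions made up for illustration.

```python
def cpu_execution_time(instruction_count, cpi, clock_rate_hz):
    """Return execution time in seconds for one program run.

    instruction_count: total instructions executed by the program
    cpi: average clock cycles per instruction
    clock_rate_hz: clock rate in cycles per second (Hz), not MHz
    """
    clock_period = 1.0 / clock_rate_hz  # seconds per cycle
    return instruction_count * cpi * clock_period


# Hypothetical program and processor, for illustration only.
clock_rate_mhz = 100
execution_time = cpu_execution_time(
    instruction_count=2_000_000,
    cpi=1.7,
    clock_rate_hz=clock_rate_mhz * 1e6,  # convert MHz to Hz first
)
print(f"execution time = {execution_time * 1e3:.1f} ms")
```

Under these assumed inputs the program needs 3.4 million cycles, which at 100 MHz is 34 ms.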

The processor’s average execution rate, also referred to as throughput or bandwidth, reflects the amount of work the CPU does in a period of time and is the inverse of the CPU’s execution time:

$$\text{CPU throughput (bytes/sec or MB/sec)} = \frac{1}{\text{CPU execution time}} = \text{CPU performance}$$

Knowing the performance of two architectures (Geode and SA-1100, for example), the speedup of one architecture over another can then be calculated as follows:

$$\frac{\text{Performance(Geode)}}{\text{Performance(SA-1100)}} = \frac{\text{Execution time(SA-1100)}}{\text{Execution time(Geode)}} = X$$

Therefore, Geode is X times faster than SA-1100.
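The speedup ratio can be illustrated with a short Python sketch. The two execution times below are invented placeholders for two unnamed processors running the same program; they are not measurements of the Geode or the SA-1100.

```python
def performance(execution_time_s):
    """Performance is taken as the inverse of execution time."""
    return 1.0 / execution_time_s


def speedup(execution_time_slower_s, execution_time_faster_s):
    """Return how many times faster the second processor is than the first."""
    return performance(execution_time_faster_s) / performance(execution_time_slower_s)


# Hypothetical execution times for the same program on two processors.
time_processor_a = 0.034  # seconds
time_processor_b = 0.051  # seconds

x = speedup(time_processor_b, time_processor_a)
print(f"processor A is {x:.2f} times faster than processor B")
```

Because performance is the inverse of execution time, the ratio of performances equals the inverse ratio of execution times, here 0.051 / 0.034 = 1.5.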

Other definitions of performance besides throughput include:

• A processor’s responsiveness, or latency, which is the length of elapsed time a processor takes to respond to some event

• A processor’s availability, which is the amount of time the processor runs normally without failure; reliability, the average time between failures or MTBF (mean time between failures); and recoverability, the average time the CPU takes to recover from failure or mean time to recover (MTTR)
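The availability, MTBF, and MTTR measures in the list above are often combined into a single steady-state availability fraction, MTBF / (MTBF + MTTR). That formula is a common reliability-engineering convention rather than something stated in this text, and the figures in the following Python sketch are hypothetical.

```python
def steady_state_availability(mtbf_hours, mttr_hours):
    """Return the fraction of time a system is expected to be running.

    Uses the common convention availability = MTBF / (MTBF + MTTR),
    which is an assumption here, not a definition from the source text.
    """
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical figures: fails on average every 10,000 hours,
# and takes 2 hours on average to recover.
availability = steady_state_availability(mtbf_hours=10_000, mttr_hours=2)
print(f"availability = {availability:.4%}")
```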

On a final note, a processor’s internal design determines its clock rate and CPI; thus a processor’s performance depends on which ISA is implemented and how the ISA is implemented. For example, architectures that implement Instruction-level Parallelism ISA models perform better than application-specific and general-purpose based processors because of the parallelism that occurs within these architectures. Performance can also be improved through the actual physical implementation of the ISA within the processor, such as implementing pipelining in the ALU.

Note: There are variations on the full adder that provide additional performance improvements, such as the carry lookahead adder (CLA), carry completion adder, conditional sum adder, carry select adder, and so on. In fact, some algorithms that can improve the performance of a processor do so by designing the ALU to be able to process logical and mathematical instructions at a higher throughput, a technique called pipelining.

The increasing gap between the performance of processors and memory can be narrowed by cache algorithms that implement instruction and data prefetching (especially algorithms that use branch prediction to reduce stall time) and lockup-free caching. Basically, any design feature that allows for either an increase in the clock rate or a decrease in the CPI will increase the overall performance of a processor.

3.4.1 Benchmarks

One of the most common performance measures used for processors in the embedded market is millions of instructions per second, or MIPS.

$$\text{MIPS} = \frac{\text{instruction count}}{\text{CPU execution time} \times 10^6} = \frac{\text{clock rate}}{\text{CPI} \times 10^6}$$
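The two forms of the MIPS formula should give the same value, which the following Python sketch checks. The clock rate is taken in Hz (cycles per second), and the instruction count, CPI, and clock rate are hypothetical values used only to exercise the arithmetic.

```python
def mips_from_time(instruction_count, execution_time_s):
    """MIPS from instruction count and measured execution time."""
    return instruction_count / (execution_time_s * 1e6)


def mips_from_clock(clock_rate_hz, cpi):
    """MIPS from clock rate (Hz) and average cycles per instruction."""
    return clock_rate_hz / (cpi * 1e6)


# Hypothetical processor and program, for illustration only.
instruction_count = 2_000_000
cpi = 1.7
clock_rate_hz = 100e6  # 100 MHz

# Execution time follows from instruction count * CPI / clock rate.
execution_time_s = instruction_count * cpi / clock_rate_hz

mips_a = mips_from_time(instruction_count, execution_time_s)
mips_b = mips_from_clock(clock_rate_hz, cpi)
print(f"MIPS (from time)  = {mips_a:.2f}")
print(f"MIPS (from clock) = {mips_b:.2f}")
```

The instruction count cancels out of the second form, which is why MIPS on its own says nothing about how many instructions a given task requires.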

The MIPS performance measure gives the impression that faster processors have higher MIPS values, since part of the MIPS formula is inversely proportional to the CPU’s execution time. However, MIPS can be misleading in terms of this assumption for a number of reasons, including:

• Instruction complexity and functionality aren’t taken into consideration in the MIPS formula, so MIPS cannot compare the capabilities of processors with different ISAs.

• MIPS can vary on the same processor running different programs (with varying instruction count and different types of instructions).
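The first point in the list above can be made concrete with a small Python sketch. It compares two invented processors, X and Y, with different ISAs running the same task at the same clock rate; all instruction counts and CPI values are assumptions chosen to show the effect, not data for real parts.

```python
def run_summary(instruction_count, cpi, clock_rate_hz):
    """Return (execution time in seconds, MIPS) for one task on one processor."""
    execution_time_s = instruction_count * cpi / clock_rate_hz
    mips = instruction_count / (execution_time_s * 1e6)
    return execution_time_s, mips


CLOCK_RATE_HZ = 100e6  # both hypothetical processors run at 100 MHz

# Processor X: simple instructions, so the task needs many of them.
time_x, mips_x = run_summary(instruction_count=3_000_000, cpi=1.0,
                             clock_rate_hz=CLOCK_RATE_HZ)

# Processor Y: more complex instructions, so the same task needs fewer,
# but each takes more cycles on average.
time_y, mips_y = run_summary(instruction_count=1_000_000, cpi=2.0,
                             clock_rate_hz=CLOCK_RATE_HZ)

print(f"X: {time_x * 1e3:.0f} ms at {mips_x:.0f} MIPS")
print(f"Y: {time_y * 1e3:.0f} ms at {mips_y:.0f} MIPS")
```

In this made-up case, processor X reports twice the MIPS of processor Y yet takes longer to finish the task, because MIPS counts instructions without regard to how much work each one does.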

Software programs called benchmarks can be run on a processor to measure its performance.

www.newnespress.com

From the document Embedded Hardware (pages 148-154)