
The

Computer Engineering Handbook

Second Edition

Edited by

Vojin G. Oklobdzija

Digital Design and Fabrication

Digital Systems and Applications


Computer Engineering Series
Series Editor: Vojin G. Oklobdzija

Coding and Signal Processing for Magnetic Recording Systems
Edited by Bane Vasic and Erozan M. Kurtas

The Computer Engineering Handbook, Second Edition
Edited by Vojin G. Oklobdzija

Digital Image Sequence Processing, Compression, and Analysis
Edited by Todd R. Reed

Low-Power Electronics Design
Edited by Christian Piguet


DIGITAL SYSTEMS AND APPLICATIONS

Edited by

Vojin G. Oklobdzija

University of Texas


CRC Press

Taylor & Francis Group

6000 Broken Sound Parkway NW, Suite 300 Boca Raton, FL 33487-2742

© 2008 by Taylor & Francis Group, LLC

CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed in the United States of America on acid-free paper

10 9 8 7 6 5 4 3 2 1

International Standard Book Number-13: 978-0-8493-8619-0 (Hardcover)

This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

No part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data

Digital systems and applications / editor, Vojin Oklobdzija.

p. cm.

Includes bibliographical references and index.

ISBN 978-0-8493-8619-0 (alk. paper)

1. Computer engineering--Management. 2. Systems engineering--Management. I. Oklobdzija, Vojin G. II. Title.

TK7885.D56 2008

621.39--dc22 2007023257

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com


Preface

Purpose and Background

Computer engineering is a vast field spanning many aspects of hardware and software; thus, it is difficult to cover in a single book. It is also rapidly changing, requiring constant updating, as some aspects of it may become obsolete. In this book, we attempt to capture the long-lasting fundamentals as well as the new trends, directions, and developments. This book could easily fill thousands of pages. We are aware that in this book some areas were not given sufficient attention and others were not covered at all.

We plan to cover these missing parts, as well as more specialized topics, in greater detail in new books in the computer engineering series and in new editions of the current book. We believe the areas included in this new edition are covered very well, because each chapter is written by specialists recognized as leading experts in their fields.

Organization

This book deals with systems, architecture, and applications and contains seven sections.

Section I is dedicated to computer architecture and computer system organization, a top-level view.

Several architectural concepts and organizations of computer systems, such as superscalar and vector processors, VLIW architectures, servers, and parallel systems, as well as new trends in multithreading and multiprocessing, are described. Implementation and performance-enhancing techniques such as branch prediction, register renaming, and virtual memory, along with system design issues, are also addressed. The section ends with a description of performance evaluation measures and techniques, which are the ultimate measure from the user's point of view.

Section II deals with embedded systems and applications. As the ability to integrate more transistors continues, the chip is turning into a system containing various elements needed to serve a particular application.

Section III describes important digital signal processing applications and low-power implementations.

Section IV deals with communication and networks, followed by Section V, which deals with input and output issues such as circuit implementation aspects, parallel I/O, algorithms, read channel recording, and issues related to read channel and disk drive technology.

Section VI is dedicated to operating systems, which manage the computer system operation and host the application software.

The final section (Section VII) is dedicated to new directions in computing. Given the rapid development of computer systems and their penetration into many new fields and aspects of our everyday life, this section is rich with chapters describing many diverse aspects of computer usage and potentials for use. It describes programmable and reconfigurable computing, media signal processing, processing of


audio signals, the Internet, home entertainment, communications (including video over mobile networks), and data security. This section illustrates the deep penetration of computer systems into the consumer market, enabled by advances in signal processing and embedded applications.

Locating Your Topic

Several avenues are available to access the desired information. A complete table of contents is presented at the front of the book. Each of the sections is preceded with an individual table of contents. Finally, each chapter begins with its own table of contents. Each contributed chapter contains comprehensive references. Some of them contain a ‘‘To Probe Further’’ section, in which a general discussion of various sources such as books, journals, magazines, and periodicals is located. To be in tune with the modern times, some of the authors have also included Web pointers to valuable resources and information. We hope our readers will find this to be appropriate and of much use.

A subject index has been compiled to provide a means of accessing information. It can also be used to locate definitions. The page on which the definition appears for each key defining term is given in the index.

This book is designed to provide answers to most inquiries and to direct inquirers to further sources and references. We trust that it will meet the needs of our readership.

Acknowledgments

The value of this book is based entirely on the work of people who are regarded as top experts in their respective fields, and their excellent contributions. I am grateful to them. They contributed their valuable time without compensation and with the sole motive to provide learning material and help enhance the profession. I would like to thank Saburo Muroga, who provided editorial advice, reviewed the content of the book, made numerous suggestions, and encouraged me. I am indebted to him as well as to other members of the advisory board. I would like to thank my colleague and friend Richard Dorf for asking me to edit this book and trusting me with this project. Kristen Maus worked tirelessly on the first edition of this book and so did Nora Konopka of CRC Press. I am also grateful to the editorial staff of Taylor & Francis, Theresa Delforn and Allison Shatkin in particular, for all the help and hours spent on improving many aspects of this book. I am particularly indebted to Suryakala Arulprakasam and her staff for a superb job of editing, which has substantially improved this book over the previous one.

Vojin G. Oklobdzija
Berkeley, California

(9)

Editor

Vojin G. Oklobdzija is a fellow of the Institute of Electrical and Electronics Engineers and a distinguished lecturer of the IEEE Solid-State Circuits and IEEE Circuits and Systems Societies. He received his MSc and PhD from the University of California, Los Angeles, in 1978 and 1982, respectively, as well as a Diplom-Ingenieur (MScEE) from the Electrical Engineering Department, University of Belgrade, Yugoslavia, in 1971.

From 1982 to 1991, he was at the IBM T.J. Watson Research Center in New York, where he made contributions to the development of RISC architecture and processors. In the course of this work he obtained a patent on register renaming, which enabled an entirely new generation of superscalar processors.

From 1988 to 1990, he was a visiting faculty member at the University of California, Berkeley, while on leave from IBM. Since 1991, Professor Oklobdzija has held various consulting positions. He was a consultant to Sun Microsystems Laboratories, AT&T Bell Laboratories, Hitachi Research Laboratories, Fujitsu Laboratories, Samsung, Sony, Silicon Systems/Texas Instruments Inc., and Siemens Corp., where he was also the principal architect of the Siemens/Infineon TriCore processor.

In 1996, he incorporated Integration Corp., which delivered several successful processor and encryption processor designs.

Professor Oklobdzija has held various academic appointments, in addition to the one at the University of California. In 1991, as a Fulbright professor, he helped to develop programs at universities in South America. From 1996 to 1998, he taught courses in Silicon Valley through the University of California, Berkeley Extension, and at Hewlett-Packard. He was a visiting professor in Korea, at EPFL in Switzerland, and in Sydney, Australia. Currently he is Emeritus Professor at the University of California and Research Professor at the University of Texas at Dallas.

He holds 14 U.S. and 18 international patents in the area of computer architecture and design.

Professor Oklobdzija is a member of the American Association for the Advancement of Science, and the American Association of University Professors.

He serves as associate editor for the IEEE Transactions on Circuits and Systems II, IEEE Micro, and the Journal of VLSI Signal Processing, and serves on the committees of the International Symposium on Low-Power Electronics (ISLPED), the Computer Arithmetic Symposium (ARITH), and numerous other conferences. He served as associate editor of the IEEE Transactions on Computers (2001-2005) and the IEEE Transactions on Very Large Scale Integration (VLSI) Systems (1995-2003), on the ISSCC Digital Program Committee (1996-2003), and for the first Asian Solid-State Circuits Conference (A-SSCC) in 2005. He was general chair of the 13th Symposium on Computer Arithmetic in 1997.

He has published over 150 papers in the areas of circuits and technology, computer arithmetic, and computer architecture, and has given over 150 invited talks and short courses in the United States, Europe, Latin America, Australia, China, and Japan.


Editorial Board

Krste Asanović, University of California at Berkeley, Berkeley, California

William Bowhill, Intel Corporation, Shrewsbury, Massachusetts

Anantha Chandrakasan, Massachusetts Institute of Technology, Cambridge, Massachusetts

Hiroshi Iwai, Tokyo Institute of Technology, Yokohama, Japan

Saburo Muroga, University of Illinois, Urbana, Illinois

Kevin J. Nowka, IBM Austin Research Laboratory, Austin, Texas

Takayasu Sakurai, Tokyo University, Tokyo, Japan

Alan Smith, University of California at Berkeley, Berkeley, California

Ian Young, Intel Corporation, Hillsboro, Oregon


Contributors

John F. Alexander, University of North Florida, Jacksonville, Florida

Krste Asanović, University of California at Berkeley, Berkeley, California

Ming Au-Yeung, San Francisco State University, San Francisco, California

Pervez M. Aziz, Agere Systems, Allentown, Pennsylvania

Raymond Barrett, University of North Florida, Jacksonville, Florida

Lejla Batina, Katholieke Universiteit Leuven, Leuven, Belgium

Mario Blaum, IBM Almaden Research Center, San Jose, California

Pradip Bose, IBM T.J. Watson Research Center, Yorktown Heights, New York

Don Bouldin, University of Tennessee, Knoxville, Tennessee

E. Bozorgzadeh, University of California, Los Angeles, California

Tzi-cker Chiueh, State University of New York at Stony Brook, Stony Brook, New York

Adam Dabrowski, Poznan University of Technology, Poznan, Poland

Babak Daneshrad, University of California, Los Angeles, California

Miroslav Despotović, University of Novi Sad, Novi Sad, Yugoslavia

Jozo J. Dujmović, San Francisco State University, San Francisco, California

Mohammad Faheemuddin, King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia

Manoj Franklin, University of Maryland, College Park, Maryland

Matthew Franklin, University of California at Davis, Davis, California


Borko Furht, Florida Atlantic University, Boca Raton, Florida

Jean-Luc Gaudiot, University of California at Irvine, Irvine, California

Ricardo E. Gonzalez, Tensilica, Inc., Santa Clara, California

Anna Hać, University of Hawaii, Honolulu, Hawaii

Siamack Haghighi, Intel Corporation, Santa Clara, California

Yoshiaki Hagiwara, Sony Corporation, Tokyo, Japan

Ali Ibrahim, Advanced Micro Devices, Sunnyvale, California

Mohammad Ilyas, Florida Atlantic University, Boca Raton, Florida

Bruce Jacob, University of Maryland, College Park, Maryland

Lizy Kurian John, University of Texas at Austin, Austin, Texas

R. Kastner, University of California, Los Angeles, California

Ruby Lee, Princeton University, Princeton, New Jersey

Worayot Lertniphonphun, Georgia Institute of Technology, Atlanta, Georgia

Tomasz Marciniak, Poznan University of Technology, Poznan, Poland

Brian Marcus, IBM Almaden Research Center, San Jose, California

Daniel Martin, Infineon, Mountain View, California

Binu Mathew, Apple Inc., Cupertino, California

James H. McClellan, Georgia Institute of Technology, Atlanta, Georgia

S.O. Memik, University of California, Los Angeles, California

Milica Mitić, University of Niš, Niš, Serbia

John Morris, Auckland University, Auckland, New Zealand

Samiha Mourad, Santa Clara University, Santa Clara, California

Danny F. Newport, University of Tennessee, Knoxville, Tennessee

Garret Okamoto, Santa Clara University, Santa Clara, California


Ara Patapoutian, Maxtor, Shrewsbury, Massachusetts

Gerald G. Pechanek, BOPS, Inc., Chapel Hill, North Carolina

Donna Quammen, George Mason University, Fairfax, Virginia

Todd R. Reed, University of Hawaii at Manoa, Honolulu, Hawaii

Peter Reiher, University of California, Los Angeles, California

Eric Rotenberg, North Carolina State University, Raleigh, North Carolina

Abdul H. Sadka, University of Surrey, Surrey, England

Sadiq M. Sait, King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia

Kazuo Sakiyama, Katholieke Universiteit Leuven, Leuven, Belgium

M. Sarrafzadeh, University of California, Los Angeles, California

Thomas C. Savell, Creative Advanced Technology Center, Scotts Valley, California

Necip Sayiner, Agere Systems, Allentown, Pennsylvania

Giovanni Seni, Motorola Human Interface Labs, Palo Alto, California

Vojin Šenk, University of Novi Sad, Novi Sad, Yugoslavia

Dezső Sima, Budapest Polytechnic, Budapest, Hungary

Kevin Skadron, University of Virginia, Charlottesville, Virginia

Mark Smotherman, Clemson University, Clemson, South Carolina

Emina Šoljanin, Lucent Technologies, New Vernon, New Jersey

Zoran Stamenković, IHP GmbH—Innovations for High Performance Microelectronics, Frankfurt (Oder), Germany

Mile Stojčev, University of Niš, Niš, Serbia

Jayashree Subrahmonia, IBM Thomas J. Watson Research Center, Yorktown Heights, New York

David Tarjan, University of Virginia, Charlottesville, Virginia

Fred J. Taylor, University of Florida, Gainesville, Florida

Daniel N. Tomasevich, San Francisco State University, San Francisco, California


Jonathan W. Valvano, University of Texas at Austin, Austin, Texas

Peter J. Varman, Rice University, Houston, Texas

Bane Vasić, University of Arizona, Tucson, Arizona

Ingrid Verbauwhede, Katholieke Universiteit Leuven and UCLA, Leuven, Belgium

Jeffrey Scott Vitter, Purdue University, West Lafayette, Indiana

Albert Wang, Tensilica, Inc., Santa Clara, California

Alice Wang, Texas Instruments, Dallas, Texas

Shoichi Washino, Tottori University, Tottori City, Japan

Wayne Wolf, Princeton University, Princeton, New Jersey

Thucydides Xanthopoulos, Cavium Networks, Marlboro, Massachusetts

Larry Yaeger, Indiana University, Bloomington, Indiana

Chik-Kong Ken Yang, University of California, Los Angeles, California

Habib Youssef, King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia


Contents

SECTION I Computer Systems and Architecture

1 Computer Architecture and Design
Introduction Jean-Luc Gaudiot ... 1-2
1.1 Server Computer Architecture Siamack Haghighi ... 1-2
1.2 Very Large Instruction Word Architectures Binu Mathew ... 1-12
1.3 Vector Processing Krste Asanović ... 1-25
1.4 Multithreading, Multiprocessing Manoj Franklin ... 1-35
1.5 Survey of Parallel Systems Donna Quammen ... 1-51
1.6 Virtual Memory Systems and TLB Structures Bruce Jacob ... 1-59
1.7 Architectures for Public-Key Cryptography Lejla Batina, Kazuo Sakiyama, and Ingrid Verbauwhede ... 1-70

2 System Design
2.1 Superscalar Processors Mark Smotherman ... 2-1
2.2 Register Renaming Techniques Dezső Sima ... 2-10
2.3 Predicting Branches in Computer Programs Kevin Skadron and David Tarjan ... 2-38
2.4 Network Processor Architecture Tzi-cker Chiueh ... 2-60
2.5 Stream Processors and Their Applications for the Wireless Domain Binu Mathew and Ali Ibrahim ... 2-66

3 Architectures for Low Power Pradip Bose ... 3-1

4 Performance Evaluation
4.1 Measurement and Modeling of Disk Subsystem Performance Jozo J. Dujmović, Daniel N. Tomasevich, and Ming Au-Yeung ... 4-1
4.2 Performance Evaluation: Techniques, Tools, and Benchmarks Lizy Kurian John ... 4-21
4.3 Trace Caching and Trace Processors Eric Rotenberg ... 4-38


SECTION II Embedded Applications

5 Embedded Systems-on-Chips Wayne Wolf ... 5-1

6 Embedded Processor Applications Jonathan W. Valvano ... 6-1

7 An Overview of SoC Buses Milica Mitić, Mile Stojčev, and Zoran Stamenković ... 7-1

SECTION III Signal Processing

8 Digital Signal Processing Fred J. Taylor ... 8-1

9 DSP Applications Daniel Martin ... 9-1

10 Digital Filter Design Worayot Lertniphonphun and James H. McClellan ... 10-1

11 Audio Signal Processing Adam Dabrowski and Tomasz Marciniak ... 11-1

12 Digital Video Processing Todd R. Reed ... 12-1

13 Low-Power Digital Signal Processing Alice Wang and Thucydides Xanthopoulos ... 13-1

SECTION IV Communications and Networks

14 Communications and Computer Networks Anna Hać ... 14-1

SECTION V Input/Output

15 Circuits for High-Performance I/O Chik-Kong Ken Yang ... 15-1

16 Algorithms and Data Structures in External Memory Jeffrey Scott Vitter ... 16-1

17 Parallel I/O Systems Peter J. Varman ... 17-1


18 A Read Channel for Magnetic Recording
18.1 Recording Physics and Organization of Data on a Disk Bane Vasić and Miroslav Despotović ... 18-2
18.2 Read Channel Architecture Bane Vasić, Pervez M. Aziz, and Necip Sayiner ... 18-11
18.3 Adaptive Equalization and Timing Recovery Pervez M. Aziz ... 18-20
18.4 Head Position Sensing in Disk Drives Ara Patapoutian ... 18-46
18.5 Modulation Codes for Storage Systems Brian Marcus and Emina Šoljanin ... 18-55
18.6 Data Detection Miroslav Despotović and Vojin Šenk ... 18-65
18.7 An Introduction to Error-Correcting Codes Mario Blaum ... 18-91

SECTION VI Operating System

19 Distributed Operating Systems Peter Reiher ... 19-1

SECTION VII New Directions in Computing

20 SPS: A Strategically Programmable System M. Sarrafzadeh, E. Bozorgzadeh, R. Kastner, and S.O. Memik ... 20-1

21 Reconfigurable Processors
21.1 Reconfigurable Computing John Morris ... 21-1
21.2 Using Configurable Computing Systems Danny F. Newport and Don Bouldin ... 21-18
21.3 Xtensa: A Configurable and Extensible Processor Ricardo E. Gonzalez and Albert Wang ... 21-25

22 Roles of Software Technology in Intelligent Transportation Systems Shoichi Washino ... 22-1

23 Media Signal Processing
23.1 Instruction Set Architecture for Multimedia Signal Processing Ruby Lee ... 23-1
23.2 DSP Platform Architecture for SoC Products Gerald G. Pechanek ... 23-35
23.3 Digital Audio Processors for Personal Computer Systems Thomas C. Savell ... 23-45
23.4 Modern Approximation Iterative Algorithms and Their Applications in Computer Engineering Sadiq M. Sait and Habib Youssef ... 23-62
23.5 Parallelization of Iterative Heuristics Sadiq M. Sait, Habib Youssef, and Mohammad Faheemuddin ... 23-82

24 Internet Architectures Borko Furht ... 24-1

25 Microelectronics for Home Entertainment Yoshiaki Hagiwara ... 25-1


26 Mobile and Wireless Computing
26.1 Bluetooth—A Cable Replacement and More John F. Alexander and Raymond Barrett ... 26-2
26.2 Signal Processing ASIC Requirements for High-Speed Wireless Data Communications Babak Daneshrad ... 26-8
26.3 Communication System-on-a-Chip Samiha Mourad and Garret Okamoto ... 26-16
26.4 Communications and Computer Networks Mohammad Ilyas ... 26-27
26.5 Video over Mobile Networks Abdul H. Sadka ... 26-39
26.6 Pen-Based User Interfaces—An Applications Overview Giovanni Seni, Jayashree Subrahmonia, and Larry Yaeger ... 26-50
26.7 What Makes a Programmable DSP Processor Special? Ingrid Verbauwhede ... 26-72

27 Data Security Matthew Franklin ... 27-1

Index ... I-1


SECTION I
Computer Systems and Architecture

1 Computer Architecture and Design
Jean-Luc Gaudiot, Siamack Haghighi, Binu Mathew, Krste Asanović, Manoj Franklin, Donna Quammen, Bruce Jacob, Lejla Batina, Kazuo Sakiyama, and Ingrid Verbauwhede ... 1-1
Server Computer Architecture · Very Large Instruction Word Architectures · Vector Processing · Multithreading, Multiprocessing · Survey of Parallel Systems · Virtual Memory Systems and TLB Structures · Architectures for Public-Key Cryptography

2 System Design
Mark Smotherman, Dezső Sima, Kevin Skadron, David Tarjan, Tzi-cker Chiueh, Binu Mathew, and Ali Ibrahim ... 2-1
Superscalar Processors · Register Renaming Techniques · Predicting Branches in Computer Programs · Network Processor Architecture · Stream Processors and Their Applications for the Wireless Domain

3 Architectures for Low Power
Pradip Bose ... 3-1
Introduction · Fundamentals of Performance and Power: An Architect's View · A Review of Key Ideas in Power-Aware Microarchitectures · Power-Efficient Microarchitecture Paradigms · Conclusions

4 Performance Evaluation
Jozo J. Dujmović, Daniel N. Tomasevich, Ming Au-Yeung, Lizy Kurian John, and Eric Rotenberg ... 4-1
Measurement and Modeling of Disk Subsystem Performance · Performance Evaluation: Techniques, Tools, and Benchmarks · Trace Caching and Trace Processors


1
Computer Architecture and Design

Jean-Luc Gaudiot

University of California at Irvine

Siamack Haghighi

Intel Corporation

Binu Mathew

Apple Inc.

Krste Asanović

University of California at Berkeley

Manoj Franklin

University of Maryland

Donna Quammen

George Mason University

Bruce Jacob

University of Maryland

Lejla Batina

Katholieke Universiteit Leuven

Kazuo Sakiyama

Katholieke Universiteit Leuven

Ingrid Verbauwhede

Katholieke Universiteit Leuven and UCLA

Introduction ... 1-2
1.1 Server Computer Architecture ... 1-2
Introduction · Client–Server Computing · Server Types · Server Deployment Considerations · Server Architecture · Future Directions
1.2 Very Large Instruction Word Architectures ... 1-12
What Is a VLIW Processor? · Different Flavors of Parallelism · A Brief History of VLIW Processors · Defoe: An Example VLIW Architecture · Intel Itanium Processor · Transmeta Crusoe Processor · Scheduling Algorithms for VLIW
1.3 Vector Processing ... 1-25
Introduction · Data Parallelism · History of Data-Parallel Machines · Basic Vector Register Architecture · Vector Instruction Set Advantages · Lanes: Parallel Execution Units · Vector Register File Organization · Traditional Vector Computers versus Microprocessor Multimedia Extensions · Memory System Design · Future Directions · Conclusions
1.4 Multithreading, Multiprocessing ... 1-35
Introduction · Parallel Processing Software Framework · Parallel Processing Hardware Framework · Concluding Remarks · To Probe Further
1.5 Survey of Parallel Systems ... 1-51
Introduction · Single Instruction Multiple Processors (SIMD) · Multiple Instruction Multiple Data · Vector Machines · Dataflow Machine · Out of Order Execution Concept · Multithreading · Very Long Instruction Word (VLIW) · Interconnection Network · Conclusion
1.6 Virtual Memory Systems and TLB Structures ... 1-59
Virtual Memory, a Third of a Century Later · Caching the Process Address Space · An Example Page Table Organization · Translation Lookaside Buffers: Caching the Page Table
1.7 Architectures for Public-Key Cryptography ... 1-70
Introduction · RSA Algorithm · Elliptic Curve Cryptography · Architectures Supporting Both RSA and ECC · Concluding Remarks


Introduction Jean-Luc Gaudiot

It is a truism that computers have become ubiquitous and portable in the modern world: personal digital assistants (PDAs) and many other kinds of mobile computing devices are easily available at low cost. This is also due to the ever-increasing presence of World Wide Web connectivity.

One should not forget, however, that these life-changing applications have been made possible only by the phenomenal advances in device fabrication and, more importantly, in the architecting of these individual components into powerful systems.

In the 1980s, advances in computer architecture research were most pronounced on two fronts: on the one hand, new architectural techniques such as RISC made their appearance and revolutionized single-processor design, allowing high performance for the single-chip microprocessors that had first come out as system components in the 1970s. At the same time, large-scale parallel processors matured and could be used by researchers in many high-end, computationally intensive scientific applications.

In recent times, the appetite of Internet surfers has been fueling the design of architectures for powerful servers: in Section 1.1 Siamack Haghighi emphasizes the unique requirements of server design and identifies the characteristics of their applications.

In Section 1.2, Binu Mathew describes the very long instruction word (VLIW) processor model, compares it to more traditional approaches to instruction-level parallelism extraction, and demonstrates the future of VLIW processors, particularly in the context of multimedia applications.

Similarly, multimedia applications have promoted a dual architectural approach. In Section 1.3, Krste Asanovic traces the ancestry of vector processors to the supercomputers of the 1980s (Crays, Fujitsu, etc.) and describes the modern applications of this architecture model.

Architectures cannot be evaluated independently of the underlying technology. Indeed, while today's deep-submicron VLSI design rules allow increasing numbers of devices on the same chip, multiprocessing techniques are finding additional applications in forms that range from networks of workstations all the way to multiprocessing on a chip. This is the topic of Section 1.4 by Manoj Franklin.

Taking concurrent processing to the next level, Donna Quammen surveys parallel systems in Section 1.5 including large-scale tightly coupled parallel processors.

Finally, in Section 1.6 Bruce Jacob surveys the concepts underlying virtual memory systems and describes the tremendous advances this approach has undergone since first being proposed in the late 1960s.

1.1 Server Computer Architecture Siamack Haghighi

1.1.1 Introduction

Widespread availability of inexpensive high-performance computers and Internet access has resulted in considerable business productivity improvements and cost savings. Many companies use high-performance computing and networking technologies for highly efficient electronic commerce, or e-commerce. As a result, most modern businesses rely on an enterprise information technology (IT) computing and communication infrastructure as the backbone of their operation. The cost-savings potential has driven many modern companies to fully automate their traditional manual order entry, processing, inventory management, and operations via web-based technologies. Current e-commerce revenue estimates exceed one hundred billion dollars in the United States alone.

Availability of low-cost, robust, reliable, and secure IT infrastructure is one of the key drivers of the new Internet-based businesses. Customer usage models and applications affect IT infrastructure


performance, operation, and cost. The requirements of many modern IT deployments can be cost-effectively met with client–server computing technologies. Although not a new idea, the availability of inexpensive high-performance commodity microprocessors, scalable computer architectures, storage, and high-speed networks makes the client–server computing model an ideal fit for enterprise electronic business data processing, or e-commerce. Other client–server computing advantages are shared data storage and backup; improved infrastructure reliability, availability, serviceability, and manageability; and cost amortization over a large number of client devices and users. Figure 1.1 illustrates an example client–server computing deployment.

High-performance servers are built from multiple interconnected processors, high-performance memory systems, scalable networking, local storage subsystems, advanced software, and packaging.

This section provides an overview of server architecture design, deployment, and the associated challenges.

1.1.2 Client–Server Computing

Client–server computing was developed to provide cost-effective computing and communication capability for multiple users. Clients use a variety of devices and terminal types to access shared servers.

[Figure omitted: application, file, web, e-mail, proxy, compute, and database servers connected through Ethernet routers, hubs, bridges, and wireless access points to PC, mobile, tablet, and smart-phone clients, a shared printer, and networked storage.]

FIGURE 1.1 Client–server computing infrastructure.


Hence, users get access to high-performance services economically since infrastructure costs are shared and amortized among many users.
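The request–reply pattern at the heart of this model can be sketched in a few lines of Python socket code. This is a minimal illustration, not material from the chapter: the upper-casing "service", the message, and the single-client server loop are invented for the example.

```python
import socket
import threading


def run_server(sock: socket.socket) -> None:
    """Accept one client and return its request transformed (the 'service')."""
    conn, _addr = sock.accept()
    with conn:
        request = conn.recv(1024)
        conn.sendall(request.upper())  # a stand-in for real server-side processing


def main() -> bytes:
    # The server binds to an ephemeral port on the loopback interface.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]

    worker = threading.Thread(target=run_server, args=(server,))
    worker.start()

    # The client connects, sends a request, and waits for the reply,
    # just as a terminal or PC client contacts a shared server.
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(b"order #42")
        reply = client.recv(1024)

    worker.join()
    server.close()
    return reply


if __name__ == "__main__":
    print(main())
```

A production server would of course accept many concurrent clients and amortize its cost across them, which is precisely the economic argument made above.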

During the 1970s, business computing infrastructure consisted of centralized mainframe computers connected to user terminals via networks. Mainframes provided high-performance centralized processing facilities for compute-intensive tasks as well as data storage, external network interfacing, and task management. In the 1980s, business computing evolved to a distributed model with the advent of low-cost, high-performance personal computers (PCs), fueled by the availability of inexpensive, powerful microprocessors. In this architecture, many of the computing tasks previously serviced by mainframes are performed locally by the PCs. More recently, e-commerce and the rapid growth of the Internet as the common communication medium have resulted in another change in business computing infrastructure. World Wide Web (WWW) optimized applications and low-cost computing have facilitated business adoption of the client–server computing architecture. The following are the modern IT infrastructure elements:

. Simple and robust standardized web-based user interfaces

. Support for a variety of access devices such as mobile phones, desktop computers, personal digital assistants, and smart phones

. Wired and wireless high-speed data and communication networking

. Shared data storage and peripheral connectivity

. Centralized server array (sometimes referred to as server farm) configured and optimized for enterprise applications

1.1.3 Server Types

Modern servers are designed and optimized for low cost, high performance, low maintenance, and, in many cases, specific application usage models. There are a variety of server types, e.g., proxy, application, web cache, compute, communication, security, video, file, and streaming media. A typical server consists of several high-performance CPUs, large centralized or distributed system memory, a high-speed local storage subsystem, and network interfaces. Specialization is achieved through selection of elements such as the number of CPUs; the size, type, and speed of system memory; the operating system; the number and speed of network interfaces; and the local storage subsystem capacity, type, and access speed. As an example, an e-commerce server requires a fast network interface, modest system memory, and multiple CPUs for high-throughput transaction processing, whereas a file server benefits from a large networked storage subsystem and a compute server benefits from many CPUs and large system memory.

Servers also differ based on form factor. Physical size and configuration are important considerations for high-density (high computing capability) server deployments because power delivery, thermal cooling, and standardized installation are often the dominant concerns. While some servers are designed to fit cabinets, others are designed to fit rack-mounted enclosures.

In summary, configurability, suitable form factor, and scalable hardware and software are required for optimized high-performance server deployment and operation.

1.1.4 Server Deployment Considerations

In addition to form factor, optimal server deployment and operation require appropriate hardware and software features and flexible configuration. In this section, some of these aspects are detailed.

1.1.4.1 Server Features

Most servers have reliability, availability, serviceability, and manageability (RASM) features.

1. Reliability: Servers are expected to operate reliably with the ability for manual or automatic diagnosis and isolation of errors and failures. For example, banking and investment brokerage computing facilities require rapid diagnosis and isolation of hardware and software failures.

(27)

Example reliability features are hardware and software error or failure event detection, and response mechanisms such as error correction codes (ECC), error detection codes, transaction integrity checks, checksums, and multiple redundancies. Example event response mechanisms are event logging, failure source isolation, provisioning, and fail-over switching. Desired reliability features are selected based on cost, complexity, and server application usage model considerations as follows:

. Redundant hardware and software (e.g., independent operating system images on multiple server nodes)

. Server network interface and local storage subsystem integrity check mechanisms

. Fault detection, isolation, and mitigation

. ECC memory scrubbing to detect and correct bit errors that may cause system crash due to charged particle–induced errors

. System management software to collect detected errors and isolate faults

. Networked storage systems that use redundant array of independent disks (RAID) technology for data storage integrity assurance

2. Availability: The rapid rise in business reliance on computing infrastructure has resulted in demand for nonstop computing operations. Servers with such capabilities are referred to as high-availability servers. In banking and investment brokerage businesses, even brief service interruptions are detrimental and costly. High-availability servers require specially configured deployments such as multiple backup systems, load balancing, and fail-over switching capabilities. Two key metrics for measuring the value and potential cost of high-availability computing are average downtime per year (in seconds) and potential revenue loss due to service interruption. Other high-availability server features are service provisioning, user task isolation and migration, traffic differentiation, dynamic prioritization, and the ability to quickly detect and remedy failures. Scheduled maintenance and upgrade of hardware and software elements may also decrease the potential for failures and increase server availability.
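The average-downtime metric just mentioned follows directly from an availability target. The sketch below converts an availability fraction into expected downtime per year; the "nines" figures are common industry shorthand rather than values from this text:

```python
# Convert an availability target into expected downtime per year.
# Illustrative sketch; the availability targets are hypothetical examples.

SECONDS_PER_YEAR = 365 * 24 * 3600

def downtime_seconds_per_year(availability: float) -> float:
    """Expected downtime per year for a given availability fraction."""
    return (1.0 - availability) * SECONDS_PER_YEAR

# "Five nines" (99.999%) availability:
print(round(downtime_seconds_per_year(0.99999)))  # 315 seconds/year
# "Three nines" (99.9%) availability:
print(round(downtime_seconds_per_year(0.999)))    # 31536 seconds/year (~8.8 h)
```

The gap between these two targets (minutes versus hours of outage per year) is what justifies the extra backup, load-balancing, and fail-over investment of high-availability deployments.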

3. Serviceability: Continuous trouble-free server operation requires routine maintenance, error and failure monitoring, and the ability to quickly fix or replace defective hardware or software components. The mechanisms that provide such facilities are generally referred to as serviceability options. In many cases, the failure source can be isolated to one unit or subsystem, e.g., one dynamic memory module in system memory. Software mechanisms (e.g., real-time diagnostic tools, alerting, and dynamic server configuration) may be used manually or automatically to isolate and disable a faulty unit, swap in backup units, and prepare for service or faulty unit replacement before an error or failure becomes catastrophic and propagates to the entire server or computing facility. Features that may assist rapid replacement of faulty components are traffic isolation and hot replacement. Plug-and-play subsystem capabilities also improve ease of service.

Hot replacement allows changing faulty subsystems without the need to power down or reboot the server. Other services such as scheduled downtime to do off-line enhancement may also be necessary.

4. Manageability: Routine and emergency system operation requires management facilities such as

. Server performance monitoring and key application tuning

. Capacity planning for existing and future clients, users, and applications

. Manual or automatic load balancing, distribution, task migration, and scheduling for efficient operation of the enterprise resources and applications

. Special accommodation of circumstances requiring increased alerting and manageability capabilities (e.g., virus protection, intrusion detection, etc.)

. Rapid installation and configuration of new applications and systems (e.g., software upgrade and installation)

. Automatic and preventive operator notification applications and services


. Mechanisms for rapid recovery from service outages

. Remote or local server management despite faulty server components and errors

The following are other important and desirable deployment features:

5. Scalability: High-performance IT infrastructure can be built in two ways. In one approach, a few servers, each configured with a large number of CPUs and powerful input/output (I/O) capability, can be used. Alternatively, a large number of servers, each containing a few CPUs, may be clustered for high-performance computing. A combination of both approaches may also be used.

6. Security: In routine and emergency cases, access to system resources and facilities such as user authentication, intrusion detection, and privileged access may be needed. Cryptographic tech- nologies such as encryption and decryption may also be used to enhance the overall system security. In some cases, cooperation with local and government officials may be required for intrusion detection and prevention.

1.1.4.2 Operation

An important server deployment issue is the form factor and installation requirements. A typical server board contains multiple CPUs, system and peripheral connection bridges, networking, display, and local storage peripherals. In deployments such as data centers, a large number of server modules may be housed in racks or cabinets. In dense server deployments, rack or cabinet mounting, operation, maintenance, thermal management, power delivery, and wiring management are major challenges. The proximity of data centers to major customer sites is also important. Other considerations are as follows:

1. Power: A typical server board may consume several hundred watts of power. Providing power to large server racks may be a significant challenge. Power provisioning includes accommodating outages, voltage regulation, power delivery, uninterrupted supply, and, if necessary, battery backup.

2. Thermal: Servers generate large amounts of heat. Large server installations demand planning and accommodation for heat dissipation and cooling. In many cases, the thermal dissipation and cooling solution limits server deployment size. Since high-performance server thermal management is a major challenge, many new servers are built from low-power-consumption VLSI building blocks.

Development of low-power CPUs and chipsets that lower the need for active cooling can effectively address thermal limitations.

3. Total cost of ownership: An important consideration for enterprise servers is the total cost of ownership (TCO). TCO is a metric used to estimate overall IT infrastructure operational costs such as hardware and software purchases, services, required personnel, and downtimes. In each enterprise deployment, one or more TCO factors may be dominant. For example, in an online investment brokerage server installation, the downtime is a major consideration. In many cases, the downtime costs may easily justify additional backup servers.
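As a hedged illustration of how downtime can dominate TCO, the sketch below totals purchase, operations, and downtime costs over a deployment period; every figure is invented for illustration:

```python
# Hypothetical TCO estimate: hardware/software purchase, yearly operations,
# and the revenue cost of downtime. All dollar figures are invented.

def total_cost_of_ownership(purchase, yearly_ops, years,
                            downtime_hours_per_year, revenue_loss_per_hour):
    operations = yearly_ops * years
    downtime_cost = downtime_hours_per_year * revenue_loss_per_hour * years
    return purchase + operations + downtime_cost

# A deployment with 10 hours/year downtime and $50,000/hour revenue loss:
tco = total_cost_of_ownership(purchase=500_000, yearly_ops=200_000, years=3,
                              downtime_hours_per_year=10,
                              revenue_loss_per_hour=50_000)
print(tco)  # 2600000 -- downtime alone contributes $1.5M of the $2.6M total
```

Under these assumed numbers, downtime is the largest single cost component, which is how additional backup servers can pay for themselves.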

4. Server clustering: Many business applications such as manufacturing, financial, health care, and telecommunication require mission-critical servers. Telecommunication billing and banking servers are examples of server clustering. Mission-critical servers may be designed by connecting several servers and providing fail-over switching capabilities. If one server crashes, others can continue operation of key applications. Server clustering may be used to mitigate hardware (server components, storage, networking hardware), operating system, and application software failures. A variety of hardware and software automatic fault detection, isolation, and fail-over switching mechanisms are available and used by various mission-critical server manufacturers.
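The fail-over behavior described above can be reduced to a toy sketch; the node names and boolean health flags are hypothetical, and real cluster managers rely on heartbeat protocols, quorum, and fencing rather than a simple flag:

```python
# Minimal fail-over sketch: route requests to the first healthy node.
# Node names and health states are hypothetical placeholders.

def select_active_node(nodes):
    """Return the first healthy node, or None if the whole cluster is down."""
    for name, healthy in nodes:
        if healthy:
            return name
    return None

cluster = [("node-a", False), ("node-b", True), ("node-c", True)]
print(select_active_node(cluster))  # node-b: node-a failed, traffic fails over
```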

1.1.5 Server Architecture

1.1.5.1 Hardware Architecture

Even though servers may be built using custom very large scale integration (VLSI) devices, economic considerations necessitate the use of commodity hardware VLSI whenever possible. Figures 1.2 and 1.3


[Figure 1.2 depicts four CPUs sharing a centralized system memory through a system hub over the system interconnect; a peripheral hub on the peripheral interconnect attaches the network interface, storage subsystem, cluster interface, and miscellaneous peripherals.]

FIGURE 1.2 Centralized shared memory server architecture.

[Figure 1.3 depicts four CPUs, each with its own local distributed memory, connected by the system interconnect to a system hub with optional shared system memory; a peripheral hub on the peripheral interconnect attaches the network interface, storage subsystem, cluster interface, and miscellaneous peripherals.]

FIGURE 1.3 Distributed memory server architecture.


illustrate shared centralized and distributed memory multiprocessor server architectures. In both cases, the major building block components are the central processing units (CPUs), memory, system and peripheral hubs, interconnects, and peripherals. Each of these components is now described in more depth.

1.1.5.1.1 CPU

Most servers contain multiple CPUs. For economic reasons, servers use special configurations of commodity desktop or mobile PC CPUs. Server CPUs differ from desktop or mobile PC CPUs in additional features such as larger on-chip caches, hardware multiprocessing, hardware cache coherency support, and high-performance system interconnects. Most current servers have a symmetric multiprocessing architecture whereby all CPUs are of the same type and configuration. A server may also be built from heterogeneous CPUs; in such an architecture, each of the few CPU types used can be optimized for a specific class of applications.

Server CPUs have fast execution capability and multiple levels of hierarchically organized on-chip cache memory. Fast execution capability is required for high-performance application processing. Large, high-performance on-chip cache memory ensures sustained high-performance CPU operation. Modern CPU architectures have multiple execution cores, each operating at several gigahertz and including several megabytes of on-chip high-speed cache memory.

Figure 1.4 illustrates a simplified internal organization of a modern server CPU. Most commodity CPUs have 64-bit addressing capability and can easily accommodate processing of large data set applications and support for many clients, applications, and users. Each CPU processing core contains several arithmetic logic units (ALUs), multi-ported register files, floating-point multipliers, and sophisticated branch prediction and execution units. Most server CPUs execute program instructions out-of-order (OOO) and several operations at a time (superscalar).

High-performance CPU execution rates require sustained high-bandwidth instruction and data delivery. Caches are useful for high-speed storage and retrieval of frequently used instructions and data. At each level, disjoint or integrated instruction and data caches may be available. Caches are enumerated in increasing order with the lowest level closest to the CPU execution units. In Fig. 1.4, three levels of cache hierarchy are enumerated as L1–L3. As the cache hierarchy level increases, the size is also increased, typically 2–10 times the size of the preceding cache level. Current high-end server microprocessors use 3–4 cache hierarchy levels. Cache organization optimization parameters are capacity, associativity, line size, speed, number of access ports, and line replacement policy. These parameters are determined based on a variety of application execution characteristics, performance simulation models, and measurements. Current server CPU costs are dominated by the on-chip cache size and optimized for state-of-the-art VLSI processing technology capabilities, circuit design, power consumption, and salient software application characteristics.
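The 2–10 times capacity growth between cache levels can be made concrete with a small sketch; the 64 KB L1 size and the 8x growth factor are assumed example values, not figures from this text:

```python
# Illustrate the typical 2-10x capacity growth between cache levels.
# The 64 KB L1 size and 8x growth factor are hypothetical examples.

def cache_sizes_kb(l1_kb, growth, levels):
    """Return capacities for L1..Ln, each `growth` times the previous level."""
    return [l1_kb * growth ** i for i in range(levels)]

print(cache_sizes_kb(64, 8, 3))  # [64, 512, 4096] -> 64 KB L1, 512 KB L2, 4 MB L3
```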

The number of CPUs used in a server determines the desired server performance, cost, form factor, and thermal and power requirements. Most server designs contain sockets for additional CPUs and scalable computing performance. The architecture and design of high-performance multiprocessor servers are an active area of research.

1.1.5.1.2 Memory

A critical server building block is the system memory. Server systems use several channels of dynamic random access memory (DRAM) modules. The larger the number of independent memory channels, the larger the total bandwidth available to devices requesting memory access (such as CPUs and peripheral devices).
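The linear relationship between independent channels and peak bandwidth can be sketched as follows; the 8-byte channel width and 400 MT/s transfer rate are assumed, illustrative parameters rather than values from this text:

```python
# Peak system-memory bandwidth grows linearly with independent channels.
# The 8-byte channel width and 400 MT/s transfer rate are assumed examples.

def peak_bandwidth_gb_s(channels, bytes_per_transfer=8, megatransfers_per_s=400):
    return channels * bytes_per_transfer * megatransfers_per_s / 1000.0

print(peak_bandwidth_gb_s(1))  # 3.2 GB/s per channel
print(peak_bandwidth_gb_s(4))  # 12.8 GB/s with four independent channels
```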

A server system memory may be centralized or distributed. Centralized memory organization facilitates simple software architecture. Additional memory modules may be added to the central memory array, benefiting the overall system. The main disadvantages of the centralized system memory architecture are memory access contention and latency. Distributed system memory, as shown in Fig. 1.3, enables lower latency memory access if the CPU to local memory access traffic can be localized.


If a CPU needs to access indirectly attached memory, the request will be routed to the destination CPU via intersystem interconnects and system hubs.

There are several DRAM memory configurations such as single in-line memory module (SIMM) or dual in-line memory module (DIMM). A server system memory may support multiple DRAM types using high-performance open standard or proprietary memory system interface. This type of memory is generally referred to as buffered memory. Most servers have the capacity for several gigabytes of system memory.

For improved reliability and robustness, server system memory includes fault tolerance features such as ECC. Errors can happen for a variety of reasons, such as charged particle–induced DRAM errors or random intra-chip physical-layer errors. One example robustness metric is the number of error bits that

[Figure 1.4 depicts a CPU core with instruction fetch, decode, issue, and schedule logic feeding multiple register files and arithmetic logic units (ALUs); instruction retirement and data write-back logic; split L1 instruction and data caches backed by L2 and L3 I/D-caches; and a system interface unit, with miscellaneous logic, attached to the system interconnect.]

FIGURE 1.4 Internal server CPU organization.


can be detected and corrected. Low-end servers use single error correct, double error detect (SECDED) ECC. Another robustness feature is chip kill, which allows isolation and correction of multi-bit faults stemming from the failure of a single memory chip.

Since servers have large system memory capacity, high-speed auto-initialization of memory to known values, e.g., during boot time, may be beneficial. At runtime, accessed memory may be checked for initialization, providing additional robustness. Other robustness features include memory mirroring, redundant memory-bit steering, and soft-error (charged particle–induced error) scrubbing.
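The single-error-correct principle behind SECDED can be illustrated with a tiny Hamming(7,4) code; production ECC operates on much wider words (e.g., 64-bit data with check bits), so this is only a sketch of the syndrome idea:

```python
# Hamming(7,4) sketch: corrects any single-bit error in a 7-bit codeword.
# Servers use wider SECDED codes over full memory words; this tiny version
# only illustrates the principle (the syndrome names the flipped position).

def encode(d):                       # d = [d1, d2, d3, d4]
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]   # codeword positions 1..7

def correct(c):
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # parity check over positions 1,3,5,7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # parity check over positions 2,3,6,7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # parity check over positions 4,5,6,7
    syndrome = s1 + 2 * s2 + 4 * s3  # 0 means no error detected
    if syndrome:
        c[syndrome - 1] ^= 1         # flip the erroneous bit back
    return [c[2], c[4], c[5], c[6]]  # recover data bits d1..d4

word = [1, 0, 1, 1]
codeword = encode(word)
codeword[4] ^= 1                     # inject a single-bit soft error
print(correct(codeword))             # [1, 0, 1, 1] -- error corrected
```

Adding one overall parity bit to this code yields SECDED: double-bit errors are then detectable (parity consistent but syndrome nonzero) even though they cannot be corrected.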

1.1.5.1.3 System Interconnects

For high-performance server operation, the connections between the CPUs and the system hub, and between the system and peripheral hubs, should be high bandwidth and low latency. Figures 1.2 and 1.3 illustrate example system and peripheral interconnects. Modern servers use point-to-point, pipelined, high-speed interconnects operating at several gigahertz. Modern system interconnects are built using state-of-the-art high-speed serial physical-layer signaling capable of error detection and correction, with support for hardware cache coherency protocols. System interconnects need to be low cost, low latency, and high performance, require inexpensive circuit board design, and have low power consumption. In most cases, some of these requirements are conflicting, hence the need for trade-offs.

1.1.5.1.4 System Hub

The design of a low-cost, high-performance multiprocessor server system hub is an engineering trade-off challenge. On one hand, a high-performance system hub needs to deliver peak server performance.

On the other hand, the system hub needs to be low cost, scalable to multiple CPUs, configurable for various server types, and able to provide a variety of connectivity interfaces. In a server, the CPUs compete for low-latency, high-throughput access to shared resources such as system memory. The system hub provides communication mechanisms between the CPUs, system memory, peripheral subsystem, and potentially graphics controllers. The following are some of the considerations for optimum system hub design:

1. Servers with centralized shared memory, as in Fig. 1.2, use a system hub with an integrated memory controller. The system hub is capable of supporting multiple memory modules via high-performance links. In distributed shared memory servers, as in Fig. 1.3, memory controllers are integrated within each CPU. Closely coupling the CPU and memory controller enables low-latency access.

Typical system hub features, used in centralized shared memory servers, are multiple, fast, wide, and independent memory channels and memory interleave support, high-speed pipelined memory and CPU interconnects.

2. Shared system resources, such as memory, are accessed by various system devices (CPU, networking, storage, etc.). Hence, if one device (e.g., a CPU) is to have highest-performance access to system memory, all other devices also competing for system memory access need to be held off.

For applications that require protracted high-traffic system memory access, many requesting devices may have to stay idle while the highest priority traffic is serviced, potentially causing severe performance and efficiency loss. Accessing shared system resources is a dynamic, application and usage model-dependent event. The main memory scheduling and arbitration access policy optimizations are determined through extensive computer simulations that include dynamic models of the application, operating system (OS), and hardware components. Real-time application servers require additional mechanisms such as admission control, quality of service (QoS), traffic differentiation, and bandwidth reservation.

3. Many server system hubs do not provide extensive high-performance graphics capabilities. The reason is the limited need for high-performance graphics in server applications.
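The arbitration trade-off described in item 2 can be modeled with a toy strict-priority scheduler; the requester names and queue depths are hypothetical, and real memory controllers blend priority with fairness and QoS reservations rather than using strict priority alone:

```python
# Toy memory-access arbiter: strict priority starves low-priority requesters.
# Requester names, queue depths, and the policy are hypothetical examples.

def strict_priority(queues, priority_order, slots):
    """Each slot, serve the highest-priority requester with pending work."""
    queues = dict(queues)            # pending request counts per requester
    served = []
    for _ in range(slots):
        for name in priority_order:
            if queues.get(name, 0) > 0:
                queues[name] -= 1
                served.append(name)
                break
    return served

pending = {"cpu": 3, "nic": 2, "disk": 2}
print(strict_priority(pending, ["cpu", "nic", "disk"], 4))
# ['cpu', 'cpu', 'cpu', 'nic'] -- the disk stays idle while CPU traffic drains
```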

1.1.5.1.5 Peripheral Hub

In addition to traditional user and system connectivity devices (flash, keyboard controller, mouse, graphics, etc.), many server peripheral subsystems have extensive high-performance I/O capabilities.


A server’s I/O capability is useful for supporting multiple high-speed network interfaces, storage arrays, and clustering interfaces. Example I/O technologies are the peripheral component interconnect (PCI), PCI-X, and PCI-Express standard interfaces. Many modern peripheral hubs include additional features for server manageability, serviceability, peer-to-peer communication between I/O interfaces, and the ability to isolate, disable, and reroute high-performance I/O traffic.

1.1.5.1.6 Peripherals

Because of their extensive computational capabilities, servers have more extensive peripheral subsystems than desktop or mobile PCs. Server peripherals include storage, Ethernet network interface controllers (NICs), clustering interfaces, and archival storage devices. Some or all traditional peripheral elements such as boot flash, keyboard, mouse, and graphics processing capabilities may also be available. Server peripherals provide desired capabilities by supporting proprietary or open standard intersystem interconnect interfaces such as PCI-Express. The following are examples of peripherals:

1. Data storage and retrieval: Server storage and archival systems may be centralized or distributed.

High-performance, fault-tolerant disk-storage access is achieved using RAID technology. Other data-storage and retrieval technologies are network-attached storage (NAS), the small computer system interface (SCSI), and fiber channel storage area networking (SAN).

2. Network interface: High-performance servers require high-speed networking interfaces. Most servers include several gigabit or higher speed Ethernet standard interfaces.

3. Clustering interfaces: A variety of proprietary or industry standard interfaces are available to support clustering of multiple servers. Some clustering interfaces are based on switching fabric technologies to enable multiple server node connections. Other proprietary server clustering interfaces are direct-attach interfaces, optimized for large data transfer capabilities compared to switching fabric interfaces.

4. Miscellaneous peripherals: Servers may include peripherals such as keyboard, mouse, and graphics controller devices. Other special function peripherals such as encryption and decryption accelerators may also be used.
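The RAID data-integrity idea mentioned under item 1 rests on XOR parity: the parity block is the byte-wise XOR of the data blocks, so any single lost block can be rebuilt from the survivors. A minimal sketch with arbitrary block contents:

```python
# RAID-style XOR parity sketch: rebuild one lost data block from the
# surviving blocks plus the parity block. Block contents are arbitrary.
from functools import reduce

def parity(blocks):
    """Parity block = byte-wise XOR across all blocks (equal lengths assumed)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data = [b"disk0", b"disk1", b"disk2"]
p = parity(data)

# Disk 1 fails; rebuild its block from the other disks and the parity:
rebuilt = parity([data[0], data[2], p])
print(rebuilt)  # b'disk1'
```

Real RAID-5 additionally rotates the parity block across disks to spread the write load; the reconstruction arithmetic is the same.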

1.1.5.2 Software Architecture

The server software architecture is very different from that of desktop or mobile PCs due to enterprise requirements. In addition to supporting a large number of users, server software is required to be robust, secure, scalable, fault tolerant, and optimized for large database or transaction processing tasks.

In most cases, software architecture closely matches server types, e.g., web and proxy servers.

1.1.5.2.1 Operating System

Server operating systems are optimized for supporting a large number of users and applications, high CPU utilization, large capacity system memory, extensive I/O, networking, and multiprocessor scheduling.

Clustered servers use independent and potentially different operating systems for each server node. Internetworking issues in such a heterogeneous environment require additional consideration and optimization. Many OS and software programs use advanced caching techniques for high-performance access of large data sets and databases.
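One such caching technique can be sketched as a minimal least-recently-used (LRU) cache; the capacity and keys below are arbitrary examples, and production caches add concurrency control and size-aware eviction:

```python
# Minimal LRU cache sketch: keep the most recently used items, evicting the
# least recently used one when capacity is exceeded. Capacity is arbitrary.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, key):
        if key not in self.store:
            return None
        self.store.move_to_end(key)         # mark as most recently used
        return self.store[key]

    def put(self, key, value):
        self.store[key] = value
        self.store.move_to_end(key)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")          # touch "a" so "b" becomes least recently used
cache.put("c", 3)       # evicts "b"
print(cache.get("b"))   # None
print(cache.get("a"))   # 1
```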

1.1.5.2.2 Applications

There are a variety of server applications depending on the server type and the enterprise needs, e.g., database management, transaction processing, and decision support. Most server applications support a large number of simultaneous users, provide isolation among users accessing and sharing the same database, and contain extensive security and error or fault tolerance features. E-commerce servers have optimized features for high-throughput transaction processing.

There are also many specialized server software applications, typically referred to as middleware, that provide application development and delivery process tools for web and application servers.


1.1.5.3 Applications Usage Models

Server applications support a variety of usage models and states since each customer may use them differently and the applications have a multitude of supported operational modes. For example, a user may be updating the database with recent entries while others are accessing the same database. Application usage models and states have specific system demand characteristics. Robust IT infrastructure deployment and operation ensures that the server, network, and client architecture are performance tuned and optimized for the major usage models of interest. Example server configuration parameters are the number of CPUs; system memory type and size; system and peripheral hubs; peripheral storage size and type; and network interface speed. Software tuning mechanisms include optimizing compilers for parallel processing, multithreading, and customizable libraries.

Server application usage models can be characterized in many ways. Some usages require high-bandwidth communication between CPUs, system memory, and storage devices. Other usages require extensive execution processing rates. Some servers, such as web caches, require high-performance network interfaces.

1.1.6 Future Directions

Advanced server design requires enhancements in several areas:

1. Low-cost, high-performance, scalable server system design is an active area of research. Delivering high user-perceived performance while overcoming system deployment limitations such as cost, security, capacity, thermal, and power consumption are example challenges that need to be addressed.

2. New server board material and design technologies that facilitate higher computing density servers at lower cost, lower power consumption, or denser packaging would benefit future high-performance server designs.

3. With advancements in semiconductor processing technology, VLSI feature sizes (line width, transistor size, etc.) are becoming ever smaller. Small geometry devices are more sensitive to charged particle–induced errors and high-speed signaling faults. Hence, for reliable operation, the future server CPUs may need to include internal circuit error detection and correction or other fault tolerance mechanisms. System software and hardware architecture of fault-tolerant CPU servers are active areas of investigation.

4. With advancements in semiconductor processing technology, server CPUs will have ever-increasing processing capabilities. Software and tools that can expose and exploit the increased performance for key enterprise applications are highly desirable. Example requirements are parallelizing compilers, advanced operating systems, development and debug tools, parallel file systems, and high-speed networking.

5. Advancements in new enterprise-class usage models are an active research area. Example improvements are enhanced server security; combined real-time, data, and transaction processing enterprise applications; and support for more client devices.

6. Scalable and high-performance networking and clustering technologies to interconnect components and servers are also active areas of research.

1.2 Very Large Instruction Word Architectures

Binu Mathew

1.2.1 What Is a VLIW Processor?

Recent high-performance processors have depended on instruction-level parallelism (ILP) to achieve high execution speed. ILP processors achieve their high performance by causing multiple operations to execute in parallel, using a combination of compiler and hardware techniques. Very long instruction


word (VLIW) is one particular style of processor design that tries to achieve high levels of ILP by executing long instruction words composed of multiple operations. The long instruction word, called a MultiOp, consists of multiple arithmetic, logic, and control operations, each of which would probably be an individual operation on a simple RISC processor. The VLIW processor concurrently executes the set of operations within a MultiOp, thereby achieving instruction-level parallelism. The remainder of this section discusses the technology, history, uses, and the future of such processors.
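The MultiOp idea can be sketched as a simple compiler pass that greedily groups mutually independent operations, bounded by the machine width; the three-unit width and the (destination, source, source) operation format are hypothetical simplifications (real VLIW packing also models anti-dependences, latencies, and function-unit types):

```python
# Greedy VLIW packing sketch: group independent operations into MultiOps,
# bounded by the machine width. Operations are (dest, src1, src2) tuples;
# the 3-operation width is a hypothetical machine parameter.

WIDTH = 3

def pack(ops):
    multiops, current, written = [], [], set()
    for dest, *srcs in ops:
        # An op depends on the current group if it reads or rewrites a
        # register that the group writes.
        depends = dest in written or any(s in written for s in srcs)
        if depends or len(current) == WIDTH:
            multiops.append(current)          # close the current MultiOp
            current, written = [], set()
        current.append((dest, *srcs))
        written.add(dest)
    if current:
        multiops.append(current)
    return multiops

program = [("r1", "r2", "r3"),   # r1 = r2 op r3
           ("r4", "r5", "r6"),   # independent of the first operation
           ("r7", "r1", "r4")]   # depends on r1 and r4 -> next MultiOp
print(len(pack(program)))        # 2 MultiOps for 3 operations
```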

1.2.2 Different Flavors of Parallelism

Improvements in processor performance come from two main sources: faster semiconductor technology and parallel processing. Parallel processing on multiprocessors, multicomputers, and processor clusters has traditionally involved a high degree of programming effort in mapping an algorithm to a form that can better exploit multiple processors and threads of execution. Such reorganization has often been productively applied, especially for scientific programs. The general-purpose microprocessor industry, on the other hand, has pursued methods of automatically speeding up existing programs without major restructuring effort. This led to the development of ILP processors that try to speed up program execution by overlapping the execution of multiple instructions from an otherwise sequential program.

A simple processor that fetches and executes one instruction at a time is called a simple scalar processor. A processor with multiple function units has the potential to execute several operations in parallel. If the decision about which operations to execute in an overlapped manner is made at run time by the hardware, it is called a superscalar processor. In a simple scalar processor, a binary program represents a plan of execution. The processor acts as an interpreter that executes the instructions in the program one at a time. From the point of view of a modern superscalar processor, an input program is more like a representation of an algorithm for which several different plans of execution are possible.

Each plan of execution specifies when and on which function unit each instruction from the instruction stream is to be executed.

Different types of ILP processors vary in the manner in which the plan of execution is derived, but it typically involves both the compiler and the hardware. In the current breed of high-performance processors like the Intel Pentium and the MIPS R18000, the compiler tries to expose parallelism to the processor by means of several optimizations. The net result of these optimizations is to place as many independent operations as possible close to each other in the instruction stream. At run time, the processor examines several instructions at a time, analyses the dependences between instructions, and keeps track of the availability of data and hardware resources for each instruction. It tries to schedule each instruction as soon as the data and function units it needs are available. The processor’s decisions are complicated by the fact that memory accesses often have variable latencies that depend on whether a memory access hits in the cache or not. Since such processors decide which function unit should be allocated to which instruction as execution progresses, they are said to be dynamically scheduled. Often, as a further performance improvement, such processors allow later instructions that are independent to execute ahead of an earlier instruction which is waiting for data or resources. In that case, the processor is said to be out-of-order.
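The "schedule each instruction as soon as its data and function units are available" policy can be modeled as a small dataflow simulation; the unit latency, register names, and the assumption of unlimited function units are all simplifications of real dynamically scheduled hardware:

```python
# Dataflow-scheduling sketch: each instruction issues as soon as its source
# registers are ready, modeling a dynamically scheduled (out-of-order) core
# with unlimited function units. Latency and registers are hypothetical.

def completion_cycles(program, latency=1):
    """Return the cycle at which each instruction's result becomes ready."""
    ready = {}                       # register -> cycle its value is available
    times = []
    for dest, srcs in program:
        start = max((ready.get(s, 0) for s in srcs), default=0)
        finish = start + latency
        ready[dest] = finish
        times.append(finish)
    return times

program = [("r1", ["r0"]),           # first operation
           ("r2", ["r0"]),           # independent: overlaps with the first
           ("r3", ["r1", "r2"])]     # must wait for both earlier results
print(completion_cycles(program))    # [1, 1, 2] -- the first two run in parallel
```

A purely sequential machine would need three cycles for this program; the dependence-driven schedule needs only two, which is the essence of the ILP speedup.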

Branches are common operations in general-purpose code. On encountering a branch, a processor must decide whether or not to take the branch. If the branch is to be taken, the processor must start fetching instructions from the branch target. To avoid delays due to branches, modern processors try to predict the outcome of branches and execute instructions from beyond the branch. If the processor predicted the branch incorrectly, it may need to undo the effects of any instructions it has already executed beyond the branch. If a superscalar processor uses resources that may otherwise go idle to execute operations whose results may or may not be used, it is said to be speculative.

Out-of-order speculative execution comes at a significant hardware expense. The complexity and nonscalability of the hardware structures used to implement these features could significantly hinder the
