Distributed Computing

Principles, Algorithms, and Systems

Distributed computing deals with all forms of computing, information access, and information exchange across multiple processing platforms connected by computer networks. Design of distributed computing systems is a complex task. It requires a solid understanding of the design issues and an in-depth understanding of the theoretical and practical aspects of their solutions. This comprehensive textbook covers the fundamental principles and models underlying the theory, algorithms, and systems aspects of distributed computing.

Broad and detailed coverage of the theory is balanced with practical systems-related problems such as mutual exclusion, deadlock detection, authentication, and failure recovery. Algorithms are carefully selected, lucidly presented, and described without complex proofs. Simple explanations and illustrations are used to elucidate the algorithms. Emerging topics of significant impact, such as peer-to-peer networks and network security, are also covered.

With state-of-the-art algorithms, numerous illustrations, examples, and homework problems, this textbook is invaluable for advanced undergraduate and graduate students of electrical and computer engineering and computer science. Practitioners in data networking and sensor networks will also find this a valuable resource.

Ajay D. Kshemkalyani is an Associate Professor in the Department of Computer Science at the University of Illinois at Chicago. He was awarded his Ph.D. in Computer and Information Science in 1991 from The Ohio State University. Before moving to academia, he spent several years working on computer networks at IBM Research Triangle Park. In 1999, he received the National Science Foundation's CAREER Award. He is a Senior Member of the IEEE, and his principal areas of research include distributed computing, algorithms, computer networks, and concurrent systems. He currently serves on the editorial board of Computer Networks.

Mukesh Singhal is Full Professor and Gartner Group Endowed Chair in Network Engineering in the Department of Computer Science at the University of Kentucky. He was awarded his Ph.D. in Computer Science in 1986 from the University of Maryland, College Park. In 2003, he received the IEEE

Transactions on Computers. He is a Fellow of the IEEE, and his principal areas of research include distributed systems, computer networks, wireless and mobile computing systems, performance evaluation, and computer security.

Distributed Computing

Principles, Algorithms, and Systems

Ajay D. Kshemkalyani

University of Illinois at Chicago, Chicago

and

Mukesh Singhal

University of Kentucky, Lexington

Cambridge University Press

The Edinburgh Building, Cambridge CB2 8RU, UK

First published in print format 2008

ISBN-13 978-0-521-87634-6 (hardback)
ISBN-13 978-0-511-39341-9 (eBook, EBL)

© Cambridge University Press 2008

Information on this title: www.cambridge.org/9780521876346

This publication is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

Cambridge University Press has no responsibility for the persistence or accuracy of urls for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

Published in the United States of America by Cambridge University Press, New York www.cambridge.org

To my father Shri Digambar and my mother Shrimati Vimala.

Ajay D. Kshemkalyani

To my mother Chandra Prabha Singhal, my father Brij Mohan Singhal, and my daughters Meenakshi, Malvika, and Priyanka.

Mukesh Singhal

Contents

Preface page xv

1 Introduction 1

1.1 Definition 1

1.2 Relation to computer system components 2

1.3 Motivation 3

1.4 Relation to parallel multiprocessor/multicomputer systems 5

1.5 Message-passing systems versus shared memory systems 13

1.6 Primitives for distributed communication 14

1.7 Synchronous versus asynchronous executions 19

1.8 Design issues and challenges 22

1.9 Selection and coverage of topics 33

1.10 Chapter summary 34

1.11 Exercises 35

1.12 Notes on references 36

References 37

2 A model of distributed computations 39

2.1 A distributed program 39

2.2 A model of distributed executions 40

2.3 Models of communication networks 42

2.4 Global state of a distributed system 43

2.5 Cuts of a distributed computation 45

2.6 Past and future cones of an event 46

2.7 Models of process communications 47

2.8 Chapter summary 48

2.9 Exercises 48

2.10 Notes on references 48

References 49

3 Logical time 50

3.1 Introduction 50

3.2 A framework for a system of logical clocks 52

3.3 Scalar time 53

3.4 Vector time 55

3.5 Efficient implementations of vector clocks 59

3.6 Jard–Jourdan’s adaptive technique 65

3.7 Matrix time 68

3.8 Virtual time 69

3.9 Physical clock synchronization: NTP 78

3.10 Chapter summary 81

3.11 Exercises 84

3.12 Notes on references 84

References 84

4 Global state and snapshot recording algorithms 87

4.1 Introduction 87

4.2 System model and definitions 90

4.3 Snapshot algorithms for FIFO channels 93

4.4 Variations of the Chandy–Lamport algorithm 97

4.5 Snapshot algorithms for non-FIFO channels 101

4.6 Snapshots in a causal delivery system 106

4.7 Monitoring global state 109

4.8 Necessary and sufficient conditions for consistent global snapshots 110

4.9 Finding consistent global snapshots in a distributed computation 114

4.10 Chapter summary 121

4.11 Exercises 122

4.12 Notes on references 122

References 123

5 Terminology and basic algorithms 126

5.1 Topology abstraction and overlays 126

5.2 Classifications and basic concepts 128

5.3 Complexity measures and metrics 135

5.4 Program structure 137

5.5 Elementary graph algorithms 138

5.6 Synchronizers 163

5.7 Maximal independent set (MIS) 169

5.8 Connected dominating set 171

5.9 Compact routing tables 172

5.10 Leader election 174

5.11 Challenges in designing distributed graph algorithms 175

5.12 Object replication problems 176

5.13 Chapter summary 182

5.14 Exercises 183

5.15 Notes on references 185

References 186

6 Message ordering and group communication 189

6.1 Message ordering paradigms 190

6.2 Asynchronous execution with synchronous communication 195

6.3 Synchronous program order on an asynchronous system 200

6.4 Group communication 205

6.5 Causal order (CO) 206

6.6 Total order 215

6.7 A nomenclature for multicast 220

6.8 Propagation trees for multicast 221

6.9 Classification of application-level multicast algorithms 225

6.10 Semantics of fault-tolerant group communication 228

6.11 Distributed multicast algorithms at the network layer 230

6.12 Chapter summary 236

6.13 Exercises 236

6.14 Notes on references 238

References 239

7 Termination detection 241

7.1 Introduction 241

7.2 System model of a distributed computation 242

7.3 Termination detection using distributed snapshots 243

7.4 Termination detection by weight throwing 245

7.5 A spanning-tree-based termination detection algorithm 247

7.6 Message-optimal termination detection 253

7.7 Termination detection in a very general distributed computing model 257

7.8 Termination detection in the atomic computation model 263

7.9 Termination detection in a faulty distributed system 272

7.10 Chapter summary 279

7.11 Exercises 279

7.12 Notes on references 280

References 280

8 Reasoning with knowledge 282

8.1 The muddy children puzzle 282

8.2 Logic of knowledge 283

8.3 Knowledge in synchronous systems 289

8.4 Knowledge in asynchronous systems 290

8.5 Knowledge transfer 298

8.6 Knowledge and clocks 300

8.7 Chapter summary 301

8.8 Exercises 302

8.9 Notes on references 303

References 303

9 Distributed mutual exclusion algorithms 305

9.1 Introduction 305

9.2 Preliminaries 306

9.3 Lamport’s algorithm 309

9.4 Ricart–Agrawala algorithm 312

9.5 Singhal's dynamic information-structure algorithm 315

9.6 Lodha and Kshemkalyani's fair mutual exclusion algorithm 321

9.7 Quorum-based mutual exclusion algorithms 327

9.8 Maekawa’s algorithm 328

9.9 Agarwal–El Abbadi quorum-based algorithm 331

9.10 Token-based algorithms 336

9.11 Suzuki–Kasami’s broadcast algorithm 336

9.12 Raymond’s tree-based algorithm 339

9.13 Chapter summary 348

9.14 Exercises 348

9.15 Notes on references 349

References 350

10 Deadlock detection in distributed systems 352

10.1 Introduction 352

10.2 System model 352

10.3 Preliminaries 353

10.4 Models of deadlocks 355

10.5 Knapp's classification of distributed deadlock detection algorithms 358

10.6 Mitchell and Merritt's algorithm for the single-resource model 360

10.7 Chandy–Misra–Haas algorithm for the AND model 362

10.8 Chandy–Misra–Haas algorithm for the OR model 364

10.9 Kshemkalyani–Singhal algorithm for the P-out-of-Q model 365

10.10 Chapter summary 374

10.11 Exercises 375

10.12 Notes on references 375

References 376

11 Global predicate detection 379

11.1 Stable and unstable predicates 379

11.2 Modalities on predicates 382

11.3 Centralized algorithm for relational predicates 384

11.4 Conjunctive predicates 388

11.5 Distributed algorithms for conjunctive predicates 395

11.6 Further classification of predicates 404

11.7 Chapter summary 405

11.8 Exercises 406

11.9 Notes on references 407

References 408

12 Distributed shared memory 410

12.1 Abstraction and advantages 410

12.2 Memory consistency models 413

12.3 Shared memory mutual exclusion 427

12.4 Wait-freedom 434

12.5 Register hierarchy and wait-free simulations 434

12.6 Wait-free atomic snapshots of shared objects 447

12.7 Chapter summary 451

12.8 Exercises 452

12.9 Notes on references 453

References 454

13 Checkpointing and rollback recovery 456

13.1 Introduction 456

13.2 Background and definitions 457

13.3 Issues in failure recovery 462

13.4 Checkpoint-based recovery 464

13.5 Log-based rollback recovery 470

13.6 Koo–Toueg coordinated checkpointing algorithm 476

13.7 Juang–Venkatesan algorithm for asynchronous checkpointing and recovery 478

13.8 Manivannan–Singhal quasi-synchronous checkpointing algorithm 483

13.9 Peterson–Kearns algorithm based on vector time 492

13.10 Helary–Mostefaoui–Netzer–Raynal communication-induced protocol 499

13.11 Chapter summary 505

13.12 Exercises 506

13.13 Notes on references 506

References 507

14 Consensus and agreement algorithms 510

14.1 Problem definition 510

14.2 Overview of results 514

14.3 Agreement in a failure-free system (synchronous or asynchronous) 515

14.4 Agreement in (message-passing) synchronous systems with failures 516

14.5 Agreement in asynchronous message-passing systems with failures 529

14.6 Wait-free shared memory consensus in asynchronous systems 544

14.7 Chapter summary 562

14.8 Exercises 563

14.9 Notes on references 564

References 565

15 Failure detectors 567

15.1 Introduction 567

15.2 Unreliable failure detectors 568

15.3 The consensus problem 577

15.4 Atomic broadcast 583

15.5 A solution to atomic broadcast 584

15.6 The weakest failure detectors to solve fundamental agreement problems 585

15.7 An implementation of a failure detector 589

15.8 An adaptive failure detection protocol 591

15.9 Exercises 596

15.10 Notes on references 596

References 596

16 Authentication in distributed systems 598

16.1 Introduction 598

16.2 Background and definitions 599

16.3 Protocols based on symmetric cryptosystems 602

16.4 Protocols based on asymmetric cryptosystems 615

16.5 Password-based authentication 622

16.6 Authentication protocol failures 625

16.7 Chapter summary 626

16.8 Exercises 627

16.9 Notes on references 627

References 628

17 Self-stabilization 631

17.1 Introduction 631

17.2 System model 632

17.3 Definition of self-stabilization 634

17.4 Issues in the design of self-stabilization algorithms 636

17.5 Methodologies for designing self-stabilizing systems 647

17.6 Communication protocols 649

17.7 Self-stabilizing distributed spanning trees 650

17.8 Self-stabilizing algorithms for spanning-tree construction 652

17.9 An anonymous self-stabilizing algorithm for 1-maximal independent set in trees 657

17.10 A probabilistic self-stabilizing leader election algorithm 660

17.11 The role of compilers in self-stabilization 662

17.12 Self-stabilization as a solution to fault tolerance 665

17.13 Factors preventing self-stabilization 667

17.14 Limitations of self-stabilization 668

17.15 Chapter summary 670

17.16 Exercises 670

17.17 Notes on references 671

References 671

18 Peer-to-peer computing and overlay graphs 677

18.1 Introduction 677

18.2 Data indexing and overlays 679

18.3 Unstructured overlays 681

18.4 Chord distributed hash table 688

18.5 Content addressable networks (CAN) 695

18.6 Tapestry 701

18.7 Some other challenges in P2P system design 708

18.8 Tradeoffs between table storage and route lengths 710

18.9 Graph structures of complex networks 712

18.10 Internet graphs 714

18.11 Generalized random graph networks 720

18.12 Small-world networks 720

18.13 Scale-free networks 721

18.14 Evolving networks 723

18.15 Chapter summary 727

18.16 Exercises 727

18.17 Notes on references 728

References 729

Index 731

Preface

Background

The field of distributed computing covers all aspects of computing and information access across multiple processing elements connected by any form of communication network, whether local or wide-area in coverage. Since the advent of the Internet in the 1970s, there has been a steady growth of new applications requiring distributed processing. This has been enabled by advances in networking and hardware technology, the falling cost of hardware, and greater end-user awareness. These factors have contributed to making distributed computing a cost-effective, high-performance, and fault-tolerant reality. Around the turn of the millennium, there was an explosive growth in the expansion and efficiency of the Internet, which was matched by increased access to networked resources through the World Wide Web, all across the world. Coupled with an equally dramatic growth in the wireless and mobile networking areas, and the plummeting prices of bandwidth and storage devices, we are witnessing a rapid spurt in distributed applications and an accompanying interest in the field of distributed computing in universities, government organizations, and private institutions.

Advances in hardware technology have suddenly made sensor networking a reality, and embedded and sensor networks are rapidly becoming an integral part of everyone’s life – from the home network with the interconnected gadgets to the automobile communicating by GPS (global positioning system), to the fully networked office with RFID monitoring. In the emerging global village, distributed computing will be the centerpiece of all computing and information access sub-disciplines within computer science. Clearly, this is a very important field. Moreover, this evolving field is characterized by a diverse range of challenges for which the solutions need to have foundations on solid principles.

The field of distributed computing is very important, and there is a huge demand for a good comprehensive book. This book comprehensively covers all important topics in great depth, combining this with a clarity of explanation and ease of understanding. The book will be particularly valuable to the academic community and the computer industry at large. Writing such a comprehensive book has been a Herculean task and there is a deep sense of satisfaction in knowing that we were able to complete it and perform this service to the community.

Description, approach, and features

The book will focus on the fundamental principles and models underlying all aspects of distributed computing. It will address the principles underlying the theory, algorithms, and systems aspects of distributed computing. The manner of presentation of the algorithms is very clear, explaining the main ideas and the intuition with figures and simple explanations rather than getting entangled in intimidating notations and lengthy and hard-to-follow rigorous proofs of the algorithms. The selection of chapter themes is broad and comprehensive, and the book covers all important topics in depth. The selection of algorithms within each chapter has been done carefully to elucidate new and important techniques of algorithm design. Although the book focuses on foundational aspects and algorithms for distributed computing, it thoroughly addresses all practical systems-like problems (e.g., mutual exclusion, deadlock detection, termination detection, failure recovery, authentication, global state and time, etc.) by presenting the theory behind and algorithms for such problems. The book is written keeping in mind the impact of emerging topics such as peer-to-peer computing and network security on the foundational aspects of distributed computing.

Each chapter contains figures, examples, exercises, a summary, and references.

Readership

This book is aimed as a textbook for the following:

• Graduate students and senior-level undergraduate students in computer science and computer engineering.

• Graduate students in electrical engineering and mathematics. As wireless networks, peer-to-peer networks, and mobile computing continue to grow in importance, an increasing number of students from electrical engineering departments will also find this book necessary.

• Practitioners, systems designers/programmers, and consultants in industry and research laboratories will find the book a very useful reference because it contains state-of-the-art algorithms and principles to address various design issues in distributed systems, as well as the latest references.

Hard and soft prerequisites for the use of this book include the following:

• An undergraduate course in algorithms is required.

• Undergraduate courses in operating systems and computer networks would be useful.

• A reasonable familiarity with programming.

We have aimed for a very comprehensive book that will act as a single source for distributed computing models and algorithms. The book has both depth and breadth of coverage of topics, and is characterized by clear and easy explanations. None of the existing textbooks on distributed computing provides all of these features.

Acknowledgements

This book grew from the notes used in the graduate courses on distributed computing at the Ohio State University, the University of Illinois at Chicago, and at the University of Kentucky. We would like to thank the graduate students at these schools for their contributions to the book in many ways.

The book is based on the published research results of numerous researchers in the field. We have made all efforts to present the material in our own words and have given credit to the original sources of information. We would like to thank all the researchers whose work has been reported in this book.

Finally, we would like to thank the staff of Cambridge University Press for providing us with excellent support in the publication of this book.

Access to resources

The following websites will be maintained for the book. Any errors and comments should be sent to ajayk@cs.uic.edu or singhal@cs.uky.edu. Further information about the book can be obtained from the authors’ web pages:

• www.cs.uic.edu/∼ajayk/DCS-Book

• www.cs.uky.edu/∼singhal/DCS-Book.

Chapter 1

Introduction

1.1 Definition

A distributed system is a collection of independent entities that cooperate to solve a problem that cannot be individually solved. Distributed systems have been in existence since the start of the universe. From a school of fish to a flock of birds and entire ecosystems of microorganisms, there is communication among mobile intelligent agents in nature. With the widespread proliferation of the Internet and the emerging global village, the notion of distributed computing systems as a useful and widely deployed tool is becoming a reality.

For computing systems, a distributed system has been characterized in one of several ways:

• You know you are using one when the crash of a computer you have never heard of prevents you from doing work [23].

• A collection of computers that do not share common memory or a common physical clock, that communicate by message passing over a communication network, and where each computer has its own memory and runs its own operating system. Typically the computers are semi-autonomous and are loosely coupled while they cooperate to address a problem collectively [29].

• A collection of independent computers that appears to the users of the system as a single coherent computer [33].

• A term that describes a wide range of computers, from weakly coupled systems such as wide-area networks, to strongly coupled systems such as local area networks, to very strongly coupled systems such as multiprocessor systems [19].

A distributed system can be characterized as a collection of mostly autonomous processors communicating over a communication network and having the following features:

No common physical clock This is an important assumption because it introduces the element of “distribution” in the system and gives rise to the inherent asynchrony amongst the processors.

No shared memory This is a key feature that requires message-passing for communication. This feature implies the absence of the common physical clock.

It may be noted that a distributed system may still provide the abstraction of a common address space via the distributed shared memory abstraction.

Several aspects of shared memory multiprocessor systems have also been studied in the distributed computing literature.

Geographical separation The wider apart the processors are geographically, the more representative the system is of a distributed system.

However, it is not necessary for the processors to be on a wide-area network (WAN). Recently, the network/cluster of workstations (NOW/COW) configuration connecting processors on a LAN is also being increasingly regarded as a small distributed system. This NOW configuration is becoming popular because of the low-cost high-speed off-the-shelf processors now available. The Google search engine is based on the NOW architecture.

Autonomy and heterogeneity The processors are "loosely coupled" in that they have different speeds and each can be running a different operating system. They are usually not part of a dedicated system, but cooperate with one another by offering services or solving a problem jointly.

1.2 Relation to computer system components

A typical distributed system is shown in Figure 1.1. Each computer has a memory-processing unit and the computers are connected by a communication network. Figure 1.2 shows the relationships of the software components that run on each of the computers and use the local operating system and network protocol stack for functioning. The distributed software is also termed as middleware. A distributed execution is the execution of processes across the distributed system to collaboratively achieve a common goal. An execution is also sometimes termed a computation or a run.

Figure 1.1 A distributed system connects processors by a communication network (WAN/LAN). (P: processor(s), M: memory bank(s).)

Figure 1.2 Interaction of the software components at each processor (components shown: the distributed application; the distributed software (middleware libraries); the operating system; the network protocol stack with its application, transport, network, and data link layers; and the extent of the distributed protocols).

The distributed system uses a layered architecture to break down the complexity of system design. The middleware is the distributed software that drives the distributed system, while providing transparency of heterogeneity at the platform level [24]. Figure 1.2 schematically shows the interaction of this software with these system components at each processor. Here we assume that the middleware layer does not contain the traditional application layer functions of the network protocol stack, such as http, mail, ftp, and telnet.

Various primitives and calls to functions defined in various libraries of the middleware layer are embedded in the user program code. There exist several libraries to choose from to invoke primitives for the more common functions – such as reliable and ordered multicasting – of the middleware layer.

There are several standards such as Object Management Group’s (OMG) common object request broker architecture (CORBA) [36], and the remote procedure call (RPC) mechanism [1,11]. The RPC mechanism conceptually works like a local procedure call, with the difference that the procedure code may reside on a remote machine, and the RPC software sends a message across the network to invoke the remote procedure. It then awaits a reply, after which the procedure call completes from the perspective of the program that invoked it. Currently deployed commercial versions of middleware often use CORBA, DCOM (distributed component object model), Java, and RMI (remote method invocation) [7] technologies. The message-passing interface (MPI) [20,30] developed in the research community is an example of an interface for various communication functions.

1.3 Motivation

The motivation for using a distributed system is some or all of the following requirements:

1. Inherently distributed computations In many applications such as money transfer in banking, or reaching consensus among parties that are geographically distant, the computation is inherently distributed.

2. Resource sharing Resources such as peripherals, complete data sets in databases, special libraries, as well as data (variables/files) cannot be fully replicated at all the sites because it is often neither practical nor cost-effective. Further, they cannot be placed at a single site because access to that site might prove to be a bottleneck. Therefore, such resources are typically distributed across the system. For example, distributed databases such as DB2 partition the data sets across several servers, in addition to replicating them at a few sites for rapid access as well as reliability.

3. Access to geographically remote data and resources In many scenarios, the data cannot be replicated at every site participating in the distributed execution because it may be too large or too sensitive to be replicated. For example, payroll data within a multinational corporation is both too large and too sensitive to be replicated at every branch office/site. It is therefore stored at a central server which can be queried by branch offices. Similarly, special resources such as supercomputers exist only in certain locations, and to access such supercomputers, users need to log in remotely.

Advances in the design of resource-constrained mobile devices as well as in the wireless technology with which these devices communicate have given further impetus to the importance of distributed protocols and middleware.

4. Enhanced reliability A distributed system has the inherent potential to provide increased reliability because of the possibility of replicating resources and executions, as well as the reality that geographically distributed resources are not likely to crash/malfunction at the same time under normal circumstances. Reliability entails several aspects:

• availability, i.e., the resource should be accessible at all times;

• integrity, i.e., the value/state of the resource should be correct, in the face of concurrent access from multiple processors, as per the semantics expected by the application;

• fault-tolerance, i.e., the ability to recover from system failures, where such failures may be defined to occur in one of many failure models, which we will study in Chapters 5 and 14.

5. Increased performance/cost ratio By resource sharing and accessing geographically remote data and resources, the performance/cost ratio is increased. Although higher throughput has not necessarily been the main objective behind using a distributed system, nevertheless, any task can be partitioned across the various computers in the distributed system. Such a configuration provides a better performance/cost ratio than using special parallel machines. This is particularly true of the NOW configuration.

In addition to meeting the above requirements, a distributed system also offers the following advantages:

6. Scalability As the processors are usually connected by a wide-area network, adding more processors does not pose a direct bottleneck for the communication network.

7. Modularity and incremental expandability Heterogeneous processors may be easily added into the system without affecting the performance, as long as those processors are running the same middleware algorithms. Similarly, existing processors may be easily replaced by other processors.

1.4 Relation to parallel multiprocessor/multicomputer systems

The characteristics of a distributed system were identified above. A typical distributed system would look as shown in Figure 1.1. However, how does one classify a system that meets some but not all of the characteristics? Is the system still a distributed system, or does it become a parallel multiprocessor system? To better answer these questions, we first examine the architecture of parallel systems, and then examine some well-known taxonomies for multiprocessor/multicomputer systems.

1.4.1 Characteristics of parallel systems

A parallel system may be broadly classified as belonging to one of three types:

1. A multiprocessor system is a parallel system in which the multiple processors have direct access to shared memory which forms a common address space. The architecture is shown in Figure 1.3(a). Such processors usually do not have a common clock.

A multiprocessor system usually corresponds to a uniform memory access (UMA) architecture in which the access latency, i.e., waiting time, to complete an access to any memory location from any processor is the same.

The processors are in very close physical proximity and are connected by an interconnection network. Interprocess communication across processors is traditionally through read and write operations on the shared memory, although the use of message-passing primitives such as those provided by the MPI is also possible (using emulation on the shared memory). All the processors usually run the same operating system, and both the hardware and software are very tightly coupled.

Figure 1.3 Two standard architectures for parallel systems. (a) Uniform memory access (UMA) multiprocessor system. (b) Non-uniform memory access (NUMA) multiprocessor. In both architectures, the processors may locally cache data from memory. (P: processor, M: memory.)

Figure 1.4 Interconnection networks for shared memory multiprocessor systems. (a) Omega network [4] for n = 8 processors P0–P7 and memory banks M0–M7 (3-stage Omega network, n = 8, M = 4). (b) Butterfly network [10] for n = 8 processors P0–P7 and memory banks M0–M7 (3-stage Butterfly network, n = 8, M = 4).

The processors are usually of the same type, and are housed within the same box/container with a shared memory. The interconnection network to access the memory may be a bus, although for greater efficiency, it is usually a multistage switch with a symmetric and regular design.

Figure 1.4 shows two popular interconnection networks – the Omega network [4] and the Butterfly network [10], each of which is a multi-stage network formed of 2×2 switching elements. Each 2×2 switch allows data on either of the two input wires to be switched to the upper or the lower output wire. In a single step, however, only one data unit can be sent on an output wire. So if the data from both the input wires is to be routed to the same output wire in a single step, there is a collision. Various techniques such as buffering or more elaborate interconnection designs can address collisions.

Each 2×2 switch is represented as a rectangle in the figure. Furthermore, an n-input and n-output network uses log n stages and log n bits for addressing. Routing in the 2×2 switch at stage k uses only the kth bit, and hence can be done at clock speed in hardware. The multi-stage networks can be constructed recursively, and the interconnection pattern between any two stages can be expressed using an iterative or a recursive generating function. Besides the Omega and Butterfly (banyan) networks, other examples of multistage interconnection networks are the Clos [9] and the shuffle-exchange networks [37]. Each of these has very interesting mathematical properties that allow rich connectivity between the processor bank and memory bank.

Omega interconnection function The Omega network, which connects n processors to n memory units, has (n/2) log2 n switching elements of size 2×2 arranged in log2 n stages. Between each pair of adjacent stages of the Omega network, a link exists between output i of a stage and input j of the next stage according to the following perfect shuffle pattern, which is a left-rotation operation on the binary representation of i to get j. The iterative generation function is as follows:

j = 2i            for 0 ≤ i ≤ n/2 − 1
j = 2i + 1 − n    for n/2 ≤ i ≤ n − 1        (1.1)

Consider any stage of switches. Informally, the upper (lower) input lines for each switch come in sequential order from the upper (lower) half of the switches in the earlier stage.

With respect to the Omega network in Figure 1.4(a), n = 8. Hence, for any stage, for the outputs i, where 0 ≤ i ≤ 3, the output i is connected to input 2i of the next stage. For 4 ≤ i ≤ 7, the output i of any stage is connected to input 2i + 1 − n of the next stage.

Omega routing function The routing function from input line i to output line j considers only j and the stage number s, where s ∈ [0, log2 n − 1]. In a stage s switch, if the (s + 1)th MSB (most significant bit) of j is 0, the data is routed to the upper output wire, otherwise it is routed to the lower output wire.
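As a concrete illustration, the following C sketch computes the perfect-shuffle link of Equation (1.1) and prints the stage-by-stage routing decisions for a destination line; the function names and the n = 8 example are illustrative choices, not taken from the text.

#include <stdio.h>

/* Perfect-shuffle connection (Equation (1.1)): output i of one stage is
 * linked to input j of the next stage, for an n-input Omega network. */
int omega_shuffle(int i, int n) {
    return (i <= n / 2 - 1) ? 2 * i : 2 * i + 1 - n;
}

/* Omega routing: at stage s, inspect the (s+1)th most significant bit of the
 * destination line j; 0 means take the upper output, 1 means the lower one. */
void omega_route(int j, int stages) {
    for (int s = 0; s < stages; s++) {
        int bit = (j >> (stages - 1 - s)) & 1;
        printf("stage %d: %s output\n", s, bit ? "lower" : "upper");
    }
}

int main(void) {
    int n = 8;               /* 8 processors and 8 memory banks, log2(8) = 3 stages */
    for (int i = 0; i < n; i++)
        printf("output %d -> input %d\n", i, omega_shuffle(i, n));
    omega_route(5 /* destination line 101 */, 3);
    return 0;
}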

Butterfly interconnection function Unlike the Omega network, the generation of the interconnection pattern between a pair of adjacent stages depends not only on n but also on the stage number s. The recursive expression is as follows. Let there be M = n/2 switches per stage, and let a switch be denoted by the tuple ⟨x, s⟩, where x ∈ [0, M − 1] and stage s ∈ [0, log2 n − 1].

The two outgoing edges from any switch ⟨x, s⟩ are as follows. There is an edge from switch ⟨x, s⟩ to switch ⟨y, s + 1⟩ if (i) x = y, or (ii) x XOR y has exactly one 1 bit, which is in the (s + 1)th MSB. For stage s, apply the rule above for M/2^s switches.

Whether the two incoming connections go to the upper or the lower input port is not important because of the routing function, given below.

Example Consider the Butterfly network in Figure 1.4(b), with n = 8 and M = 4. There are three stages, s = 0, 1, 2, and the interconnection pattern is defined between s = 0 and s = 1 and between s = 1 and s = 2. The switch number x varies from 0 to 3 in each stage, i.e., x is a 2-bit string.

(Note that unlike the Omega network formulation using input and output lines given above, this formulation uses switch numbers. Exercise 1.5 asks you to prove a formulation of the Omega interconnection pattern using switch numbers instead of input and output port numbers.)

Consider the first stage interconnection (s = 0) of a butterfly of size M, and hence having log2 2M stages. For stage s = 0, as per rule (i), the first output line from switch 00 goes to the input line of switch 00 of stage s = 1. As per rule (ii), the second output line of switch 00 goes to the input line of switch 10 of stage s = 1. Similarly, x = 01 has one output line go to an input line of switch 11 in stage s = 1. The other connections in this stage can be determined similarly. For stage s = 1 connecting to stage s = 2, we apply the rules considering only M/2^1 = M/2 switches, i.e., we build two butterflies of size M/2 – the "upper half" and the "lower half" switches. The recursion terminates for M/2^s = 1, when there is a single switch.

Butterfly routing function In a stage s switch, if the (s + 1)th MSB of j is 0, the data is routed to the upper output wire, otherwise it is routed to the lower output wire.
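Rule (ii) above can also be evaluated directly in code: at stage s, a switch connects straight to the switch with the same label in the next stage, and across to the switch whose label differs only in the (s + 1)th MSB. The short C sketch below (with illustrative names, and M = 4 as in the example) prints these cross edges; it is one reading of the rule stated above, not code taken from the text.

#include <stdio.h>

/* Cross edge of Butterfly switch <x, s>: the switch y in stage s+1 whose
 * label differs from x in exactly the (s+1)th MSB of the log2(M)-bit label. */
int butterfly_cross(int x, int s, int label_bits) {
    return x ^ (1 << (label_bits - 1 - s));
}

int main(void) {
    int M = 4, label_bits = 2;     /* n = 8, so M = n/2 = 4 switches per stage */
    for (int s = 0; s < label_bits; s++)
        for (int x = 0; x < M; x++)
            printf("stage %d: switch %d -> straight %d, cross %d\n",
                   s, x, x, butterfly_cross(x, s, label_bits));
    return 0;
}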

Observe that for the Butterfly and the Omega networks, the paths from the different inputs to any one output form a spanning tree. This implies that collisions will occur when data is destined to the same output line.

However, the advantage is that data can be combined at the switches if the application semantics (e.g., summation of numbers) are known.

2. A multicomputer parallel system is a parallel system in which the multiple processors do not have direct access to shared memory. The memory of the multiple processors may or may not form a common address space. Such computers usually do not have a common clock. The architecture is shown in Figure 1.3(b).

The processors are in close physical proximity and are usually very tightly coupled (homogeneous hardware and software), and connected by an interconnection network. The processors communicate either via a common address space or via message-passing. A multicomputer system that has a common address space usually corresponds to a non-uniform memory access (NUMA) architecture in which the latency to access various shared memory locations from the different processors varies.

Examples of parallel multicomputers are: the NYU Ultracomputer and the Sequent shared memory machines, the CM* Connection machine, and processors configured in regular and symmetrical topologies such as an array or mesh, ring, torus, cube, and hypercube (message-passing machines). The regular and symmetrical topologies have interesting mathematical properties that enable very easy routing and provide many rich features such as alternate routing.

Figure 1.5(a) shows a wrap-around 4×4 mesh. For a k×k mesh, which will contain k² processors, the maximum path length between any two processors is 2⌊k/2⌋. Routing can be done along the Manhattan grid.

Figure 1.5(b) shows a four-dimensional hypercube. A k-dimensional hypercube has 2^k processor-and-memory units [13, 21]. Each such unit is a node in the hypercube, and has a unique k-bit label. Each of the k dimensions is associated with a bit position in the label. The labels of any two adjacent nodes are identical except for the bit position corresponding to the dimension in which the two nodes differ. Thus, the processors are labelled such that the shortest path between any two processors is the Hamming distance (defined as the number of bit positions in which the two equal-sized bit strings differ) between the processor labels. This is clearly bounded by k.

Figure 1.5 Some popular topologies for multicomputer shared-memory machines. (a) Wrap-around 2D-mesh, also known as torus. (b) Hypercube of dimension 4; each node is a processor + memory unit with a 4-bit label (0000–1111).

Example Nodes 0101 and 1100 have a Hamming distance of 2. The shortest path between them has length 2.

Routing in the hypercube is done hop-by-hop. At any hop, the message can be sent along any dimension corresponding to the bit position in which the current node's address and the destination address differ. The 4D hypercube shown in the figure is formed by connecting the corresponding edges of two 3D hypercubes (corresponding to the left and right "cubes" in the figure) along the fourth dimension; the labels of the 4D hypercube are formed by prepending a "0" to the labels of the left 3D hypercube and prepending a "1" to the labels of the right 3D hypercube. This can be extended to construct hypercubes of higher dimensions. Observe that there are multiple routes between any pair of nodes, which provides fault-tolerance as well as a congestion control mechanism. The hypercube and its variant topologies have very interesting mathematical properties with implications for routing and fault-tolerance.
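The labelling and routing rules above translate directly into bit operations. The following C sketch (the function names and the 0101 to 1100 example are just for illustration) computes the Hamming distance and one greedy hop-by-hop routing path.

#include <stdio.h>

/* Hamming distance between two k-bit node labels: the number of bit
 * positions in which they differ, i.e. the hypercube shortest-path length. */
int hamming(unsigned a, unsigned b) {
    unsigned x = a ^ b;
    int d = 0;
    while (x) { d += x & 1u; x >>= 1; }
    return d;
}

/* One routing hop: flip one bit position in which the current label and the
 * destination label differ (here, the most significant such position). */
unsigned next_hop(unsigned cur, unsigned dest, int k) {
    for (int bit = k - 1; bit >= 0; bit--)
        if (((cur ^ dest) >> bit) & 1u)
            return cur ^ (1u << bit);
    return cur;                          /* already at the destination */
}

int main(void) {
    unsigned src = 0x5 /* 0101 */, dst = 0xC /* 1100 */;
    printf("Hamming distance = %d\n", hamming(src, dst));     /* prints 2 */
    for (unsigned cur = src; cur != dst; cur = next_hop(cur, dst, 4))
        printf("at %X, next hop %X\n", cur, next_hop(cur, dst, 4));
    return 0;
}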

3. Array processors belong to a class of parallel computers that are physically co-located, are very tightly coupled, and have a common system clock (but may not share memory and communicate by passing data using messages). Array processors and systolic arrays that perform tightly synchronized processing and data exchange in lock-step for applications such as DSP and image processing belong to this category. These applications usually involve a large number of iterations on the data. This class of parallel systems has a very niche market.

The distinction between UMA multiprocessors on the one hand, and NUMA and message-passing multicomputers on the other, is important because the algorithm design and data and task partitioning among the processors must account for the variable and unpredictable latencies in accessing memory/communication [22]. As compared to UMA systems and array processors, NUMA and message-passing multicomputer systems are less suitable when the degree of granularity of accessing shared data and communication is very fine.

The primary and most efficacious use of parallel systems is for obtaining a higher throughput by dividing the computational workload among the processors. The tasks that are most amenable to higher speedups on parallel systems are those that can be partitioned into subtasks very nicely, involving much number-crunching and relatively little communication for synchronization. Once the task has been decomposed, the processors perform large vector, array, and matrix computations that are common in scientific applications. Searching through large state spaces can be performed with significant speedup on parallel machines. While such parallel machines were an object of much theoretical and systems research in the 1980s and early 1990s, they have not proved to be economically viable for two related reasons.

First, the overall market for the applications that can potentially attain high speedups is relatively small. Second, due to economy of scale and the high processing power offered by relatively inexpensive off-the-shelf networked PCs, specialized parallel machines are not cost-effective to manufacture. They additionally require special compiler and other system support for maximum throughput.

1.4.2 Flynn’s taxonomy

Flynn [14] identified four processing modes, based on whether the processors execute the same or different instruction streams at the same time, and whether or not the processors process the same (identical) data at the same time. It is instructive to examine this classification to understand the range of options used for configuring systems:

Single instruction stream, single data stream (SISD)

This mode corresponds to the conventional processing in the von Neumann paradigm with a single CPU, and a single memory unit connected by a system bus.

Single instruction stream, multiple data stream (SIMD)

This mode corresponds to the processing by multiple homogenous proces- sors which execute in lock-step on different data items. Applications that involve operations on large arrays and matrices, such as scientific applica- tions, can best exploit systems that provide the SIMD mode of operation because the data sets can be partitioned easily.

Several of the earliest parallel computers, such as Illiac-IV, MPP, CM2, and MasPar MP-1, were SIMD machines. Vector processors, array processors, and systolic arrays also belong to the SIMD class of processing.

Recent SIMD architectures include co-processing units such as the MMX units in Intel processors (e.g., Pentium with the streaming SIMD extensions (SSE) options) and DSP chips such as the Sharc [22].

Multiple instruction stream, single data stream (MISD)

This mode corresponds to the execution of different operations in parallel on the same data. This is a specialized mode of operation with limited but niche applications, e.g., visualization.

Figure 1.6 Flynn's taxonomy of SIMD, MIMD, and MISD architectures for multiprocessor/multicomputer systems. (P: processing unit, C: control unit, I: instruction stream, D: data stream.)

Multiple instruction stream, multiple data stream (MIMD)

In this mode, the various processors execute different code on different data. This is the mode of operation in distributed systems as well as in the vast majority of parallel systems. There is no common clock among the system processors. Sun Ultra servers, multicomputer PCs, and IBM SP machines are examples of machines that execute in MIMD mode.

SIMD, MISD, and MIMD architectures are illustrated in Figure 1.6. MIMD architectures are most general and allow much flexibility in partitioning code and data to be processed among the processors. MIMD architectures also include the classically understood mode of execution in distributed systems.

1.4.3 Coupling, parallelism, concurrency, and granularity

Coupling

The degree of coupling among a set of modules, whether hardware or software, is measured in terms of the interdependency and binding and/or homogeneity among the modules. When the degree of coupling is high (low), the modules are said to be tightly (loosely) coupled. SIMD and MISD architectures generally tend to be tightly coupled because of the common clocking of the shared instruction stream or the shared data stream. Here we briefly examine various MIMD architectures in terms of coupling:

• Tightly coupled multiprocessors (with UMA shared memory). These may be either switch-based (e.g., NYU Ultracomputer, RP3) or bus-based (e.g., Sequent, Encore).

• Tightly coupled multiprocessors (with NUMA shared memory or that communicate by message passing). Examples are the SGI Origin 2000 and the Sun Ultra HPC servers (that communicate via NUMA shared memory), and the hypercube and the torus (that communicate by message passing).

• Loosely coupled multicomputers (without shared memory) physically co-located. These may be bus-based (e.g., NOW connected by a LAN or Myrinet card) or using a more general communication network, and the processors may be heterogeneous. In such systems, processors neither share memory nor have a common clock, and hence may be classified as distributed systems – however, the processors are very close to one another, which is characteristic of a parallel system. As the communication latency may be significantly lower than in wide-area distributed systems, the solution approaches to various problems may be different for such systems than for wide-area distributed systems.

• Loosely coupled multicomputers (without shared memory and without common clock) that are physically remote. These correspond to the conventional notion of distributed systems.

Parallelism or speedup of a program on a specific system

This is a measure of the relative speedup of a specific program on a given machine. The speedup depends on the number of processors and the mapping of the code to the processors. It is expressed as the ratio of the time T1 with a single processor to the time Tn with n processors.
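Written as a formula, with a purely illustrative timing example (the numbers are assumptions, not from the text):

\[
  \text{speedup}(n) \;=\; \frac{T_1}{T_n}
\]
% Illustrative example: if T_1 = 100 s on one processor and
% T_4 = 40 s on four processors, then speedup(4) = 100/40 = 2.5.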

Parallelism within a parallel/distributed program

This is an aggregate measure of the percentage of time that all the processors are executing CPU instructions productively, as opposed to waiting for communication (either via shared memory or message-passing) operations to complete. The term is traditionally used to characterize parallel programs. If the aggregate measure is a function of only the code, then the parallelism is independent of the architecture. Otherwise, this definition degenerates to the definition of parallelism in the previous section.

Concurrency of a program

This is a broader term that means roughly the same as parallelism of a program, but is used in the context of distributed programs. The parallelism/concurrency in a parallel/distributed program can be measured by the ratio of the number of local (non-communication and non-shared memory access) operations to the total number of operations, including the communication or shared memory access operations.

Granularity of a program

The ratio of the amount of computation to the amount of communication within the parallel/distributed program is termed as granularity. If the degree of parallelism is coarse-grained (fine-grained), there are relatively many more (fewer) productive CPU instruction executions, compared to the number of times the processors communicate either via shared memory or message-passing and wait to get synchronized with the other processors. Programs with fine-grained parallelism are best suited for tightly coupled systems. These typically include SIMD and MISD architectures, tightly coupled MIMD multiprocessors (that have shared memory), and loosely coupled multicomputers (without shared memory) that are physically colocated. If programs with fine-grained parallelism were run over loosely coupled multiprocessors that are physically remote, the latency delays for the frequent communication over the WAN would significantly degrade the overall throughput. As a corollary, it follows that on such loosely coupled multicomputers, programs with a coarse-grained communication/message-passing granularity will incur substantially less overhead.

Figure 1.2 showed the relationships between the local operating system, the middleware implementing the distributed software, and the network protocol stack. Before moving on, we identify various classes of multiprocessor/multicomputer operating systems:

• The operating system running on loosely coupled processors (i.e., heterogeneous and/or geographically distant processors), which are themselves running loosely coupled software (i.e., software that is heterogeneous), is classified as a network operating system. In this case, the application cannot run any significant distributed function that is not provided by the application layer of the network protocol stacks on the various processors.

• The operating system running on loosely coupled processors, which are running tightly coupled software (i.e., the middleware software on the processors is homogeneous), is classified as a distributed operating system.

• The operating system running on tightly coupled processors, which are themselves running tightly coupled software, is classified as a multiprocessor operating system. Such a parallel system can run sophisticated algorithms contained in the tightly coupled software.

1.5 Message-passing systems versus shared memory systems

Shared memory systems are those in which there is a (common) shared address space throughout the system. Communication among processors takes place via shared data variables, and control variables for synchronization among the processors. Semaphores and monitors that were originally designed for shared memory uniprocessors and multiprocessors are examples of how synchronization can be achieved in shared memory systems. All multicomputer (NUMA as well as message-passing) systems that do not have a shared address space provided by the underlying architecture and hardware necessarily communicate by message passing. Conceptually, programmers find it easier to program using shared memory than by message passing. For this and several other reasons that we examine later, the abstraction called shared memory is sometimes provided to simulate a shared address space. For a distributed system, this abstraction is called distributed shared memory. Implementing this abstraction has a certain cost but it simplifies the task of the application programmer. There also exists a well-known folklore result that communication via message-passing can be simulated by communication via shared memory and vice-versa. Therefore, the two paradigms are equivalent.

1.5.1 Emulating message-passing on a shared memory system (MP → SM)

The shared address space can be partitioned into disjoint parts, one part being assigned to each processor. "Send" and "receive" operations can be implemented by writing to and reading from the destination/sender processor's address space, respectively. Specifically, a separate location can be reserved as the mailbox for each ordered pair of processes. A Pi–Pj message-passing can be emulated by a write by Pi to the mailbox and then a read by Pj from the mailbox. In the simplest case, these mailboxes can be assumed to have unbounded size. The write and read operations need to be controlled using synchronization primitives to inform the receiver/sender after the data has been sent/received.
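A minimal sketch of this emulation in C with POSIX threads is given below. The single size-one mailbox, the struct and function names, and the choice of a mutex plus condition variable as the synchronization primitives are illustrative assumptions, not details prescribed by the text.

#include <pthread.h>
#include <stdio.h>

/* One mailbox reserved for the ordered pair (Pi, Pj), kept in the shared
 * address space; a mutex and a condition variable provide the
 * synchronization that informs the receiver/sender. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    int             full;     /* 1 if a message is waiting */
    int             data;     /* the message payload */
} mailbox_t;

mailbox_t mbox = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 };

void send_msg(mailbox_t *m, int value) {      /* "Send" = write to the mailbox */
    pthread_mutex_lock(&m->lock);
    while (m->full)                           /* wait until the previous message is consumed */
        pthread_cond_wait(&m->changed, &m->lock);
    m->data = value;
    m->full = 1;
    pthread_cond_signal(&m->changed);
    pthread_mutex_unlock(&m->lock);
}

int recv_msg(mailbox_t *m) {                  /* "Receive" = read from the mailbox */
    pthread_mutex_lock(&m->lock);
    while (!m->full)
        pthread_cond_wait(&m->changed, &m->lock);
    int value = m->data;
    m->full = 0;
    pthread_cond_signal(&m->changed);
    pthread_mutex_unlock(&m->lock);
    return value;
}

void *receiver(void *arg) {                   /* plays the role of Pj */
    printf("Pj received %d\n", recv_msg(&mbox));
    return NULL;
}

int main(void) {                              /* the main thread plays the role of Pi */
    pthread_t pj;
    pthread_create(&pj, NULL, receiver, NULL);
    send_msg(&mbox, 42);
    pthread_join(pj, NULL);
    return 0;
}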

1.5.2 Emulating shared memory on a message-passing system (SM → MP)

This involves the use of "send" and "receive" operations for "write" and "read" operations. Each shared location can be modeled as a separate process; "write" to a shared location is emulated by sending an update message to the corresponding owner process; a "read" to a shared location is emulated by sending a query message to the owner process. As accessing another processor's memory requires send and receive operations, this emulation is expensive. Although emulating shared memory might seem to be more attractive from a programmer's perspective, it must be remembered that in a distributed system, it is only an abstraction. Thus, the latencies involved in read and write operations may be high even when using shared memory emulation because the read and write operations are implemented by using network-wide communication under the covers.

An application can of course use a combination of shared memory and message-passing. In a MIMD message-passing multicomputer system, each "processor" may be a tightly coupled multiprocessor system with shared memory. Within the multiprocessor system, the processors communicate via shared memory. Between two computers, the communication is by message passing. As message-passing systems are more common and more suited for wide-area distributed systems, we will consider message-passing systems more extensively than we consider shared memory systems.

1.6 Primitives for distributed communication

1.6.1 Blocking/non-blocking, synchronous/asynchronous primitives

Message send and message receive communication primitives are denoted Send() and Receive(), respectively. A Send primitive has at least two parameters – the destination, and the buffer in the user space, containing the data to be sent. Similarly, a Receive primitive has at least two parameters – the source from which the data is to be received (this could be a wildcard), and the user buffer into which the data is to be received.

There are two ways of sending data when the Send primitive is invoked – the buffered option and the unbuffered option. The buffered option, which is the standard option, copies the data from the user buffer to the kernel buffer. The data later gets copied from the kernel buffer onto the network. In the unbuffered option, the data gets copied directly from the user buffer onto the network. For the Receive primitive, the buffered option is usually required because the data may already have arrived when the primitive is invoked, and needs a storage place in the kernel.

The following are some definitions of blocking/non-blocking and synchronous/asynchronous primitives [12]:

Synchronous primitives A Send or a Receive primitive is synchronous if both the Send() and Receive() handshake with each other. The processing for the Send primitive completes only after the invoking processor learns that the other corresponding Receive primitive has also been invoked and that the receive operation has been completed. The processing for the Receive primitive completes when the data to be received is copied into the receiver's user buffer.

Asynchronous primitives A Send primitive is said to be asynchronous if control returns back to the invoking process after the data item to be sent has been copied out of the user-specified buffer.

It does not make sense to define asynchronous Receive primitives.

Blocking primitives A primitive is blocking if control returns to the invoking process after the processing for the primitive (whether in synchronous or asynchronous mode) completes.

Non-blocking primitives A primitive is non-blocking if control returns back to the invoking process immediately after invocation, even though the operation has not completed. For a non-blocking Send, control returns to the process even before the data is copied out of the user buffer. For a non-blocking Receive, control returns to the process even before the data may have arrived from the sender.

For non-blocking primitives, a return parameter on the primitive call returns a system-generated handle which can be later used to check the status of completion of the call. The process can check for the completion of the call in two ways. First, it can keep checking (in a loop or periodically) if the handle has been flagged or posted. Second, it can issue a Wait with a list of handles as parameters. The Wait call usually blocks until one of the parameter handles is posted. Presumably after issuing the primitive in non-blocking mode, the process has done whatever actions it could and now needs to know the status of completion of the call, therefore using a blocking Wait() call is usual programming practice. The code for a non-blocking Send would look as shown in Figure 1.7.

Figure 1.7 A non-blocking send primitive. When the Wait call returns, at least one of its parameters is posted.

Send(X, destination, handlek)    // handlek is a return parameter

Wait(handle1, handle2, …, handlek, …, handlem)    // Wait always blocks

If at the time that Wait() is issued, the processing for the primitive (whether synchronous or asynchronous) has completed, the Wait returns immediately. The completion of the processing of the primitive is detectable by checking the value of handlek. If the processing of the primitive has not completed, the Wait blocks and waits for a signal to wake it up. When the processing for the primitive completes, the communication subsystem software sets the value of handlek and wakes up (signals) any process with a Wait call blocked on this handlek. This is called posting the completion of the operation.
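For comparison, the same handle-and-Wait pattern appears in MPI, where the handle is a request object; the sketch below is a minimal illustration (ranks, tag, and payload chosen arbitrarily), not the book's own code.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, x = 99;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Request handle;                       /* plays the role of handlek */
        MPI_Isend(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &handle);  /* non-blocking send */
        /* ... do other useful work here while the send is in progress ... */
        MPI_Wait(&handle, MPI_STATUS_IGNORE);     /* blocks until completion is posted */
    } else if (rank == 1) {
        int y;
        MPI_Recv(&y, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("received %d\n", y);
    }
    MPI_Finalize();
    return 0;
}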

There are therefore four versions of the Send primitive – synchronous blocking, synchronous non-blocking, asynchronous blocking, and asynchronous non-blocking. For the Receive primitive, there are the blocking synchronous and non-blocking synchronous versions. These versions of the primitives are illustrated in Figure 1.8 using a timing diagram. Here, three time lines are shown for each process: one for the process execution, one for the user buffer, and one for the kernel/communication subsystem.

Figure 1.8 Blocking/non-blocking and synchronous/asynchronous primitives [12]. Process Pi is sending and process Pj is receiving. (a) Blocking synchronous Send and blocking (synchronous) Receive. (b) Non-blocking synchronous Send and non-blocking (synchronous) Receive. (c) Blocking asynchronous Send. (d) Non-blocking asynchronous Send. Legend: S/R: Send/Receive primitive issued; S_C/R_C: processing for Send/Receive completes; W: process may issue Wait to check completion of a non-blocking operation; P: the completion of the previously initiated non-blocking operation is posted.
