

5.7 MESSAGE PASSING VERSUS SHARED MEMORY ARCHITECTURES

As indicated in Chapter 4, shared memory enjoys the desirable feature that all communication is done using implicit loads and stores to a global address space. Another fundamental feature of shared memory is that synchronization and communication are distinct: special synchronization operations (mechanisms), in addition to the load and store operations, must be employed in order to detect when data have been produced and/or consumed. Message passing, on the other hand, employs an explicit communication model in which explicit messages are exchanged among processors. In message passing, synchronization and communication are unified: the generation of remote, asynchronous events is an integral part of the message passing communication model. It is important to note, however, that the shared memory and message passing communication models are universal; that is, it is possible to employ one to simulate the other. It is generally easier, though, to simulate shared memory using message passing than the converse. This is basically because of the asynchronous event semantics of message passing as compared with the polling semantics of shared memory.
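The distinction above can be made concrete with a small sketch. The Python below (all names are illustrative, not from the text) contrasts the two models: in the shared memory style, data moves through ordinary reads and writes of a common variable, and a *separate* synchronization object is needed to announce that the data is ready; in the message passing style, a single blocking receive delivers the data and synchronizes in one operation.

```python
import threading
import queue

# --- Shared memory style: communication via implicit reads/writes to a
# common variable, with a *separate* synchronization mechanism (an Event)
# needed to detect that the data has been produced.
shared = {}                        # stands in for the global address space
ready = threading.Event()          # explicit synchronization, distinct from the data

def sm_producer():
    shared["value"] = 42           # implicit "store" to shared memory
    ready.set()                    # separate operation announces availability

def sm_consumer(result):
    ready.wait()                   # poll/block on the synchronization object
    result.append(shared["value"])  # implicit "load"

# --- Message passing style: an explicit receive moves the data *and*
# synchronizes in one unit; it blocks until a message arrives.
channel = queue.Queue()

def mp_producer():
    channel.put(42)                # explicit send: data + synchronization together

def mp_consumer(result):
    result.append(channel.get())   # explicit receive: blocks, then delivers data

for producer, consumer in ((sm_producer, sm_consumer), (mp_producer, mp_consumer)):
    out = []
    t = threading.Thread(target=consumer, args=(out,))
    t.start()
    producer()
    t.join()
    print(out[0])                  # prints 42 in both cases
```

Note how the shared memory version needs two distinct mechanisms (the dictionary and the Event), while the queue unifies them, mirroring the point made in the paragraph above.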

A number of desirable features characterize shared memory architectures (see Chapter 4). The shared memory communication model allows the programmer to concentrate on the issues related to parallelism by relieving him or her of the details of interprocessor communication. In that sense, the shared memory communication model represents a straightforward extension of the uniprocessor programming paradigm. In addition, shared memory semantics are independent of physical location and are therefore open to the dynamic optimizations offered by the underlying operating system. On the other hand, the shared memory communication model is in essence a polling interface, which is a drawback as far as synchronization is concerned.

This fact has been recognized by a number of multiprocessor architects, whose response has been to augment the basic shared memory communication model with additional synchronization mechanisms. A further drawback of shared memory is that for data to cross the network, a complete round trip has to be made; one-way communication of data is not possible.

Message passing can be characterized as employing an interrupt-driven communication model in which messages carry both data and synchronization in a single unit. As such, the message passing communication model lends itself to those operating system activities in which communication patterns are explicitly known in advance, for example, I/O, interprocessor interrupts, and task and data migration. It also lends itself to applications that have large synchronization components, for example, the solution of sparse linear systems and event-driven simulation. In addition, message passing lends itself naturally to a client-server style of decomposition. On the other hand, message passing suffers from marshaling cost, that is, the cost of assembling and disassembling messages.
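The marshaling cost mentioned above can be illustrated with a minimal sketch. Here Python's `pickle` module stands in for a generic marshaling layer (the variable names are hypothetical): the structured message must be flattened into a byte stream at the sender and reconstructed at the receiver, work that shared memory communication does not pay.

```python
import pickle
import time

# A message must be marshaled (flattened into a byte stream) at the sender
# and unmarshaled back into structured data at the receiver.
payload = {"op": "update", "rows": list(range(10000))}

t0 = time.perf_counter()
wire = pickle.dumps(payload)      # assemble: structured data -> byte stream
received = pickle.loads(wire)     # disassemble: byte stream -> structured data
marshal_cost = time.perf_counter() - t0

assert received == payload        # the round trip preserves the message
print(f"{len(wire)} bytes marshaled and unmarshaled "
      f"in {marshal_cost * 1e6:.0f} microseconds")
```

For small messages this cost is dominated by per-message overhead; for large ones it grows with message size, which is one reason message passing favors coarse-grained communication.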

One natural conclusion arising from the above discussion is that the shared memory and message passing communication models each lend themselves naturally to certain application domains: shared memory manifests itself to application writers, while message passing manifests itself to operating system designers. It is therefore natural to consider combining shared memory and message passing in general-purpose multiprocessor systems. This has been the main driving force behind systems such as the Stanford FLexible Architecture for SHared memory (FLASH) system (see Problems), a multiprocessor system that efficiently integrates support for shared memory and message passing while minimizing both hardware and software overhead.

5.8 CHAPTER SUMMARY

Shared memory systems may be easier to program, but are difficult to scale up to a large number of processors. If scalability to larger and larger systems (as measured by the number of processing units) is to continue, systems have to use message passing techniques. It is apparent that message passing systems are the only way to efficiently increase the number of processors managed by a multiprocessor system.

There are, however, a number of problems associated with message passing systems, including communication overhead and difficulty of programming. In this chapter, we discussed the architecture and the network models of message passing systems. We shed some light on routing and network switching techniques. We concluded with a contrast between shared memory and message passing systems.

PROBLEMS

1. Contemplate the advantages and disadvantages of message passing architectures and compare them with those found in shared memory architectures.

2. Based on your findings in Problem 1, you may conclude that an architecture combining the best of both worlds should be preferred over either of the two systems. Discuss the advantages and disadvantages of a combined shared memory/message passing architecture.

3. In connection with Problem 2 above, an architecture that combines shared memory and message passing has been introduced by Stanford University. It is called the FLASH system. Write a complete report on that architecture, discussing its hardware and software features as well as its programming model.

Support your report with illustrations, tables, and examples, whenever possible.

4. Discuss the conditions that lead to the occurrence of the deadlock problem in multicomputer message passing systems. Suggest ways to avoid the occurrence of such a problem. Provide some examples to support your suggestions.
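One classical way to approach Problem 4 follows Dally and Seitz (see References): routing deadlock is possible exactly when the routing algorithm's channel dependency graph contains a cycle. The sketch below (illustrative Python; all function names are hypothetical) builds that graph for minimal routing on a unidirectional ring, which is cyclic and hence deadlock-prone, and for dimension-ordered (XY) routing on a 2D mesh, which is acyclic and hence deadlock-free.

```python
def ring_dependency_graph(n):
    # Channel i carries traffic from node i to node (i+1) % n.  A packet
    # traveling more than one hop holds channel i while requesting channel
    # (i+1) % n, creating the dependency edge c_i -> c_{(i+1) % n}.
    return {i: {(i + 1) % n} for i in range(n)}

def xy_route(src, dst):
    # Dimension-ordered (XY) routing on a 2D mesh: correct x first, then y.
    # Returns the list of channels (adjacent-node pairs) the packet uses.
    x, y = src
    path = []
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        path.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        path.append(((x, y), (x, ny)))
        y = ny
    return path

def mesh_dependency_graph(w, h):
    # Dependency edges between consecutive channels on every XY route.
    deps = {}
    nodes = [(i, j) for i in range(w) for j in range(h)]
    for s in nodes:
        for d in nodes:
            p = xy_route(s, d)
            for c in p:
                deps.setdefault(c, set())
            for a, b in zip(p, p[1:]):
                deps[a].add(b)
    return deps

def has_cycle(graph):
    # Standard three-color DFS cycle detection.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}
    def dfs(v):
        color[v] = GRAY
        for w in graph.get(v, ()):
            if color[w] == GRAY:
                return True
            if color[w] == WHITE and dfs(w):
                return True
        color[v] = BLACK
        return False
    return any(color[v] == WHITE and dfs(v) for v in graph)

print(has_cycle(ring_dependency_graph(4)))   # True : ring routing can deadlock
print(has_cycle(mesh_dependency_graph(3, 3)))  # False : XY routing is deadlock-free
```

The avoidance strategy illustrated here is restricting routes so the dependency graph stays acyclic; virtual channels, discussed in this chapter, achieve the same effect without restricting paths.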

5. Repeat Problem 4 considering the livelock problem instead.

6. Repeat Problem 4 considering the starvation problem instead.

7. Show how to perform the matrix-vector multiplication problem using collective communications in message passing systems. Compare the time complexity and the speedup resulting from using a multicomputer message passing system as compared to using a single processor. Provide an illustrative example.
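A possible starting point for Problem 7 is sketched below. It simulates MPI-style collectives (scatter, broadcast, gather) sequentially in plain Python, with a loop playing the role of the p processes; all function names are illustrative. Rows of A are scattered, x is broadcast to every process, each process computes its partial result, and the partials are gathered. With p processes the local work drops from O(nm) to roughly O(nm/p), at the price of the scatter/gather communication.

```python
def scatter(rows, p):
    # Deal out roughly n/p contiguous rows to each of the p processes.
    n = len(rows)
    chunk, extra = divmod(n, p)
    out, start = [], 0
    for rank in range(p):
        size = chunk + (1 if rank < extra else 0)
        out.append(rows[start:start + size])
        start += size
    return out

def mat_vec_collective(A, x, p):
    local_A = scatter(A, p)            # scatter: rows of A distributed
    partials = []
    for rank in range(p):              # each iteration plays one process
        local_x = x                    # broadcast: every process receives x
        local_y = [sum(a * b for a, b in zip(row, local_x))
                   for row in local_A[rank]]
        partials.append(local_y)
    return [v for part in partials for v in part]   # gather: concatenate partials

A = [[1, 2], [3, 4], [5, 6]]
x = [1, 1]
print(mat_vec_collective(A, x, 2))     # [3, 7, 11]
```

In a real message passing program the loop body would run concurrently on separate nodes, with `scatter`, the broadcast, and the gather performed by the library's collective routines.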

8. Repeat Problem 7 above for the problem of finding min(A(1), A(2), ..., A(n)) in an n-element vector A.
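For Problem 8, the standard collective for this task is a reduction tree: in each round, half of the still-active processes send their partial minimum to a partner, so p partial results are combined in ceil(log2 p) communication rounds instead of p - 1 sequential comparisons. A sequential sketch (hypothetical names, one list slot per process):

```python
def tree_min(values):
    # Tree reduction: in round r, process i receives the partial minimum held
    # by process i + 2**r and combines it with its own, halving the number of
    # active processes each round.
    partial = list(values)
    p = len(partial)
    stride, rounds = 1, 0
    while stride < p:
        for i in range(0, p - stride, 2 * stride):
            partial[i] = min(partial[i], partial[i + stride])  # one "message"
        stride *= 2
        rounds += 1
    return partial[0], rounds

A = [7, 3, 9, 1, 5, 8, 2, 6]
minimum, rounds = tree_min(A)
print(minimum, rounds)     # 1 3 : minimum of 8 values in log2(8) = 3 rounds
```

The same pattern is what an MPI-style reduce collective performs internally, with `min` as the combining operator.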

9. Repeat Problem 7 considering execution of the following simple loop on a single processor compared with k (k ≤ n) processors in a message passing arrangement.

for i = 1, n
    C[i] = A[i] + B[i]
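The loop above (C[i] = A[i] + B[i] for i = 1..n) is embarrassingly parallel: the iteration space can be split into k contiguous slices with no communication between them except the initial distribution and the final gather. A sequential sketch with illustrative names:

```python
def parallel_add(A, B, k):
    # Split the iterations among k processes; each computes its contiguous
    # slice of C, and the slices are gathered (concatenated) at the end.
    n = len(A)
    chunk, extra = divmod(n, k)
    C, start = [], 0
    for rank in range(k):              # each iteration plays one process
        size = chunk + (1 if rank < extra else 0)
        C.extend(A[i] + B[i] for i in range(start, start + size))
        start += size
    return C

print(parallel_add([1, 2, 3, 4], [10, 20, 30, 40], 3))   # [11, 22, 33, 44]
```

Since each process does about n/k additions, the ideal speedup is k; in practice the scatter and gather costs bound how small n/k can usefully become.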

10. Design a message passing routing algorithm for an n-dimensional hypercube network that broadcasts a host message to all nodes in the hypercube at the greatest speed. Show how this algorithm can be implemented with message passing routines.
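The standard answer to Problem 10 is recursive doubling: in step d (d = 0, ..., n - 1) every node that already holds the message sends it to its neighbor across dimension d, so the set of informed nodes doubles each step and all 2^n nodes are reached in the optimal n steps. The simulation below (illustrative names) records the resulting message schedule.

```python
def hypercube_broadcast(n, root=0):
    # Recursive doubling broadcast on an n-dimensional hypercube.
    # Node addresses are n-bit integers; neighbors differ in exactly one bit.
    informed = {root}
    schedule = []                          # (step, sender, receiver) log
    for d in range(n):
        for node in sorted(informed):
            partner = node ^ (1 << d)      # neighbor across dimension d
            if partner not in informed:
                schedule.append((d, node, partner))
        informed |= {node ^ (1 << d) for node in informed}
    return informed, schedule

informed, schedule = hypercube_broadcast(3)
print(len(informed))                       # 8 : every node of the 3-cube reached
print(max(step for step, _, _ in schedule) + 1)   # 3 : log2(8) steps
```

In a message passing implementation each node would execute the same loop locally, sending when it holds the message and posting a receive otherwise; 2^n - 1 messages are exchanged in total.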

11. Repeat Problem 10 for the case where a node in the n-dimensional hypercube can broadcast a message to all other nodes. Show how this algorithm can be implemented with message passing routines.

12. Repeat Problem 10 for the case of an n-dimensional mesh network.
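One common strategy for Problem 12 (a sketch, not the only possible answer) is dimension-ordered broadcasting: the message first spreads along dimension 0 through the root's line of nodes, then every informed node spreads it along dimension 1, and so on, taking at most dims[d] - 1 hops in dimension d. The simulation below uses illustrative names and coordinate tuples for node addresses.

```python
def mesh_broadcast(dims, root):
    # Dimension-ordered broadcast on a mesh of shape dims (one entry per
    # dimension).  After processing dimension d, every node agreeing with
    # the root in dimensions > d holds the message.
    informed = {root}
    for d in range(len(dims)):
        new = set()
        for node in informed:
            for coord in range(dims[d]):   # flood the node's line in dimension d
                new.add(node[:d] + (coord,) + node[d + 1:])
        informed |= new
    return informed

dims = (3, 4)                              # a 3 x 4 two-dimensional mesh
informed = mesh_broadcast(dims, (1, 2))
print(len(informed))                       # 12 : every node of the mesh reached
```

Since the root is an arbitrary parameter, the same routine also answers Problem 13 (broadcast from any node of the mesh).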

13. Repeat Problem 11 for the case of an n-dimensional mesh network.

REFERENCES

Al-Tawil, K., Abd-El-Barr, M. and Ashraf, F. A survey and comparison of wormhole routing techniques in mesh networks. IEEE Network, 38-45 (1997).

Almasi, G. and Gottlieb, A. Highly Parallel Computing, Benjamin/Cummings, 1989.

Ashraf, F. Routing in Multicomputer Networks: A Classification and Comparison, MS thesis, Department of Information and Computer Science, College of Computer Science and Engineering, King Fahd University of Petroleum and Minerals (KFUPM), June 1996.

Dally, W. and Seitz, C. Deadlock-free message routing in multiprocessor interconnection networks. IEEE Transactions on Computers, C-36 (5), 547-553 (1987).

Dikaiakos, M. D., Steiglitz, K. and Rogers, A. A Comparison of Techniques Used for Mapping Parallel Algorithms to Message-Passing Multiprocessors, Department of Aston, Washington University, Seattle, WA, USA, 1994, pp. 434-442.

Eicken, T., Culler, D., Goldstein, S. and Schauser, K. Active Messages: A Mechanism for Integrated Communication and Computation, Proceedings of the 19th International Symposium on Computer Architecture, ACM Press, Gold Coast, Australia, May 1992.

Elnozahy, M., Alvisi, L., Wang, Y.-M. and Johnson, D. A Survey of Rollback-Recovery Protocols in Message Passing Systems.

Hsu, J.-M. and Banerjee, P. A Message Passing Coprocessor for Distributed Memory Multicomputers, Coordinated Science Laboratory, University of Illinois, Urbana, IL, USA, 1990, pp. 720-729.

Johnson, S. and Ho, C.-T. High performance communications in processor networks. Proceedings of the 16th ACM Annual International Symposium on Computer Architecture, 150-157 (1989).

Klaiber, A. C. and Levy, H. M. A Comparison of Message Passing and Shared Memory Architectures for Data Parallel Programs, Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA, 1994, pp. 94-105.

Morin, S. Implementing the Message Passing Interface Standard in Java, MSc thesis, Electrical and Computer Engineering Department, University of Massachusetts, September 2000.

Ni, L. and McKinley, P. A survey of wormhole routing techniques in direct networks. IEEE Computer Magazine, 26 (2), 62-76 (1993).

PACT. Message Passing Fundamentals, PACT Training Group, NCSA, University of Illinois, 2001.

Panda, D. Issues in designing efficient and practical algorithms for collective communication on wormhole-routed systems. Proceedings of the 1995 ICPP Workshop on Challenges for Parallel Processing, Vol. I, 8-15 (1995).

Pierce, P. The NX message passing interface. Parallel Computing, 20 (4), 463-480 (1994).

Kain, R. Y. Advanced Computer Architecture: A System Design Approach, Prentice-Hall, 1995.

Stone, H. S. High-Performance Computer Architecture, 3rd edition, Addison-Wesley, 1993.

Suaya, R. and Birtwistle, G. (editors) VLSI and Parallel Computation, Morgan Kaufmann Publishing Co., 1990.

Wilkinson, B. Computer Architecture: Design and Performance, Prentice-Hall, 1996.

Websites

http://www.lysator.liu.se/oscar/sp2

http://www.npac.syr.edu/nse/hpccsurvey/orgs/ibm/ibm.html

http://cs.felk.cvut.cz/pcg/html/supeur96

http://citeseer.ist.psu.edu/dongarra99chapter.html

http://www-flash.stanford.edu/architecture/papers