• Aucun résultat trouvé

Design-space exploration for CMOS photonic processor networks

N/A
N/A
Protected

Academic year: 2021

Partager "Design-space exploration for CMOS photonic processor networks"

Copied!
4
0
0

Texte intégral

(1)

Design-space exploration for CMOS photonic processor networks

The MIT Faculty has made this article openly available. Please share

how this access benefits you. Your story matters.

Citation

"Design-space exploration for CMOS photonic processor networks."

Conference on (OFC/NFOEC) Optical Fiber Communication (OFC),

collocated National Fiber Optic Engineers Conference, 2010. ©2010

IEEE

As Published

http://ieeexplore.ieee.org/xpl/articleDetails.jsp?

reload=true&arnumber=5465639&contentType=Conference

+Publications

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Version

Final published version

Citable link

http://hdl.handle.net/1721.1/74149

Terms of Use

Article is made available in accordance with the publisher's

policy and may be subject to US copyright law. Please refer to the

publisher's site for terms of use.

(2)

Design-Space Exploration for CMOS Photonic Processor

Networks

Vladimir Stojanovića, Ajay Joshib, Cristopher Battena, Yong-Jin Kwonc, Scott Beamerc, Sun Chena and Krste Asanovićc

aMassachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, Massachusetts 02139 bBoston University, Boston, Massachusetts

cUniversity of California, Berkeley, California *Email: vlada@mit.edu

Abstract: Monolithically integrated dense WDM photonic network topologies optimized for loss

and power footprint of optical components can achieve up to 4x better energy-efficiency and throughput than electrical interconnects in core-to-core, and 10x in core-to-DRAM networks. ©2009 Optical Society of America

OCIS codes: (200.4650) Optical interconnects; (250.5300) Photonic integrated circuits;

1. Introduction

This paper presents a review of recent advances [1,2] in building high-throughput, energy-efficient photonic networks for core-to-core and core-to-DRAM communication in manycore processor systems. To sustain the performance scaling in these systems, the increase in core count has to be followed by the corresponding increase in energy-efficiency of the core, the interconnect, and bandwidth density [3,4]. Due to pin-density, wire-bandwidth and power dissipation limits, electrical DRAM interfaces are not expected to supply sufficient bandwidth with

reasonable power consumption and packaging cost, and similar issues also limit energy-efficiency and bandwidth density of global on-chip wires. With potential for energy-efficient modulation/detection and dense wavelength division multiplexing (DWM), silicon-photonic interconnect technology is well suited to alleviate the bottleneck, however its application has to be carefully tailored to both the underlying process technology and the desired network topology.

2. Monolithic CMOS photonic network design

Recently developed infrastructure for photonic chip design and post-fabrication processing methodology [5,6] enabled for the first time monolithic integration of polysilicon and silicon-based photonic devices in a standard bulk CMOS and thin BOX SOI fabrication flows commonly used for processors. Based on this technology and the tight interaction between design of photonic interconnect components (waveguides, ring-resonators, modulators, photo-detectors, waveguide crossings) and network topology, in [1] we have proposed an efficient hybrid electro-optical core-to-DRAM shared memory network (local mesh global switch – LMGS) shown in Fig. 1, which provides a near ten-fold improvement in throughput compared to optimized electrical networks projected to 22 nm process node and a 256 core processor. To provide a balance between the bandwidth and latency/link utilization, the traffic from several tiles is aggregated via local electrical mesh into a point-to-point dense WDM interconnect with wavelength addressing to a part of a DRAM space. External buffer chip receives the optical signals and arbitrates requests from several core groups to the same DRAM module. This relatively simple interconnect results in significantly reduced number of optical components in the network compared to, for example, high-radix optical crossbar [7], minimizing the thermal tuning costs as well as losses along the optical path. To relax the loss specifications on integrated photonic devices within a required optical power envelope, Fig. 3a, the physical layout of the network follows a U-shape. Balancing the mesh bandwidth with degree of tile aggregation and optical bandwidth enables efficient utilization of raw energy-efficiency advantage of photonic over electrical interconnect (both across die and die-to-die), as shown in Fig. 4. Similar methodology was used to optimize the core-to-core network to provide a more uniform access for a variety of traffic patterns, by utilizing dense, energy-efficient photonic interconnects to realize otherwise expensive non-blocking Clos network, Fig. 2. Again, aggregation is used to decrease the radix of the network and balance the electrical and optical power of the network. The Clos photonic layout also follows the U-shape to relax the photonic device loss requirements, Fig. 3b. The Clos achieves significantly better latency and throughput uniformity compared to a concentrated mesh network, Fig. 5, across a variety of traffic patterns.

Acknowledgements

We would like to acknowledge all the members of the MIT photonic team, including J. Orcutt, A. Khilo, M. Popović, C. Holzwarth, B. Moss, H. Li, M. Georgas, J. Leu, F. X. Kärtner, J. L. Hoyt, R. J. Ram, and H. I. Smith. This work was supported by DARPA awards W911NF-08-1-0134 and W911NF-08-1-0139.

a876_1.pdf OSA / OFC/NFOEC 2010

OWI1.pdf

(3)

References

[1] C. Batten, et al, “Building manycore processor to DRAM networks with monolithic CMOS silicon photonics,” IEEE Micro, vol. 29, no. 4, pp. 8-21, 2009.

[2] A. Joshi, et al, “Silicon-Photonic Clos Networks for Global On-Chip Communication,” 3rd ACM/IEEE International Symposium on

Networks-on-Chip, San Diego, CA, pp. 124-133, May 2008.

[3] M. Tremblay, S. Chaudhry, “A Third-Generation 65nm 16-Core 32-Thread Plus 32-Scout-Thread CMT SPARC® Processor,” IEEE ISSCC,

pp. 82-83, 2008.

[4] S. Bell et al, “TILE64 Processor: A 64-Core SoC with Mesh Interconnect,” IEEE ISSCC, pp. 88-598, 2008.

[5] J. S. Orcutt, et al, “Demonstration of an Electronic Photonic Integrated Circuit in a Commercial Scaled Bulk CMOS Process,” Optical Society of America - CLEO/QELS Conference, San Jose, CA, 2 pages, May 2008.

[6] C. W. Holzwarth, “Localized Substrate Removal Technique Enabling Strong-Confinement Microphotonics in Bulk Si CMOS Processes,” Optical Society of America - CLEO/QELS Conference, San Jose, CA, 2 pages, May 2008.

[7] D. Vantrease et al, “Corona: System Implications of Emerging Nanophotonic Technology,” ISCA, June 2008.

Fig. 1. Photonic network template for core-to-DRAM communication: LMGS network, 64 tiles (4 cores per tile), 64 waveguides (for tile throughput = 128 b/cyc), 256 modulators per group, 256 ring filters per group, total rings > 16K with

0.32W on thermal tuning

Fig. 2. Photonic network template for core-to-core communication: Clos Network, 64 tiles, 56 waveguides (for tile throughput = 128 b/cyc), 128 modulators per cluster, 128 ring filters per cluster, total rings ≈ 28K with 0.56W on thermal tuning

a876_1.pdf OSA / OFC/NFOEC 2010

(4)

(a) (b)

Fig. 3. Optical power requirement contours for: (a) core-to-DRAM LMGS network in Fig. 1, (b) core-to-core Clos network in Fig. 2,

(a) (b) (c)

Fig. 4. Power and performance for various LMGS network configurations assuming (a) electrical interconnect with grouping, (b) electrical interconnect with grouping and overprovisioning and (c) photonic interconnect with grouping and overprovisioning. Link

widths chosen such that network does not exceed power budget of 20 W.

EClos PClos EClos

PClos

(a) (b) (c)

Fig. 5. Power and performance comparison of (a) electrical cmeshX2 (128b channel width) (b) electrical Clos and photonic Clos with 64b channel width and (c) electrical and photonic Clos with channel width of 128b

a876_1.pdf OSA / OFC/NFOEC 2010

Figure

Fig. 2. Photonic network template for core-to-core communication: Clos Network, 64 tiles, 56 waveguides (for tile throughput
Fig. 5. Power and performance comparison of (a) electrical cmeshX2 (128b channel width) (b) electrical Clos and photonic Clos with  64b channel width and (c) electrical and photonic Clos with channel width of 128b

Références

Documents relatifs

It should be noted, that the beam deflection using the stochastic mechanism of fast charged particle scattering by bent crystal rows, in contrast to the previ- ously considered

CHAPITRE I: ETUDE BIBLIOGRAPHIQUE I. Matériaux nano-poreux ... Différents types de nanoparticules ... Nanoparticules polymériques ... Nanoparticules lipidiques ...

Un aperçu des communautés anglicanes (chapitre 7) complète j udicieusement ce tour d'horizon. Il comporte une abondance bibliographie, un nombre impressionnant de

In [9], the proposed meta-heuristic-based downlink power allocation for LTE/LTE-A provides the required QoS by tuning the transmit power at each cell and minimizing the

– Using the strongly p-connected components is the key of our approach, which has not been used in any scientific work in order to detect cores or communities in di- rected

Then, the firing on c (i+2) is lost and this models the fact that packets sent after a bad packet are just wasted. In order to compare the observational behaviours of the protocols,

Truss and nucleus decomposition. In a more recent work, Sariy¨ uce et al. [129] capitalized on the relationship between core number and h-index [98], in order to propose efficient

Heute sind Jugendliche längst nicht mehr Subjekt eigener Forschungen sondern Objekt von WissenschaftlerInnen, JournalistInnen und PädagogInnen, deren Intentionen sie in der Regel