
[…] architectural and technological constraints that define the MEMORY CHANNEL 2 design space. To increase the […]

[Table: Comparison of First- and Second-Generation MEMORY CHANNEL Architectures (recoverable entries only). MEMORY CHANNEL 1: 8 KB, shared bus, 77 MB/s. MEMORY CHANNEL 2: 16 bits, full duplex, LVDS, 66 MHz, 133 + 133 MB/s, 100 MB/s, 256 bytes, 32-bit Reed-Solomon, receive, 4 KB and 8 KB, crossbar, 800 to 1,600 MB/s.]

[Figure 4: MEMORY CHANNEL 2 Packet Format. (a) Data packet: header (DNID, TP, CMD, SID, address), data payload of 4 to 256 bytes, error-detection code. (b) Control packet: header (PSTAT, TP, CFG, DNID), control information (hub status, global status), error-detection code.]
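To make the Figure 4 layout concrete in software terms, the following is a hypothetical C view of the two packet formats. The field names come from the figure; the field widths, ordering, and the expansions of the mnemonics are assumptions for illustration, not the wire encoding.

```c
#include <stdint.h>

/* Illustrative view of the Figure 4 packet formats. Widths and ordering are
 * assumed; our readings of the mnemonics (DNID = destination node ID,
 * SID = source ID, TP = type, CMD = command, PSTAT = port status,
 * CFG = configuration) are also assumptions. */
struct mc2_data_packet {
    /* header */
    uint16_t dnid;        /* destination node ID (assumed width) */
    uint8_t  tp;          /* packet type                         */
    uint8_t  cmd;         /* command                             */
    uint16_t sid;         /* source ID                           */
    uint32_t address;     /* network address                     */
    /* payload */
    uint8_t  data[256];   /* 4 to 256 bytes actually carried     */
    /* trailer */
    uint32_t error_check; /* error-detection code                */
};

struct mc2_control_packet {
    /* header */
    uint16_t pstat;          /* port status         */
    uint8_t  tp;             /* packet type         */
    uint8_t  cfg;            /* configuration       */
    uint16_t dnid;           /* destination node ID */
    /* control information */
    uint32_t hub_status;
    uint32_t global_status;
    /* trailer */
    uint32_t error_check;    /* error-detection code */
};
```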

On MEMORY CHANNEL 1 clusters, the network address is mapped to a local page of physical memory using remapping resources contained in the system's PCI-to-host memory bridge. All AlphaServer systems implement these remapping resources. Other systems, particularly those with 32-bit addresses, do not implement this PCI-to-host memory remapping resource. On MEMORY CHANNEL 2, software has the option to enable remapping in the receiver side of the MEMORY CHANNEL 2 adapter on a per-network-page basis. When configured for remapping, a section of the PCT is used to store the upper address […] software-assisted shared memory. The primitive allows a node to complete a read request to another node without software intervention. It is implemented by a new remote read-on-write attribute in the receive page control table. The requesting node generates a write with the appropriate remote address (a read-request write). When the packet arrives at the receiver, its address maps in the PCT to a page marked as remote read. After remapping (if enabled), the address is converted to a PCI read command. The read data is returned as a MEMORY CHANNEL write to the same address as the original read-request write.

Since read access to a page of memory in a remote node is provided by a unique network address, privileges to write or read cluster memory remain completely independent.
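A minimal sketch of how a requesting node might use the read-on-write primitive follows. The article does not define a software API; the mappings, the sentinel, and the helper names are assumptions, and memory barriers and error handling are omitted.

```c
#include <stdint.h>

/* Hypothetical mappings set up elsewhere through the PCT: a transmit window
 * whose network page carries the remote read-on-write attribute, and the local
 * receive page (same network address) where the remote node's reply write lands. */
extern volatile uint32_t *tx_remote_read_page;  /* mapped MC transmit space */
extern volatile uint32_t *rx_reply_page;        /* mapped local receive page */

#define REPLY_EMPTY 0xDEADBEEFu  /* sentinel meaning "no reply yet" (illustrative;
                                    a real driver would use a separate flag) */

/* Read one 32-bit word from remote memory without software on the remote node. */
uint32_t remote_read32(uint32_t remote_offset)
{
    /* Prime the reply slot so the returned write can be detected. */
    rx_reply_page[remote_offset / 4] = REPLY_EMPTY;

    /* The read-request write: the receive PCT on the far side marks this page
     * as remote read, so the adapter turns the write into a read and returns
     * the data as a MEMORY CHANNEL write to the same address. */
    tx_remote_read_page[remote_offset / 4] = 0;

    /* Wait for the reply write to arrive on the local receive page. */
    while (rx_reply_page[remote_offset / 4] == REPLY_EMPTY)
        ;  /* spin; bounding the wait is left out for brevity */

    return rx_reply_page[remote_offset / 4];
}
```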

A global clock mechanism has been introduced to provide support for clusterwide synchronization. Global clocks, which are highly accurate, are extremely useful in many distributed applications, such as parallel databases or distributed debugging. The MEMORY CHANNEL 2 hub implements this global clock by periodically sending synchronization packets to all nodes in the cluster. The reception of such a pulse can be made to trigger an interrupt or, on future MEMORY CHANNEL-to-CPU direct-interface systems, may be used to update a local counter. The interrupt service software updates the offset between the local time and the global time. This synchronization mechanism allows a unique clusterwide time to be maintained with an accuracy equal to twice the range (max - min) of the MEMORY CHANNEL network latency, plus the interrupt service routine time.
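A minimal sketch of the interrupt-driven offset update described above, assuming a hypothetical local cycle counter and a tick count carried by the hub's synchronization pulse (the article does not specify the software interface or the pulse period):

```c
#include <stdint.h>

extern uint64_t read_local_clock_ns(void);       /* assumed local time source, in ns */
extern uint64_t global_ticks_from_pulse(void);   /* assumed tick count from the pulse */

#define SYNC_PERIOD_NS 1000000ULL  /* assumed pulse period; illustrative only */

static volatile int64_t local_to_global_offset_ns;  /* updated by the ISR */

/* Interrupt service routine run on reception of a hub synchronization pulse. */
void mc2_sync_isr(void)
{
    uint64_t global_ns = global_ticks_from_pulse() * SYNC_PERIOD_NS;
    uint64_t local_ns  = read_local_clock_ns();

    /* Clusterwide time is local time plus this offset; its error is bounded by
     * twice the spread (max - min) of the network latency plus the time taken
     * to reach this handler, as the text states. */
    local_to_global_offset_ns = (int64_t)global_ns - (int64_t)local_ns;
}

/* Any thread can then compute the clusterwide time. */
uint64_t cluster_time_ns(void)
{
    return read_local_clock_ns() + (uint64_t)local_to_global_offset_ns;
}
```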

Conditional write transactions have been introduced in MEMORY CHANNEL 2 to improve the speed of a recoverable messaging system. On MEMORY CHANNEL 1, the simplest implementation of general-purpose recoverable messaging requires a round-trip acknowledge delay to validate the message transfer, which adds to the communication latency. MEMORY CHANNEL 2's newly introduced conditional write transaction provides a more efficient implementation that requires a single acknowledge packet, thus practically reducing the associated latency by more than a factor of two.
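To show where that round-trip cost sits, here is a sketch of MEMORY CHANNEL 1 style recoverable messaging from the sender's point of view. The buffer layout and helper names are hypothetical; the point is only that the sender cannot declare the transfer valid until the receiver's acknowledge write comes back, which is the delay the conditional write transaction is designed to avoid.

```c
#include <stdint.h>
#include <string.h>

extern volatile uint8_t  *tx_msg_page;   /* transmit window to receiver's buffer   */
extern volatile uint32_t *tx_ack_req;    /* transmit word the receiver watches     */
extern volatile uint32_t *rx_ack;        /* local word the receiver's ack lands in */

void send_recoverable(const void *msg, uint32_t len, uint32_t seq)
{
    *rx_ack = 0;                               /* arm the acknowledge slot      */
    memcpy((void *)tx_msg_page, msg, len);     /* push the message data         */
    *tx_ack_req = seq;                         /* ask the receiver to acknowledge */

    while (*rx_ack != seq)                     /* the round-trip delay lives here */
        ;                                      /* spin until the ack write arrives */
}
```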

MEMORY CHANNEL 2 Hardware

As suggested in the previous architectural description, MEMORY CHANNEL 2 hardware components are functionally partitioned into two subsystems: the PCI interface and the link interface. First in, first out (FIFO) queues are placed between the two subsystems. The PCI interface communicates with the host system, feeds the link interface with data packets to be sent, and forwards received packets on to the PCI bus. The link interface manages the link protocol and data flow: It formats data packets, generates control packets, and handles error code generation and detection. It also multiplexes the data path from the PCI format (32 bits at 33 megahertz [MHz]) to the link protocol (16 bits at 66 MHz). In addition, the link interface implements the conversion to and from LVDS signaling.
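As a rough illustration of that width conversion (a sketch only; the real multiplexer is hardware inside the link interface, and the half-word ordering here is an assumption), a 32-bit PCI word arriving at 33 MHz carries the same roughly 132 MB/s as two 16-bit link transfers at 66 MHz:

```c
#include <stdint.h>

/* Hypothetical link-transmit hook: one 16-bit symbol per 66-MHz link cycle. */
extern void link_send16(uint16_t half);

/* Forward one 32-bit PCI data word (arriving at 33 MHz) as two 16-bit link
 * transfers (at 66 MHz), low half first by assumption. Both sides move
 * 4 bytes per 30-ns PCI cycle, i.e. about 132 MB/s of raw bandwidth. */
void mux_pci_word_to_link(uint32_t pci_word)
{
    link_send16((uint16_t)(pci_word & 0xFFFFu));   /* first 66-MHz cycle  */
    link_send16((uint16_t)(pci_word >> 16));       /* second 66-MHz cycle */
}
```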

The transmit (TX) and receive (RX) data paths, both heavily pipelined, are kept completely separate from each other, and there is no resource conflict between them. In a normal MEMORY CHANNEL 2 transaction, the transmit pipeline processes a transmit request from the PCI bus. The transmit PCT is addressed with a subset of the PCI address bits and is used to determine the intended destination of the packet and its attributes. The transmit pipeline feeds the link interface with data packets and appropriate commands through the transmit FIFO queue. The link interface formats the packets and sends them on the link cable. At the receiver, the link interface disassembles the packet into an intermediate format and stores it into the receive FIFO queue. The PCI interface performs a lookup in the receiver PCT to ensure that the page has been enabled for reception and to determine the local destination address.
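A sketch of the transmit-PCT lookup just described: the entry layout, field names, page size, and table size below are invented for illustration; the article says only that a subset of the PCI address bits indexes the table and that the entry gives the packet's destination and attributes.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical page control table entry: destination node and per-page attributes. */
struct pct_entry {
    uint8_t  dest_node;     /* intended destination of the packet   */
    bool     broadcast;     /* example attribute                    */
    bool     remote_read;   /* example attribute (receive-side use) */
    uint32_t attr_bits;     /* other per-page attributes            */
};

#define MC_PAGE_SHIFT 13            /* assume 8-KB network pages           */
#define PCT_ENTRIES   (8 * 1024)    /* assumed table size for illustration */

extern struct pct_entry transmit_pct[PCT_ENTRIES];

/* Use a subset of the PCI address bits (the page number within the MEMORY
 * CHANNEL address window) to pick the PCT entry for an outgoing write. */
static inline const struct pct_entry *lookup_tx_pct(uint32_t pci_addr,
                                                    uint32_t mc_window_base)
{
    uint32_t page = (pci_addr - mc_window_base) >> MC_PAGE_SHIFT;
    return &transmit_pct[page % PCT_ENTRIES];
}
```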

[Figure 5: Block Diagram of a MEMORY CHANNEL 2 Adapter]

In the simplest implementation, packets are subject to two store-and-forward delays: one on the transmit path and one on the receive path. Because of the atomicity of packets, the transmit path must wait for the last data word to be correctly taken in from the PCI bus before forwarding the packet to the link interface. The receive path experiences a delay because the error detection protocol requires the checking of the last cycle before the packet can be declared error-free.
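To put a rough number on those delays, the following back-of-the-envelope computation (ours, not the article's) takes the payload-only serialization time of a maximum-size packet at the 16-bit, 66-MHz link rate as a lower bound per buffering stage. Note that the broadcast discussion later in the article quotes about 4 microseconds for transferring a full 256-byte packet, so header, addressing, and error-check cycles add substantially to this bound.

```c
#include <stdio.h>

int main(void)
{
    /* Assumptions: 256-byte payload, 16 bits (2 bytes) per link cycle at 66 MHz,
     * header and error-check cycles ignored. */
    const double link_mhz     = 66.0;
    const double bytes_per_cy = 2.0;
    const double payload      = 256.0;

    double cycles  = payload / bytes_per_cy;   /* 128 link cycles                  */
    double hop_us  = cycles / link_mhz;        /* ~1.9 us per buffering stage      */
    double two_hop = 2.0 * hop_us;             /* transmit plus receive buffering  */

    printf("payload-only lower bound: ~%.1f us per hop, ~%.1f us for both hops\n",
           hop_us, two_hop);
    return 0;
}
```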

A set of MEMORY CHANNEL 2 control/status registers, addressable through the PCI bus, is used to set various modes of operation and to read the local status of the link and the global cluster status.

The MEMORY CHANNEL 2 Hub

The hub is the central resource that interconnects all nodes to form a cluster. Figure 6 is a block diagram of an 8-by-8 MEMORY CHANNEL 2 hub. The hub implements a nonblocking 8-by-8 crossbar and interfaces to eight 16-bit-wide full-duplex links by means of a link interface similar to that used in the adapter. The actual crossbar has eight input ports and eight output ports, all 16 bits wide. Each output port has an 8-to-1 multiplexer, which is able to choose from one of eight input ports. Each multiplexer is controlled by a local arbiter, which is fed decoded destination requests from the eight input ports. The port arbitration is based on a fixed-priority, request-sampling algorithm. All requests that arrive within a sampling interval are considered of equal age and are serviced before any new requests.

This algorithm, while not enforcing absolute arrival-time ordering among packets sent from different nodes, assures no starvation and a fair, age-driven priority across sampling intervals.
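The following sketch models that request-sampling policy in software (purely illustrative; the real arbiter is a hardware circuit, and the helper functions are invented). Requests captured in one sampling interval form a batch that is fully serviced, in fixed priority order, before any request that arrives later is considered.

```c
#include <stdbool.h>

#define NPORTS 8

/* Requests from the eight input ports targeting this output port (hypothetical). */
extern bool request_pending(int input_port);
extern void grant_and_transfer(int input_port);   /* move one packet through the mux */

/* One output-port arbiter: fixed-priority, request-sampling. */
void arbiter_run_one_interval(void)
{
    bool batch[NPORTS];

    /* Sample: snapshot the pending requests for this interval. */
    for (int p = 0; p < NPORTS; p++)
        batch[p] = request_pending(p);

    /* Service the whole batch in fixed priority order (port 0 highest here, by
     * assumption); requests arriving meanwhile wait for the next sample. */
    for (int p = 0; p < NPORTS; p++)
        if (batch[p])
            grant_and_transfer(p);
}
```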

When a broadcast request arrives at the hub, the otherwise independent arbiters synchronize themselves to transfer the broadcast packet. The arbiters wait for the completion of the packet currently being transferred, disable point-to-point arbitration, signal that they are ready for broadcast, and then wait for all other ports to arrive at the same synchronization point. Once all output ports are ready for broadcast, port 0 proceeds to read from the appropriate input port, and all other ports (including port 0) select the same input source. The maximum synchronization wait time, assuming no output queue blocking, is equal to the time it takes to transfer the largest size packets (256 bytes), about 4 microseconds, and is independent of the number of ports. As in any crossbar architecture with a single point of coherency, such a broadcast operation is more costly than a point-to-point transfer. Our experience has been that some critical but relatively low-frequency operations (primarily fast locks) exploit the broadcast circuit.
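A sketch of that barrier-style synchronization among the per-port arbiters follows; the function names, the shared counter, and its reset (omitted here) are assumptions used only to make the sequence of steps concrete.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NPORTS 8

extern void finish_current_packet(int port);
extern void set_point_to_point_arbitration(int port, bool enabled);
extern void select_input(int port, int input_port);      /* drive the 8-to-1 mux */
extern void transfer_broadcast_packet(int port);

static atomic_int ports_ready;   /* output ports that have reached the barrier */

/* Executed by each output port's arbiter when a broadcast request is seen. */
void arbiter_handle_broadcast(int port, int source_input_port)
{
    finish_current_packet(port);                    /* let the in-flight packet drain */
    set_point_to_point_arbitration(port, false);    /* stop normal arbitration        */

    atomic_fetch_add(&ports_ready, 1);              /* signal "ready for broadcast"   */
    while (atomic_load(&ports_ready) < NPORTS)
        ;                                           /* wait for all ports             */

    /* All ports (port 0 included) select the same input and take the packet. */
    select_input(port, source_input_port);
    transfer_broadcast_packet(port);

    set_point_to_point_arbitration(port, true);     /* resume normal operation;
                                                       resetting ports_ready for the
                                                       next broadcast is omitted      */
}
```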

MEMORY CHANNEL 2 Design Process and Physical Implementation

Figure 7 illustrates the main MEMORY CHANNEL physical components. As shown in Figure 7a, two-node clusters can be constructed by directly connecting two MEMORY CHANNEL PCI adapters and a cable. This configuration is called the virtual hub configuration. Figure 7b shows clusters interconnected by means of a hub. The MEMORY CHANNEL adapter is implemented as a single PCI card.

[Figure 6: Block Diagram of an 8-by-8 MEMORY CHANNEL 2 Hub, showing eight in/out link interfaces (ports 0 through 7), their 16-bit crossbar inputs, and the per-port arbiters (arbiter 0 through arbiter 7).]

The hub consists of a motherboard that holds the switch and a set of linecards, one per port, that provide the interface to the link cable. The adapter and hub implementations use a combination of programmable logic devices and off-the-shelf components. This design was preferred to an application-specific integrated circuit (ASIC) implementation because of the short time-to-market requirements. In addition, some of the new functionality will evolve as software is modified to take advantage of the new features.

[Figure 7: MEMORY CHANNEL Hardware Components. (a) Virtual hub mode: direct node-to-node interconnection of two PCI adapter cards. (b) Using the MEMORY CHANNEL hub to create clusters of up to 16 nodes.]

The MEMORY CHANNEL 2 design was developed entirely in Verilog at the register transfer level (RTL). It was simulated using the Viewlogic VCS event-driven simulator and synthesized with the Synopsys tool. The resulting netlist was fed through the appropriate vendor tools for […] effective process-to-process bandwidth achieved using a pair of AlphaServer 4100 systems. With 256-byte packets, MEMORY CHANNEL 2 achieves 127 MB/s, or about 96 percent of the raw wire bandwidth.
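As a quick consistency check (our arithmetic, not a figure from the article), a 16-bit link clocked at 66 MHz carries 2 bytes per cycle, so the raw wire bandwidth and the quoted efficiency work out as

\[
2~\text{bytes} \times 66 \times 10^{6}~\text{cycles/s} = 132~\text{MB/s},
\qquad
\frac{127~\text{MB/s}}{132~\text{MB/s}} \approx 0.96 .
\]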

For PCI writes of less than or equal to 256 bytes, the MEMORY CHANNEL 2 interface simply converts the PCI write to a similar-size MEMORY CHANNEL packet. The current design does not aggregate multiple PCI write transactions into a single MEMORY CHANNEL packet. […] message latency for different types of communications […]


[…] simplest implementation of variable-length messaging. The latencies of standard communication interfaces are […] packets. On the one hand, smaller packets are less efficient than larger ones in terms of overhead. On the other hand, smaller packets incur a shorter store-and-forward delay per packet, which can then be over- […]

[…] generation MEMORY CHANNEL network, MEMORY CHANNEL 2. The rationale behind the major design decisions is discussed in light of the experience gained from MEMORY CHANNEL 1. A description of the MEMORY CHANNEL 2 hardware components led to the presentation of measured performance results.
