Äîñòèæåíèå ðåêîðäíûõ ïîêàçàòåëåé â
GreenGraph500 äëÿ âû÷èñëèòåëüíûõ ñèñòåì íà
ÏËÈÑ. Òåîðèÿ è ïðàêòèêà
À.Ä. Ñèçîâ,Ñ.. Åëèçàðîâ
Ìîñêîâñêèé ãîñóäàðñòâåííûé óíèâåðñèòåòèìåíè Ì.Â. Ëîìîíîñîâà
Ñîâðåìåííûå ìèðîâûå äîñòèæåíèÿ â îáëàñòè ðàçðàáîòêè ýíåðãîýåêòèâíûõ
ïðîãðàììèðóåìûõ ëîãè÷åñêèõ ñõåì (ÏËÈÑ), îáøèðíûé îïûò ïðèìåíåíèÿ ðå-
êîíèãóðèðóåìûõ ñïåöâû÷èñëèòåëåé ïðè ðàçðàáîòêå ïðîáëåìíî îðèåíòèðîâàí-
íûõñóïåðêîìïüþòåðîâè óæåïðîäåìîíñòðèðîâàííûå âîçìîæíîñòèñîçäàíèÿ íà
ÏËÈÑ êîíòðîëëåðîâ ïàìÿòè è êîììóíèêàöîííûõ ïðîöåññîðîâ ñî ñâåðõíèçêîé
ëàòåíòíîñòüþ, ïîçâîëÿþò ïðåäïîëàãàòü, ÷òî èìåííî íà òàêîé ýëåìåíòíîé áàçå
ñåãîäíÿ ìîãóò áûòü ñîçäàíû âû÷èñëèòåëüíûå ñèñòåìû ñ ðåêîðäíûìè íà òåñòå
GreenGraph500ïîêàçàòåëÿìè.Âðàáîòåîáñóæäàþòñÿòðåáîâàíèÿêâû÷èñëèòåëü-
íîéñèñòåìåíàÏËÈÑñâíåøíåéïàìÿòüþïðèìåíèòåëüíîêðåøåíèþçàäà÷èïî-
èñêàâøèðüïîãðàó(BreadthrstsearhBFS),ó÷èòûâàþùèåèìåþùèéñÿìè-
ðîâîéîïûòèîñîáåííîñòèëó÷øèõñóùåñòâóþùèõïàðàëëåëüíûõàëãîðòìîâBFS.
àññìîòðåí ðåàëüíûé âû÷èñëèòåëüíûé óçåë, ñîäåðæàùèé ÏËÈÑ Kintex Ultra
Sale ñ 4-ìÿ êîíòðîëëåðàìè ïàìÿòè RLDRAMIII. Îöåíåíà ïðîèçâîäèòåëüíîñòü
ñèñòåìû èç 32-äâóõ òàêèõ óçëîâ,ðàññ÷èòàíà ýíåðãîýåêòèâíîñòü ïîêðèòåðè-
ÿì ðåéòèíãà GreenGraph500 è äàíû ðåêîìåíäàöèèïî äàëüíåéøåéîïòèìèçàöèè
àïïàðàòóðû.
1. Ââåäåíèå
Graph500ìèðîâîéðåéòèíãñóïåðêîìïüþòåðîâ,ïðåäíàçíà÷åííûõäëÿðåøåíèÿçàäà÷,
ñâÿçàííûõ ñ îáðàáîòêîé áîëüøèõ ãðàîâ. Äëÿ ðàíæèðîâàíèÿ ýòèõ ñèñòåì èñïîëüçóåòñÿ
BFS ïîèñêâøèðèíóâíåîðèåíòèðîâàííîìðàçðåæåííîìãðàå.Ýòîòòåñòâáîëüøåéñòå-
ïåíè íàãðóæàåò êîììóíèêàöèîííóþ ïîäñèñòåìó è êîíòðîëëåðû ïàìÿòè, òàê êàê äàííûé
àëãîðèòì ïîäðàçóìåâàåò ðàáîòó ñ áîëüøèì îáúåìîì íåðåãóëÿðíûõ äàííûõ â ïðîòèâîïî-
ëîæíîñòü Top500, îðèåíòèðîâàííîìó íà âû÷èñëåíèÿ íàä ÷èñëàìè ñ ïëàâàþùåé òî÷êîé íà
òåñòå HPL Linpak.  äîïîëíåíèå ê Top500 î÷åíü âîñòðåáîâàí Green500 ðåéòèíã ýíåð-
ãîýåêòèâíîñòè âû÷èñëèòåëüíûõ ñèñòåì íà òåñòå Linpak. Ïðåäëîæåííûé â 2012 ãîäó
GreenGraph500, ñî÷åòàåò óêàçàííûå âûøå ïîäõîäû è ðàíæèðóåò ñèñòåìû èç Graph500 ïî
ïðîèçâîäèòåëüíîñòè â GTEPS (
10 9
ïðîéäåííûõ äóã â ñåêóíäó) íà Âàòò ýëåêòðîïîòðåáëå- íèÿ. Âàæíîñòü ýòîãî òåñòà ñëîæíî ïåðåîöåíèòü, òàê êàê èìåííî ýíåðãîýåêòèâíîñòü èñêîðîñòü ðàáîòû ñî ñâåðõáîëüøèìè îáúåìàìè íåðåãóëÿðíûõ äàííûõ ÿâëÿþòñÿ îñíîâíûìè
òðåáîâàíèÿìèê ñóïåðêîìïüþòåðàìèöåíòðàì îáðàáîòêè äàííûõáóäóùåãî [1℄.
Ñîâðåìåííûé îïûò ïîêàçûâàåò, ÷òî îäèí èç íàèáîëåå óäà÷íûõ ïîäõîäîâ ê ïîñòðîå-
íèþçàêàçíûõïðîáëåìíîîðèåíòèðîâàííûõâû÷èñëèòåëüíûõñèñòåì(ÏÎÂÑ)ìàêñèìàëüíîé
ýíåðãîýåêòèâíîñòèèñïîëüçîâàíèåñïåöèàëüíûõóñêîðèòåëåéíàáàçåïðîãðàììèðóåìîé
ëîãèêè(ÏËÈÑ)[2℄.Ñäðóãîéñòîðîíû, êðèòè÷åñêèìàêòîðîì,îãðàíè÷èâàþùèìïðîèçâî-
äèòåëüíîñòüïðèðåøåíèèãðàîâûõçàäà÷,ÿâëÿåòñÿñêîðîñòüñëó÷àéíîãîäîñòóïàâïàìÿòü.
Ïîêàçàíî[3℄,÷òîïðîèçâîäèòåëüíîñòüòðàäèöèîííûõCPU/GPUàðõèòåêòóð,îðèåíòèðîâàí-
íûõ íàáëî÷íóþðàáîòóñâíåøíåéïàìÿòüþèèñïîëüçóþùèõ ãëóáîêèå êîíâåéåðûêîìàíäâ
ñîâîêóïíîñòèñíåñêîëüêèìèñòóïåíÿìèêåøèðîâàíèÿäàííûõ,ñíèæàåòñÿíà1-2ïîðÿäêàíà
çàäà÷àõòèïà BFS.Îäíàêî íàÏËÈÑâîçìîæíîðåàëèçîâàòüñïåöèàëèçèðîâàííûåêîíòðîë-
ëåðûïàìÿòè,ïðàêòè÷åñêèëèøåííûåóêàçàííîãîíåäîñòàòêà,òàêèåêàê,íàïðèìåð,âñèñòå-
ìå Convey MX-100, âõîäÿùåé â ïåðâóþ ñîòíþ ðåéòèíãà Graph500 [4℄.Ïðîèçâîäèòåëüíîñòü
ïîäñèñòåìû ïàìÿòèìîæíî åùåóâåëè÷èòü, ïåðåéäÿ ê îòëè÷íûì îò DDR3/DDR4 àðõèòåê-
òóðàì[6℄.Äðóãîéïîäñèñòåìîé, îïðåäåëÿþùåéïðîèçâîäèòåëüíîñòü âçàäà÷åBFS,ÿâëÿåòñÿ
êîììóíèêàöèîííàÿñåòü,ñîåäèíÿþùàÿâû÷èñëèòåëüíûåìîäóëè[7℄.Èçâåñòíî,÷òî íàèáîëåå
áûñòðûå è íèçêîëàòåíòíûå êîììóíèêàöèîííûå ñåòè äëÿ ÏÎÂÑ íà ÏËÈÑ ïîñòðîåíû íà
ìóëüòèãèãàáèòíûõòðàíñèâåðàõè êîììåð÷åñêèäîñòóïíûõêîììóòàòîðàõ PCIe[8℄.
Ïðåäëàãàÿ ÏËÈÑ, â êà÷åñòâå îñíîâíîãî âû÷èñëèòåëüíîãî óçëà, íóæíî ïðèíèìàòü âî
âíèìàíèå èçâåñòíûå íåäîñòàòêè ÏËÈÑ îòíîñòèòåëüíî óçëîâ íà îñíîâå CPU/GPU: íîìè-
íàëüíàÿ ðàáî÷àÿ ÷àñòîòà ÏËÈÑ ñîñòàâëÿåò 300-600 Ìö, êîòîðàÿ â 5-10 ðàç óñòóïàåò ðà-
áî÷åé ÷àñòîòå ñîâðåìåííûõ êîììåð÷åñêèõ ïðîöåññîðîâ. Îáúåì áûñòðîé ïàìÿòè, ðàñïîëî-
æåííîé íåïîñðåäñòâåííî íà êðèñòàëëå ÏËÈÑ, îãðàíè÷åí 1-10 ÌÁàéò, ÷òî íå ïîçâîëÿåò
èñïîëüçîâàòüÏËÈÑäëÿ ðåøåíèÿçàäà÷áîëüøîãî ðàçìåðàáåçïðèìåíåíèÿ âíåøíåéïàìÿ-
òè.Öåíà òîïîâûõÏËÈÑíà ïîðÿäîêïðåâûøàåò öåíóñîîòâåòñòâóþùèõ CPU/GPU.Êðîìå
òîãî, ñîçäàíèå ÏÎÂÑ íà áàçå ÏËÈÑ ïðåäïîëàãàåò äëÿ êàæäîé êîíêðåòíîé çàäà÷è ñîçäà-
íèåèîòëàäêóâû÷èñëèòåëÿíàÿçûêåîïèñàíèÿàïïàðàòóðû,ñëîæíîñòüêîòîðîéíàïîðÿäîê
âûøå íàïèñàíèÿïðîãðàììûïîä òðàäèöèîííûåàðõèòåêòóðûíà ÿçûêàõâûñîêîãîóðîâíÿ.
 íàñòîÿùåé ðàáîòå ïðîâîäèòñÿ àíàëèç ëèòåðàòóðû è òðåáîâàíèé ê àïïàðàòíîé áàçå
ÏÎÂÑ äëÿ ïîñòðîåíèÿ òîïîâûõ ðåøåíèé â GreenGraph500. Âûïîëíÿåòñÿ ðàñ÷åò ïàðàìåò-
ðîâ îïòèìàëüíîéêîíèãóðàöèè,äàþòñÿðåêîìåíäàöèèäëÿñîçäàíèÿÏÎÂÑäëÿãðàîâûõ
çàäà÷ ðàçëè÷íîãî ðàçìåðà. Ïðîâîäèòñÿ àíàëèç ïðèìåíèìîñòè ðàçðàáîòàííîãî äëÿ äàííî-
ãî ÏÎÂÑàëãîðèòìà ïîèñêà âøèðü. Â ðàìêàõ äàííîé ðàáîòû ïðåäïîëàãàåòñÿîïðåäåëåíèå
ïðîèçâîäèòåëüíîñòè îäíîãî óçëàíà àëãîðèòìå BFS ñ ïîìîùüþìîäåëèðîâàíèÿ ðàáîòû ðå-
àëüíîãî ÏËÈÑ.
2. Îáùàÿ ïàìÿòü
Êàê ñêàçàíî âûøå, BFS ïðåäïîëàãàåò ìíîæåñòâî ñëó÷àéíûõ îáðàùåíèÿ â îáùóþ ïà-
ìÿòüâñåéâû÷èñëèòåëüíîéñèñòåìû.Âðàáîòå[3℄ïîêàçàíî,÷òîïèêîâàÿïðîèçâîäèòåëüíîñòü
êîíòðîëëåðîâïàìÿòè âòðàäèöèîííûõ CPU/GPU àðõèòåêòóðàõ, ðàññ÷èòàííûõíà áëî÷íîå
÷òåíèå, äîñòèãàåòñÿ òîëüêî ïðè ðàáîòå ñ áîëüøèìè 4 ÊÁ è áîëåå áëîêàìè äàííûõ è ñíè-
æàåòñÿ íà ïîðÿäêè ïðè ÷òåíèÿõ îòäåëüíûõ ìàøèííûõ ñëîâ. àçìåð îáðàáàòûâàåìûõ àë-
ãîðèòìàìè BSF ãðàîâ ëåæèò â äèàïàçîíå îò Á äî ÏÁ, ïðè òîì, ÷òî êàæäûé çàïðîñíà
÷òåíèå â BFS îïåðèðóåò åäèíèöàìè ìàøèííûõ ñëîâ (4/8 áàéò íà ñëîâî), àäðåñà çàïðîñîâ
ïðàêòè÷åñêè ñëó÷àéíû, ïîýòîìó ýåêòèâíîå ÷òåíèåáîëüøèìè áëîêàìè íåâîçìîæíî. Òà-
êèì îáðàçîì, àðõèòåêòóðà êîíòðîëëåðà ïàìÿòè â êëàññè÷åñêèõ CPU/GPU àðõèòåêòóðàõ
ÿâëÿåòñÿ àêòîðîì, îãðàíè÷èâàþùèì îáùóþ ïðîèçâîäèòåëüíîñòü ñèñòåìû íà òåñòå BFS.
Ýòîïîçâîëÿåòïîëàãàòü,÷òîïåðåõîäêïðîáëåìíî-îðèåíòèðîâàííûìêîíòðîëëåðàìïàìÿòè,
íà êîòîðûõ âîçìîæíî äîñòèæåíèå ìàêñèìàëüíûõ ïðîïóñêíûõ ñïîñîáíîñòåé íà îïåðàöèÿõ
äîñòóïàïîñëó÷àéíûìàäðåñàì,ÿâëÿåòñÿîäíèìèçïåðñïåêòèâíûõíàïðàâëåíèéâñîçäàíèè
ÏÎÂÑäëÿ ãðàîâûõçàäà÷.
Óâåëè÷åíèå ïðîèçâîäèòåëüíîñòè ïîäñèñòåìû ïàìÿòè âîçìîæíî òàêæå ïðè èñïîëüçî-
âàíèè äðóãèõ òèïîâ ÎÇÓ, òàê íàïðèìåð ïðîèçâîäèòåëüíîñòü RLDRAMIII (Redue lateny
DRAM) íàñëó÷àéíûõ ÷òåíèÿõâ 2-3ðàçàáîëüøå, ÷åìäëÿ ñîîòâåòñòâóþùåéDDR3. [6℄
3. Êîììóíèêàöèîííàÿ ïîäñèñòåìà
 ñòàòüå [7℄ ïîêàçàíî, ÷òî â âû÷èñëèòåëüíûõ ñèñòåìàõ ñ ìíîãèìè óçëàìè ïðîèçâîäè-
òåëüíîñòü àëãîðèòìàBFS îïðåäåëÿåòñÿ êîììóíèêàöèîííîéïîäñèñòåìîé, îáåñïå÷èâàþùåé
îáìåí äàííûìè ìåæäó âû÷èñëèòåëüíûìè óçëàìè, ïîýòîìó ñíèæåíèå êîëè÷åñòâà ïåðåñû-
ëàåìûõ äàííûõ ïîçâîëÿåòçíà÷èòåëüíî ïîâûñèòü ïðîèçâîäèòåëüíîñòü ñèñòåìû â öåëîì. Â
ðàáîòå [8℄ ïîêàçàíà âîçìîæíîñòü ïîñòðîåíèÿ è ýåêòèâíîé ìàñøòàáèðóåìîñòè ñèñòåìû
èç íåñêîëüêèõ ÏËÈÑ, â êîòîðîé êîììóíèêàöèîííàÿ ïîäñèñòåìà ïîñòðîåíà íà áàçå ìóëü-
òèãèãàáèòíûõòðàíñèâåðîâ. Êîììåð÷åñêè äîñòóïíîé ñåòüþ òàêîãî òèïà ÿâëÿåòñÿïàêåòíàÿ
ñåòü PCIe ñ òîïîëîãèåé òèïà "çâåçäà", êîòîðàÿ â ðàìêàõ ñòàíäàðòà PCIe Gen3 ïîçâîëÿåò
äîñòèãàòüïðîïóñêíîéñïîñîáíîñòèäî16 Áàéò/ âäóïëåêñíîì ðåæèìå.
4. Ïðîåêò BFS äëÿ ÏÎÂÑ
1. for(i = 0; i < size(V); i++)
2. lvl[v℄ = Inf;
3. lvl[s℄ = 0;
4. write_to_bfs_queue(n,s); // write v to queue on hip n
5. //On every hip, on every level:
6. while(Q is not empty)
7. for (all u in Q) //1 read
8. for(all v in CSR[u℄)//3 reads
9. if (v loated in loal_mem)
10. if (lvl[v℄ > lvl[u℄) //2 reads
11. d[v℄ = u; //write
12. lvl[v℄ = lvl[u℄; //write
13. //add v into loal queue
14. write_to_bfs_queue(loal,v);
15. else
16. // send remote hek request
17. write_to_hek_queue(n,v);
èñ.1.ÏðîåêòðàñïðåäåëåííîãîàëãîðèòìàïîèñêàâøèðüíàÏÎÂÑ
ÄëÿÏÎÂÑíàÏËÈÑòðåáóåòñÿìóëüòèòðåäîâûéàëãîðèòì,âêîòîðîìðàçðåøåíûòîëü-
êî ëîêàëüíûå ÷òåíèÿ, îïåðàöèè ãëîáàëüíîé ñèíõðîíèçàöèè íå òðåáóþò áîëüøîãî êîëè÷å-
ñòâà ïåðåñûëîê è â ìàêñèìàëüíîé ñòåïåíè èñïîëüçóþòñÿ âîçìîæíîñòè ÏËÈÑ è ïîäñèñòå-
ìû ïàìÿòè. Ïðîåêò òàêîãî àãëîðèòìà ïðèâåäåí íà ðèñ. 1. Èçíà÷àëüíî, âåðøèíû â ãðàå
ðàçáèâàþòñÿ ìåæäó óçëàìè òàêèì îáðàçîì, ÷òî ðåáðà, ñîîòâåòñòâóþùèå ñïèñêó âåðøèí,
îáðàáàòûâàåìûõ íàäàííîìóçëå,íàõîäÿòñÿâëîêàëüíîéïàìÿòèñîîòâåòñòâóþùåãîÏËÈÑ.
 ïàìÿòè êàæäîãî óçëà òàêæå õðàíèòñÿ òàêæå ëîêàëüíûé ó÷àñòîê ðîíòà.  êà÷åñòâå
îðìàòàõðàíåíèÿãðààèñïîëüçóåòñÿCompressedSparseRow(CSR) îðìàò.Êàæäûéëî-
êàëüíûé ðîíò íà îïðåäåëåííîìóðîâíå ïîèñêàîáðàáàòûâàåòñÿíåçàâèñèìî,ïðè÷åì, åñëè
â ïðîöåññå ïîèñêà îáðàáàòûâàåìîå ðåáðî ñâÿçûâàåò ëîêàëüíóþ âåðøèíó ñ âåðøèíîé, äàí-
íûå î êîòîðîé õðàíÿòñÿ â óäàëåííîé ïàìÿòè, èíîðìàöèÿ î äàííîé âåðøèíå ïîñûëàåòñÿ
íà óäàëåííûé âû÷èñëèòåëüíûé óçåë, ãäå è ïðîèñõîäèò åå ïîñëåäóþùàÿ îáðàáîòêà. àçðå-
øåíèå êîíëèêòîâ ìåæäó ïîòîêàìè âíóòðèóçëà âû÷èñëèòåëÿîñóùåñòâëÿåòñÿñ ïîìîùüþ
àïïàðàòíîðåàëèçîâàííûõíàóðîâíåêîíòðîëëåðàïàìÿòèàòîìàðíûõîïåðàöèéèfull/empty
ïðèçíàêîâ ÿ÷ååê äàííûõ.
5. Îöåíêà ïðîèçâîäèòåëüíîñòè ÏÎÂÑ íà ÏËÈÑ
5.1. Îöåíêà ïðîèçâîäèòåëüíîñòè óçëà
ÏðåäëàãàåìûéÏÎÂÑñîñòîèòèç32âû÷èñëèòåëüíûõóçëîâ,ñîåäèíåííûõêîììóíèêàöè-
îííîéïîäñèñòåìîéèçìóëüòèãèãàáèòíûõòðàíñèâåðîâðàáîòàþùèõïîïðîòîêîëóPCIeGen3
4x. Êàæäûé âû÷èñëèòåëüíûéóçåëïðåäñòàâëÿåòèç ñåáÿêðèñòàëë ÏËÈÑKintex Ultrasale
XCKU095åìêîñòüþ940 òûñ.LUT,ðàáîòàþùèéíà÷àñòîòå660Ìö,è÷åòûðåêîíòðîëëåðà
âíåøíåé ïàìÿòè RLDRAMIII, ðàáîòàþùèé íà ÷àñòîòå 800 Ìö,åìêîñòüþ 64 Ìáàéò êàæ-
äûé. Îöåíêó ïðîèçâîäèòåëüíîñòè äàííîãî âû÷èñëèòåëüíîãî óçëà áóäåì ïðîâîäèòü ïóòåì
ñðàâíåíèÿ ññóùåñòâóþùèìèâû÷èñëèòåëüíûìèñèñòåìàìè íà ÏËÈÑîò êîìïàíèè Convey,
ïðîèçâîäèòåëüíîñòüêîòîðûõèçâåñòíà[4℄.Ñèñòåìà ConveyMX-100ñîñòîèò èç÷åòûðåõâû-
÷èñëèòåëüíûõ êðèñòàëëîâV6HX565Tåìêîñòüþ585òûñ.LUT,ðàáîòàþùèéíà÷àñòîòå550
Ìö èïîäñèñòåìûïàìÿòèèç 32 êàíàëîâDDR3.
Âðàáîòå[9℄áûëîïîêàçàíî,÷òîèñïîëüçîâàíèåàëãîðèòìàîïòèìèçàöèèïîíàïðàâëåíè-
ÿì ïîçâîëÿåò ñíèçèòüêîëè÷åñòâî îáðàáàòûâàåìûõ ðåáåðãðàà äî ðàçìåðà ìèíèìàëüíîãî
îñòîâíîãî äåðåâà,èëèâ16ðàçäëÿãðààïëîòíîñòüþ16 ðåáåðíàâåðøèíó.Âðàáîòå[4℄èñ-
ïîëüçóåòñÿñïîñîáõðàíåíèÿ,êîòîðûéïîçâîëÿåòóìåíüøèòüêîëè÷åñòâîçàïðîñîâíà÷òåíèå
äàííûõïðèîáðàáîòêåîäíîãîðåáðàäîòðåõ,÷òîïîçâîëÿåòîöåíèòüòðåáóåìóþïðîïóñêíóþ
ñïîñîáíîñòü ïàìÿòè â ñèñòåìå MX-100 ïî îðìóëå 14,6 [5℄ /16*3 = 2,7 GR/s (GR/s
10 9
÷òåíèéâñåêóíäó).Âñîîòâåòñòâèèñ ðàáîòîé[6℄ïðîèçâîäèòåëüíîñòü èñïîëüçóåìîéâñèñòå-
ìå Convey ïàìÿòè DDR3 íà ñëó÷àéíûõ ÷òåíèÿõ ìîæåò áûòü îöåíåíà â 0,6 GR/s äëÿ 32-õ
êàíàëîâ. àññìàòðèâàåìîåâ [10℄ ïåðåóïîðÿäî÷èâàíèåçàïðîñîâ ïðè äîñòóïå ê ïàìÿòè ïîç-
âîëÿåò ïîâûñèòü ïðîèçâîäèòåëüíîñòü ÷òåíèÿ â 4-5 ðàç îòíîñèòåëüíî ñêîðîñòè ñëó÷àéíîãî
÷òåíèÿ. Ïðîâåäåííàÿíàìèìîäåëèðîâàíèåðàáîòû êîíòðîëëåðàRLDRAMIII ïîêàçàëî,÷òî
ïðîèçâîäèòåëüíîñòü ïðåäëàãàåìîé ïîäñèñòåìû ïàìÿòè ñîñòàâëÿåò 140*16 = 2,24 GR/s íà
ñëó÷àéíûõ ÷òåíèÿõè750*16=11,5GR/s íàïîñëåäîâàòåëüíûõ÷òåíèÿõïðèðàçìåðåáëîêà
18áàéò,÷òîïîçâîëÿåòãîâîðèòüîñîçäàíèèïîäñèñòåìûïàìÿòèñïðîïóñêíîéñïîñîáíîñòüþ
äî 10 GR/s.
Òîãäà,îòòàëêèâàÿñüîò ïðîèçâîäèòåëüíîñòè ïîäñèñòåìûïàìÿòè,ïðåäëàãàåìûéÏÎÂÑ
ñìîæåò îáðàáàòûâàòü â 10/2,7 =3,7 áîëüøåðåáåðâ ñåêóíäó,÷åì MX-100. Ñëåäîâàòåëüíî,
ïðîèçâîäèòåëüíîñòü îäíîãî ÏËÈÑ, ïðåäïîëàãàÿ ëèíåéíóþ ìàñøòàáèðóåìîñòü âû÷èñëèòå-
ëÿ,ìîæíîîöåíèòüâ(14,6/4)*940òûñ.LUT*660Ìö/565òûñ.LUT*550Ìö=7,3GTEPS.
5.2. Âîçìîæíîñòè ìàñøòàáèðóåìîñòè ñèñòåìû
 ðàçäåëå 4 áûëî ïîêàçàíî, ÷òî â ñëó÷àå îáðàáîòêè ðåáðà, êîòîðîå ñîåäèíÿåò ëîêàëü-
íóþ âåðøèíó ñ âåðøèíîé, íàõîäÿùåéñÿ â óäàëåííîé ïàìÿòè, ïî êîììóòàöèîííîé øèíå
ïîñûëàåòñÿ çàïðîñ íà óäàëåííóþ îáðàáîòêó äàííîé âåðøèíû. Ýòîò çàïðîñ ïðåäïîëàãàåò
ïåðåäà÷ó8 áàéò ïîëåçíîé èíîðìàöèè íîìåð çàïðàøèâàåìîé âåðøèíû,íîìåð çàïðàøè-
âàþùåé âåðøèíû è åå óðîâåíü. Ïðè ðàâíîìåðíîì ðàñïðåäåëåíèè âåðøèí ìåæäó óçëàìè,
ó÷èòûâàÿ ìàêñèìàëüíî âîçìîæíóþ ïðîèçâîäèòåëüíîñòü îäíîãî óçëà, îöåíêà äëÿ êîòîðîé
äàíà â ïðåäûäóùåìðàçäåëå,íåîáõîäèìóþïðîïóñêíóþ ñïîñîáíîñòüìîæíî ðàññ÷èòàòü êàê
8 áàéò*(7,6GTEPS/16) = 3,8Áàéò/ñ äëÿ ñèñòåìû ñ äîñòàòî÷íîáîëüøèì êîë-âîìâû÷èñ-
ëèòåëüíûõóçëîâ.ÂïðåäëàãàåìîìÏÎÂÑäëÿñîåäèíåíèÿâû÷èñëèòåëåéèñïîëüçóåòñÿñåòü,
ïîñòðîåííàÿíà êîììóòàòîðàõ PCIeGen34x,ïðîïóñêíàÿñïîñîáíîñòüêîòîðîéíàçàïèñüèç
îäíîãîâû÷èñëèòåëÿâäðóãîéñîñòàâëÿåò4Áàéò/.Îäíàêî,èçâåñòíî[11℄,÷òîïðîïóñêíàÿ
ñïîñîáíîñòü PCIe ïðè ïåðåäà÷å ñîîáùåíèé âåëè÷èíîé 8 áàéò ñîñòàâëÿåò ïðèáëèçèòåëüíî
30%îò ìàêñèìàëüíîé,áîëåå90%ïðè äëèíå ñîîáùåíèÿâ100 èáîëååáàéò.Ýòî çíà÷èò,÷òî
äëÿ ïîëíîöåííîé çàãðóçêè ÏËÈÑ ïîòðåáóåòñÿ ëèáî ïåðåéòè ê øèíå PCIe áîëüøåé øèðè-
íû,ëèáîèñïîëüçîâàòüìåõàíèçìàãðåãàöèèñîîáùåíèéóäàëåííîéçàïèñè.Ýòèîïòèìèçàöèè
ïîçâîëÿò äëÿ ñèñòåìû èç 4 óçëîâ äîñòè÷ü ïðàêòè÷åñêè ëèíåéíîé ìàñøòàáèðóåìîñòè. Îä-
íàêî â ðàññìàòðèâàåìîì ïðîòîòèïå 4-õ óçëîâûå áëîêè îáúåäèíåíûPCIe Gen3 8x, ïðîâåäÿ
àíàëîãè÷íûå âûêëàäêè,ïðîèçâîäèòåëüíîñòüÏÎÂÑ ñ 32óçëàìè ìîæåò áûòü îöåíåíàâ 7,6
GTEPS * 32 óçëà*(8áàéò//4)/3,8 áàéò/= 128 GTEPSèëè 200 MTEPS/W (îöåíèâàÿ
ýíåðãîïîòðåáëåíèå â20 Âàòò íàÏËÈÑ).
5.3. Ñðàâíåíèå ñ ñóùåñòâóþùèìè óñòðîéñòâàìè
àññìàòðèâàåìàÿ ÏÎÂÑ ïî ðàçìåðó ðåøàåìîé çàäà÷è BFS äîëæíà áûòü ïî îòíåñåíà
ïî êëàññèèêàöèèGreenGraph500 êðàçäåëó Smalldata, êîòîðûé âðåäàêöèèîò èþëÿ2015
ïðåäñòàâëåíâ ïåðâîéäåñÿòêå 4-ìÿñèñòåìàìèîðèãèíàëüíîéàðõèòåêòóðûè6-þñèñòåìàìè
íà áàçåìèêðîïðîöåññîðîâ äëÿ ñîòîâûõ òåëåîíîâ è ïëàíøåòîâ. Âòîðîé è òðåòèéäåñÿòîê
ëèäåðîâGreenGraph500ïðàêòè÷åñêèïîëíîñòüþçàíÿòûSMPñèñòåìàìèíàîäíîì,äâóõèëè
÷åòûðåõ òîïîâûõ x86 ïðîöåññîðàõ Intel Sandybridge.Ïðàêòè÷åñêè âñå ëèäåðû èñïîëüçóþò
ïðåäåëüíîîïòèìèçèðîâàííûåàëãîðèòìûîòãðóïïûGraphCREST[9℄.Ïîïàäàíèåâäåñÿòêó
Òàáëèöà1. ÑðàâíåíèåðàññìàòðèâàåìîãîÏÎÂÑññóùåñòâóþùèìèðåøåíèÿìè.
Ïàðàìåòðû Convey
MX-
100 [12℄
Xperia Z1
[13℄
Fermi
GPU
[14℄
Cray XE6
Hopper[15℄
Intel SB
EP [13℄
ÏÎÂÑ
32óçëà
Âîçìîæíûé
ðàçìåðãðàà
29 20 20 31 28 22
åçóëüòàò
GTEPS
14,6 1,03 0,63 62 28,61 128
åçóëüòàò
MTEPS/W
146 235 2,6 0,15 61,48 200
Green
Graph500
Small or
Big Data
Category
8SD 2 SD 29SD 19 BD 1 BD 5 SD
Graph500 79 153 171 54 70 46
òðåáóåò ýíåðãîýåêòèâíîñòèíà óðîâíå 130 MTEPS/W, êîòîðàÿ,êàê ïîêàçàíî âûøå,ìî-
æåò áûòüäîñòèãíóòàíà ÏÎÂÑ â ðàññìàòðèâàåìîé â íàñòîÿùåé ñòàòüåêîíèãóðàöèèïðè
ýåêòèâíîñòèðåàëèçàöèè àãëîðèòìà BFSíà ÏËÈÑÏÎÂÑíàóðîâíå ñèñòåìConvey.
Îòìåòèì, ÷òîïðàêòè÷åñêè âñåïðåäñòàâëåííûåâðàçäåëåSmalldataâû÷èñëèòåëèèìå-
þòîäèíóçåëèíåäîïóñêàþòìàñøòàáèðîâàíèÿ,ò.ê.ëèáîïðèíöèïèàëüíîîäíîïðîöåññîðíûå
(ñèñòåìûíà Snapdragonè ïîäîáíûå), ëèáîèñïîëüçóþòíå ìàñøòàáèðóåìûå ðåøåíèÿ(SMP
ñèñòåìà íà4-õ Intel Sandybridge). ïðîòèâîïîëîæíîñòü èì ïðåäëîæåííàÿ ÏÎÂÑìíîãî-
óçëîâàÿèóæåñîäåðæèò32âû÷èñëèòåëüíûõóçëà.ÅåîòíåñåíèåêSmalldatañâÿçàíîòîëüêî
ñ îñîáåííîñòüþ èñïîëüçóåìîé RLDRAMIII (ìàëàÿ åìêîñòü ìîäóëÿ). Â êëàññå Big data ïî-
ïàäàíèåâïåðâóþ äåñÿòêóòðåáóåò ýíåðãîýåêòèâíîñòèíà óðîâíå20MTEPS/W, êîòîðàÿ
î÷åâèäíî áóäåò äîñòèãíóòà ïðèïåðåõîäåíà áîëåååìêèå ìîäóëè ïàìÿòèòèïà DDR3/4.
6. Çàêëþ÷åíèå
Øèðîêèéêëàññçàäà÷,òðåáóþùèõíåðåãóëÿðíîéðàáîòû ñáîëüøèìèèñâåðõáîëüøèìè
îáúåìàìèäàííûõ,âòîì÷èñëåçàäà÷àïîèñêàâøèðüïîãðàó,ìîæåòýåêòèâíîðåøàòüñÿ
íà ìàññèâå ÏËÈÑ, îñíàùåííûõ êîíòðîëëåðàìè ïàìÿòè è ñâÿçàííûìè êîììóíèêàöèîííîé
øèíîéPCIe.Ïðèñîâìåñòíîéîïòèìèçàöèèàëãîðèòìàïîèñêà,êîëè÷åñòâàèòèïàèñïîëüçóå-
ìûõêîíòðîëëåðîâïàìÿòèèïàðàìåòðîâêîììóíèêàöèîííîéøèíû,ìîãóòáûòüðàçðàáîòàíû
ñèñòåìû ñ ðåêîðäíûìè ïîêàçàòåëÿìèâ òåñòåGreenGraph500.
Ëèòåðàòóðà
1. Franesquini,Emilioand Castroetal. //Ontheenergyeieny andperformaneof
irregularappliationexeutions onmultiore, NUMAand manyoreplatforms, Journalof
Parallel andDistributed Computing,2014, Elsevier.
2. Franiso,Philand others //TheNetezzadataappliane arhiteture:a platformfor high
performanedatawarehousingand analytis, IBMRedbooks,2011.
3. Agarwal, Viratand Petrini, Fabrizio andPasetto,Davideand Bader,David A //Salable
graphexploration onmultioreproessors,Proeedings ofthe2010ACM/IEEE
InternationalConferene forHigh Performane Computing,Networking, Storageand
Analysis, P.111,2010, IEEEComputer Soiety
4. Attia,Osama GandJohnson, Tylerand Townsend,Kevin andJones,Philipand
Zambreno, Joseph //CyGraph: AReongurable Arhiteture for Parallel Breadth-First
Searh,Parallel &Distributed ProessingSymposiumWorkshops (IPDPSW), 2014IEEE
International,P.228235,2014,IEEE.
5. Graph500List July 2015, URL:http://www.graph500.org/results_jul_2015
6. Avnet//Optimal MemoryInterfae Designwith Xilinx7 SeriesXfest-2012 presentation,
2012, URL:http://www.em.avnet.om/en-us/design/trainingandevents/Douments/X-
FEST%202012%20PRESENTATIONS/xfest12_pdf_memory_v1_2_may15.pdf
7. Cheoni,Fabioand Petrini, Fabrizio //TraversingTrillionsof EdgesinReal Time:Graph
Exploration onLarge-SaleParallel Mahines, Parallel and DistributedProessing
Symposium, 2014IEEE28thInternational,P.425434,2014,IEEE.
8. TheodoreMarkettos,A andFox, PaulJ and Moore, Simon WandMoore, AndrewW
//Interonnet forommodityFPGAlusters:standardizedor ustomized?, Field
Programmable Logiand Appliations (FPL),201424thInternationalConfereneon,
P.18,2014, IEEE.
9. Yasui,Yuihiro andFujisawa,KatsukiandGoto, Keisuke//NUMA-optimized parallel
breadth-rstsearh onmultiore single-node system,Big Data,2013IEEEInternational
Conferene on,P.394402,2013,IEEE.
10. Jin, Zheming and Bakos, JasonD //Memory AessShedulingon theConveyHC-1,2013
IEEE 21stAnnualInternationalSymposium onField-ProgrammableCustom Computing
Mahines
11. UnderstandingPerformaneof PCIExpress SystemsXilinxWhite paperOtober2014,
URL: http://www.xilinx.om/support/doumentation/white_papers/wp350.pdf.
12. Convey//ConveyMX SeriesArhiteturalOverview, Whitepaper,
URL: http://www.onveyomputer.om/les/5913/5266/3278/CONV-12-
036.1MXarhOvrvwWeb.pdf
13. Yasui,Yuihiro andFujisawa,KatsukiandSato, Yukinori,//Fastandenergy-eient
breadth-rstsearh ona singlenumasystem,Superomputing, P.365381,2014, Springer.
14. Hong, Sungpakand Oguntebi,Tayo andOlukotun,Kunle, //Eient parallel graph
exploration onmulti-ore CPUand GPU,Parallel Arhitetures andCompilation
Tehniques(PACT), 2011InternationalConferene on,P.7888,2011, IEEE.
15. Beamer,Sott andBulu, Aydin and Asanovi, Krsteand Patterson, Dean, //Distributed
memory breadth-rstsearhrevisited:Enablingbottom-upsearh,Parallel andDistributed
Proessing SymposiumWorkshops&PhDForum(IPDPSW), 2013IEEE27th
International,P.16181627,2013, IEEE.