Ìàòåìàòè÷åñêîå ìîäåëèðîâàíèå ñòîëêíîâåíèÿ
ãàëàêòèê íà ãèáðèäíûõ ñóïåðÝÂÌ ñ óñêîðèòåëÿìè
Intel Xeon Phi
∗
È.Ì. Êóëèêîâ 1,2,3
, È..×åðíûõ 1,2
, Â.Å. Íåíàøåâ 3
, Å.Â. Êàòûøåâà 3
Èíñòèòóòâû÷èñëèòåëüíîé ìàòåìàòèêèè ìàòåìàòè÷åñêîéãåîèçèêè
Ñèáèðñêîãî îòäåëåíèÿ ÀÍ 1
,
Íîâîñèáèðñêèé ãîñóäàðñòâåííûé óíèâåðñèòåò 2
,
Íîâîñèáèðñêèé ãîñóäàðñòâåííûé òåõíè÷åñêèé óíèâåðñèòåò 3
Âñòàòüåïðåäñòàâëåíûðåçóëüòàòûñóïåðêîìïüþòåðíîãîìîäåëèðîâàíèÿïðîöåññà
ñòîëêíîâåíèÿ ãàëàêòèê íà ãèáðèäíîì ñóïåðêîìïüþòåðå RSC PetaStream, îñíà-
ùåííîìóñêîðèòåëÿìèIntelXeonPhi.Äëÿ îïèñàíèÿâçàèìîäåéñòâóþùèõãàëàê-
òèêèñïîëüçóåòñÿìàòåìàòè÷åñêàÿìîäåëü,îñíîâàííàÿíàóðàâíåíèÿõãðàâèòàöè-
îííîéãàçîâîéäèíàìèêè äëÿîïèñàíèÿãàçîâîéêîìïîíåíòûãàëàêòèêè íàóðàâ-
íåíèÿõ äëÿ ïåðâûõ ìîìåíòîâ áåññòîëêíîâèòåëüíîãî óðàâíåíèÿ Áîëüöìàíà äëÿ
îïèñàíèÿ çâåçäíîéêîìïîíåíòûè òåìíîé ìàòåðèè. Ìîäåëü âçàèìîäåéñòâóþùèõ
ãàëàêòèêäîïîëíåíàïîäñåòî÷íûìè èçè÷åñêìèïðîöåññàìè(âçàðóáåæíîéëè-
òåðàòóðå èñïîëüçóåòñÿ òåðìèí subgrid-physis) ïðîöåññîì çâåçäîîáðàçîâàíèÿ,
ýåêòîìîòâçðûâàìîëîäûõñâåðõíîâûõ,ñàìîñîãëàñîâàííûìîáðàçîâàíèåììî-
ëåêóëÿðíîãî âîäîðîäà è ïðîöåññîì îõëàæäåíèÿ/íàãðåâàíèÿ. Èñïîëüçîâàíèå òà-
êîéìîäåëèïîçâîëèëî ñîðìóëèðîâàòü åäèíûéïàðàëëåëüíûéâû÷èñëèòåëüíûé
ìåòîä äëÿ ðåøåíèÿ èñïîëüçóåìûõ ãèïåðáîëè÷åñêèõ óðàâíåíèé. ×èñëåííàÿ ìî-
äåëü áûëà ðåàëèçîâàíà â âèäå ïðîãðàììíîãî êîìïëåêñà äëÿ ãèáðèäíûõ ñóïåð-
ÝÂÌ, îñíàùåííûõ óñêîðèòåëÿìè Intel Xeon Phi. Â ðàìêàõ îäíîãî óñêîðèòåëÿ
áûëîäîñòèãíóòî134-êðàòíîåóñêîðåíèåè92-ïðîöåíòíàÿýåêòèâíîñòüïðèèñ-
ïîëüçîâàíèè 64óñêîðèòåëåé IntelXeonPhi.
1. Ââåäåíèå
Ñòîëêíîâåíèå ãàëàêòèê èãðàåò êëþ÷åâóþ ðîëü â èõ ýâîëþöèè [1℄. Íàáëþäàòåëüíîå è
òåîðåòè÷åñêîå èçó÷åíèå âçàèìîäåéñòâóþùèõ ãàëàêòèê íåçàìåíèìûé ìåòîä èññëåäîâàíèÿ
èõ ñâîéñòâ è ýâîëþöèè. Ìàòåìàòè÷åñêîå ìîäåëèðîâàíèå èãðàåò áîëåå ÷åì âàæíóþ ðîëü
â òåîðåòè÷åñêîì èññëåäîâàíèè òàêèõ ïðîöåññîâ.  ïîñëåäíåå âðåìÿ âñå áîëåå àêòóàëüíûì
ñòàíîâèòñÿó÷åòïîäñåòî÷íûõ èçè÷åñêèõïðîöåññîââîâçàèìîäåéñòâóþùèõãàëàêòèêàõ
ïðîöåññîâ õèìîêèíåòèêè [2℄, çâåçäîîáðàçîâàíèÿ [3℄,ýåêòîâ îò âçðûâà ñâåðõíîâûõ çâåçä
[4℄,èñâÿçàííûõñíèìèîõëàæäåíèÿ/íàãðåâàíèÿ,êîòîðûåîêàçûâàþòçíà÷èòåëüíîåâëèÿíèå
íà äèíàìèêóãàëàêòèê.
Îäíîé èçãëàâíûõ ïðîáëåì ìîäåëèðîâàíèÿâçàèìîäåéñòâóþùèõãàëàêòèêÿâëÿåòñÿ ñî-
îòíîøåíèåìàñøòàáîâ.Òàêìàññàîäíîéãàëàêòèêèñîñòàâëÿåò
10 13 M ⊙
èðàçìåð10
êèëîïàð- ñåê, à ìàññàðÿäîâîé çâåçäûâ ãàëàêòèêå ïîðÿäêàM ⊙
è ðàçìåð10 −7
ïàðñåê, ÷òî ïðèâîäèòê ðàçðûâó â 13 ïîðÿäêîâ äëÿ ìàññûè 14 ïîðÿäêîâ äëÿ ðàçìåðà ìåæäó àñòðîèçè÷åñêèìè
îáúåêòàìè. Äàííîå îáñòîÿòåëüñòâî ïðèâîäèò ê íåîáõîäèìîñòè èñïîëüçîâàíèÿ ìîùíåéøèõ
èç äîñòóïíûõñóïåðÝÂÌ. Íà ñåãîäíÿøíèé äåíü ñàìûå ïðîèçâîäèòåëüíûå ñóïåðêîìïüþòå-
ðûïîñòðîåíûíàãèáðèäíîéàðõèòåêòóðåíàîñíîâåãðàè÷åñêèõóñêîðèòåëåéèóñêîðèòåëåé
Intel XeonPhi. Òàê â èþëüñêîé âåðñèè2015 ãîäà ñïèñêà Top500ïåðâûå äâà ñóïåðêîìïüþ-
òåðà(è ÷åòûðåèç Top10)ïîñòðîåíûíà ýòèõòåõíîëîãèÿõ.Î÷åâèäíî,÷òî íàîñíîâåèìåííî
ãèáðèäíûõ àðõèòåêòóð áóäåò ïîñòðîåí ïåðâûéýêçàëîïñíûéñóïåðêîìïüþòåð.
Îñîáåííîñòè àðõèòåêòóðû, ñåòåâîé èíðàñòðóêòóðû, îðãàíèçàöèè ïàìÿòè è âû÷èñëå-
∗
àáîòà ïîääåðæàíàãðàíòàìè îññèéñêîãî îíäà óíäàìåíòàëüíûõ èññëåäîâàíèé 15-31-20150 ìîë-à-
âåä,15-01-00508è14-01-31199,ãðàíòîìÏðåçèäåíòàÔMK6648.2015.9.
íèé äåëàþòðàçðàáîòêóïðîãðàììíîãîîáåñïå÷åíèÿ äëÿòàêèõñóïåðÝÂÌ äîñòàòî÷íîñëîæ-
íîé íàó÷íîé çàäà÷åé, êîòîðàÿ òðåáóåò äåòàëüíîé ïðîðàáîòêè îò èçè÷åñêîé ïîñòàíîâêè
çàäà÷è äî èíñòðóìåíòîâ ðàçðàáîòêè.  ýòîì ñëó÷àå ìû ãîâîðèì î êîíöåïöèè ñî-äèçàéíà
âûñîêîïðîèçâîäèòåëüíûõ âû÷èñëåíèé [5℄.  ðàáîòå îïèñàíà ìàòåìàòè÷åñêàÿ ìîäåëü âçàè-
ìîäåéñòâóþùèõãàëàêòèêñó÷åòîìðàçëè÷íûõïîäñåòî÷íûõ èçè÷åñêèõïðîöåññîâ è÷èñ-
ëåííûéìåòîä,èñïîëüçóåìûéäëÿååðåøåíèÿ.Ïðèâåäåíûñõåìàèàíàëèçïðîèçâîäèòåëüíî-
ñòè ïðîãðàììíîé ðåàëèçàöèè, à òàêæå ïðîäåìîíñòðèðîâàíû ðåçóëüòàòû âû÷èñëèòåëüíûõ
ýêñïåðèìåíòîâ ïîìîäåëèðîâàíèþñòîëêíîâåíèÿ ãàëàêòèê.
2. Ìàòåìàòè÷åñêàÿ ìîäåëü âçàèìîäåéñòâóþùèõ ãàëàêòèê
 ýòîì ðàçäåëå ìû ñîðìóëèðóåì îñíîâíûå ïðèíöèïû ñî-äèçàéíà, èñïîëüçóåìûå ïðè
ïîñòðîåíèè ìàòåìàòè÷åñêîé ìîäåëè âçàèìîäåéñòâóþùèõ ãàëàêòèê, à òàêæå îïèøåì ïîëó-
÷åííóþ ìàòåìàòè÷åñêóþìîäåëü.
2.1. Ñî-äèçàéí âû÷èñëèòåëüíîéìîäåëèäëÿîïèñàíèÿâçàèìîäåéñòâóþùèõ
ãàëàêòèê
Âðàáîòå[5℄áûëèñîðìóëèðîâàíûøåñòüîñíîâíûõñòàäèéñî-äèçàéíàâû÷èñëèòåëüíûõ
ìîäåëåéèçè÷åñêèõïðîöåññîâäëÿñóïåðÝÂÌ:èçè÷åñêàÿìîäåëü
→
ìàòåìàòè÷åñêàÿîð- ìàëèçàöèÿ→
÷èñëåííûéìåòîä→
ñòðóêòóðûäàííûõ→
àðõèòåêòóðàñóïåðÝÂÌ→
ñðåäñòâàïàðàëëåëüíîãî ïðîãðàììèðîâàíèÿ.Âñàìîìîáùåìñìûñëå îñíîâíàÿñòðàòåãèÿïðèïîñòðî-
åíèèâû÷èñëèòåëüíîéìîäåëèýòîìàêñèìèçàöèÿíåçàâèñèìûõâû÷èñëåíèé.Ñó÷¼òîìòîãî,
÷òî óðàâíåíèÿ ãðàâèòàöèîííîéãàçîâîé äèíàìèêè ðàíååíàìè óñïåøíîðåøàëèñü ñ èñïîëü-
çîâàíèåìñåòî÷íûõìåòîäîâ[1, 7℄,òîìûèâäàëüíåéøåìáóäåìèñïîëüçîâàòüýòîòïîäõîäê
ìîäåëèðîâàíèþãàçîâîéêîìïîíåíòû.Òîãäàñòðàòåãèþ ïîñòðîåíèÿâû÷èñëèòåëüíîéìîäåëè
ìû ìîæåì ñîðìóëèðîâàòüáîëåå òî÷íî ìàêñèìèçàöèÿ íåçàâèñèìûõ îäíîðîäíûõ âû÷èñ-
ëåíèé â ÿ÷åéêàõ ðåãóëÿðíîé ñåòêè, ðàçáèâàþùóþðàñ÷åòíóþ îáëàñòü.Ôàêòè÷åñêèïðè èñ-
ïîëüçîâàíèèðåãóëÿðíîé(íàñàìîì äåëåèðàâíîìåðíîé)ðàñ÷åòíîé ñåòêèìûîðìóëèðóåì
ñòðóêòóðûäàííûõ. Âýòîì ñëó÷àåìûðåøàåìñðàçó äâå ïðîáëåìû:
1. Çà ñ÷¼ò èñïîëüçîâàíèÿ ðåãóëÿðíîé ñòðóêòóðû äàííûõ (â íàøåì ñëó÷àå òðåõìåðíûé
ìàññèâ) ìû ìîæåì äîñòàòî÷íî ëåãêî èñïîëüçîâàòüëþáóþ âû÷èñëèòåëüíóþàðõèòåê-
òóðó ñóïåðÝÂÌ, êîòîðàÿ îñíîâàíà íà äåêàðòîâîé òîïîëîãèè.  îñíîâå ãðàè÷åñêèõ
óñêîðèòåëåé, óñêîðèòåëåé Intel XeonPhi, áîëüøèíñòâà êëàññè÷åñêèõ ñóïåðêîìïüþòå-
ðîâ èñïîëüçóåòñÿèìåííî òàêàÿòîïîëîãèÿ.
2. Âñëó÷àå ðåãóëÿðíîñòèè îäíîðîäíîñòèâû÷èñëåíèé ìûìèíèìèçèðóåì ÷èñëîèñïîëü-
çóåìûõ ïðîãðàììíûõñðåäñòâ,ïðåäîñòàâëÿåìûõñðåäñòâàìè ïàðàëëåëüíîãî ïðîãðàì-
ìèðîâàíèÿ.  ýòîì ñëó÷àå íàì äîñòàòî÷íî èíñòðóìåíòà ðàñïðåäåëåíèÿ çàäà÷ (è ïðè
íåîáõîäèìîñòè äàííûõ, êàê â ñëó÷àå ãðàè÷åñêèõ óñêîðèòåëåé è óñêîðèòåëåé Intel
XeonPhiâðåæèìåooad)ìåæäóíèòÿìèîäíîãîïðîöåññàèïðîñòåéøåãîèíñòðóìåí-
òàîáìåíàäàííûìè ìåæäóïðîöåññàìè.
Îäíàêî, â ýòîì ñëó÷àå ïîÿâëÿåòñÿ íåòðèâèàëüíàÿ çàäà÷à ïîñòðîåíèÿ ÷èñëåííîãî ìåòîäà,
êîòîðûé íå áóäåò äàâàòü àðòåàêòîâ, ñâÿçàííûõ ñ îñÿìè êîîðäèíàò, à òàêæå çàäà÷à ïî-
ñòðîåíèÿìàòåìàòè÷åñêîé ìîäåëè äëÿîïèñàíèÿáåññòîëêíîâèòåëüíîé êîìïîíåíòû.
Îñíîâíûìè èíãðåäèåíòàìè ãàëàêòèê ÿâëÿþòñÿ ñòîëêíîâèòåëüíàÿ êîìïîíåíòà, ìîäå-
ëèðóþùàÿ ãàçîâóþ ñîñòàâëÿþùóþ ãàëàêòèê è ðàâíîìåðíî ðàñïðåäåëåííóþ â íåé ïûëü,
è áåññòîëêíîâèòåëüíàÿ êîìïîíåíòà, ìîäåëèðóþùàÿ çâåçäíóþ êîìïîíåíòó è òåìíóþ ìàòå-
ðèþ. Äëÿ îïèñàíèÿ áåññòîëêíîâèòåëüíîé êîìïîíåíòû áóäåì èñïîëüçîâàòü óðàâíåíèÿ äëÿ
ïåðâûõìîìåíòîâáåññòîëêíîâèòåëüíîãîóðàâíåíèÿ Áîëüöìàíà.Òàêîéïîäõîäáûë óñïåøíî
îïðîáèðîâàí íà ìîäåëèðîâàíèè ýâîëþöèè[6℄èñòîëêíîâåíèè ãàëàêòèê[7℄, àòàêæå îí ïîç-
âîëÿåò òåðìîäèíàìè÷åñêè ñîãëàñîâàííî îïèñàòü ïðîöåññû àçîâîãî ïåðåõîäà ìåæäó êîì-
ïîíåíòàìè, êîòîðûé ïðîèñõîäèò ïðè çâåçäîîáðàçîâàíèè è âçðûâå ñâåðõíîâûõ. Â ñâÿçè ñ
ýòèì, ìîäåëü âçàèìîäåéñòâóþùèõ ãàëàêòèê îðìóëèðóåòñÿ â âèäå ñèñòåìû ãèïåðáîëè÷å-
ñêèõ óðàâíåíèé êàê äëÿ îïèñàíèÿ ãàçîâîé êîìïîíåíòû, òàê è äëÿ îïèñàíèÿ áåññòîëêíî-
âèòåëüíîé êîìïîíåíòû. Òàêàÿ îðìóëèðîâêà ïîçâîëÿåò íàì èñïîëüçîâàòü ñîãëàñîâàííûé
ïîäõîäêàçîâîìóïåðåõîäó,íåîáõîäèìîãîäëÿìîäåëèðîâàíèÿïðîöåññàçâåçäîîáðàçîâàíèÿ
èâçðûâàñâåðõíîâûõçâåçä,åäèíûé÷èñëåííûéìåòîäâûñîêîãîïîðÿäêàòî÷íîñòè,ñïîñîáçà-
äàíèÿñòðóêòóðäàííûõèïàðàëëåëüíîãîàëãîðèòìà.Äëÿó÷åòàõèìè÷åñêèõðåàêöèéáóäåì
ðàññìàòðèâàòüóðàâíåíèÿìíîãîêîìïîíåíòíîéîäíîñêîðîñòíîéãàçîâîéäèíàìèêè.Ñó÷¼òîì
òîãî,÷òîâíàøåéìîäåëèìûèñïîëüçóåìòîëüêîïðîöåññîáðàçîâàíèÿ/ðàñïàäàìîëåêóëÿðíî-
ãî âîäîðîäà,òî ìûìîæåìïîñòðîèòüàíàëèòè÷åñêîåðåøåíèÿäëÿèçìåíåíèÿêîíöåíòðàöèè
âîäîðîäà âêàæäîéÿ÷åéêåðàñ÷åòíîé îáëàñòè.
2.2. Óðàâíåíèÿ ìíîãîêîìïîíåòíîé ìíîãîàçíîé ãèäðîäèíàìèêè
Äëÿîïèñàíèÿãàçîâîéêîìïîíåíòûáóäåìèñïîëüçîâàòüñèñòåìóóðàâíåíèéîäíîñêîðîñò-
íîé êîìïîíåíòíîé ãðàâèòàöèîííîéãàçîâîé äèíàìèêè, çàïèñàííóþ â ýéëåðîâûõ êîîðäèíà-
òàõ:
∂ρ
∂t + ∇ · (ρ~ u) = S − D ,
∂ρ H
∂t + ∇ · (ρ H ~ u) = − s H,H 2 + S ρ H
ρ − D H ρ H ρ ,
∂ρ H 2
∂t + ∇ · (ρ H 2 ~ u) = s H,H 2 + S ρ H 2
ρ − D ρ H 2 ρ ,
∂ρ~ u
∂t + ∇ · (ρ~ u~ u) = −∇ p − ρ ∇ (Φ) + ~v S − ~ u D ,
∂ρE
∂t + ∇ · (ρE~ u) = −∇ · (p~v) − (ρ ∇ (Φ), ~ u) − Λ + Γ − ε D ρ ,
∂ρε
∂t + ∇ · (ρǫ~ u) = − (γ − 1)ρǫ ∇ · ~ u − Λ + Γ − ε D ρ , ρE = 1
2 ρ~ u 2 + ρε, p = (γ − 1)ρε,
Äëÿîïèñàíèÿáåññòîëêíîâèòåëüíîéêîìïîíåíòûáóäåìèñïîëüçîâàòüñèñòåìóóðàâíåíèéäëÿ
ïåðâûõ ìîìåíòîâ áåññòîëêíîâèòåëüíîãî óðàâíåíèÿ Áîëüöìàíà, çàïèñàííóþ òàêæå â ýéëå-
ðîâûõêîîðäèíàòàõ:
∂n
∂t + ∇ · (n~v) = D − S ,
∂n~v
∂t + ∇ · (n~v~v) = −∇ Π − n ∇ (Φ) + ~ u D − ~v S ,
∂ρW
∂t + ∇ · (ρW ~v) = −∇ · (Π~v) − (n ∇ (Φ), ~v) − Γ + ε D ρ ,
∂Π ξξ
∂t + ∇ · (Π ξξ ~v) = − 2Π ∇ · ~ u − Γ + ε D 3ρ , ρW = 1
2 ρ~v 2 + Π xx + Π yy + Π zz
2 ,
Óðàâíåíèå Ïóàññîíàäëÿîáåèõ êîìïîíåíòçàïèñûâàåòñÿâ âèäå:
∆Φ = 4πG(ρ + n),
ãäå
p
äàâëåíèå ãàçà,ρ H
ïëîòíîñòü àòîìàðíîãî âîäîðîäà,ρ H 2
ïëîòíîñòü ìîëåêó-ëÿðíîãî âîäîðîäà,
s H,H 2
ñêîðîñòü îáðàçîâàíèÿ ìîëåêóëÿðíîãî âîäîðîäà èç àòîìàðíîãî,ρ = ρ H + ρ H 2
ïëîòíîñòü ñìåñèãàçà,n
ïëîòíîñòü áåññòîëêíîâèòåëüíîé êîìïîíåíòû,~ u
ñêîðîñòü ãàçîâîé êîìïîíåíòû,
~v
ñêîðîñòü áåññòîëêíîâèòåëüíîé êîìïîíåíòû,ρE
ïëîò-íîñòüïîëíîéìåõàíè÷åñêîéýíåðãèèãàçà,
ρW
ïëîòíîñòüïîëíîéìåõàíè÷åñêîéýíåðãèèáåñ- ñòîëêíîâèòåëüíîé êîìïîíåíòû,Φ
ãðàâèòàöèîííûé ïîòåíöèàë,ε
ïëîòíîñòü âíóòðåííåéýíåðãèèãàçà,
γ
ýåêòèâíûéïîêàçàòåëüàäèàáàòû,Π ξξ = (Π xx , Π yy , Π zz )
äèàãîíàëüíûé òåíçîð äèñïåðñèè ñêîðîñòåé áåññòîëêíîâèòåëüíîé êîìïîíåíòû,S
ñêîðîñòü îáðàçîâàíèÿ ñâåðõíîâûõçâåçä,D
ñêîðîñòüçâåçäîîáðàçîâàíèÿ,Λ
óíêöèÿîõëàæäåíèÿ,Γ
óíêöèÿíàãðåâàíèÿ îòâçðûâà ñâåðõíîâûõ çâåçä.
2.3. Ìîäåëü ïîäñåòî÷íîé èçèêè
Ìû îñòàíîâèìñÿ íà ñëåäóþùèõ, íà íàø âçãëÿä, îñíîâíûõ ïîäñåòî÷íûõ ïðîöåññàõ,
èãðàþùèõ îñíîâíóþðîëü ïðèñòîëêíîâåíèè ãàëàêòèê.
2.3.1. Îáðàçîâàíèå ìîëåêóëÿðíîãî âîäîðîäà
Ìîëåêóëû âîäîðîäà â ìåæãàëàêòè÷åñêîì ïðîñòðàíñòâå îðìèðóþòñÿ íà ïîâåðõíîñòè
÷àñòèöèäèññîöèèðóþòêîñìè÷åñêèìèçëó÷åíèåì.Ïðåäïîëàãàÿ,÷òîïëîòíîñòüãàçàïðîïîð-
öèîíàëüíàïëîòíîñòè÷àñòèöèç-çàõîðîøåãîïåðåìåøèâàíèÿ÷àñòèöèãàçàâíàøåéìîäåëè,
êîíöåíòðàöèÿìîëåêóëÿðíîãî âîäîðîäàîïðåäåëÿåòñÿñëåäóþùèì âûðàæåíèåì [8℄:
dn H 2
dt = R gr (T)n H (n H + 2n H 2 ) − (ξ H + ξ diss (N H 2 , A V )) n H 2 ,
ãäå
n H 2
èn H
êîíöåíòðàöèÿ ìîëåêóëÿðíîãî è àòîìàðíîãî âîäîðîäà,N H 2
ñòîëáöåâàÿïëîòíîñòü ìîëåêóëÿðíîãî âîäîðîäà.
2.3.2. Ïðîöåññ çâåçäîîáðàçîâàíèÿ è îáðàçîâàíèå ñâåðõíîâûõ çâåçä
Íåîáõîäèìîåóñëîâèå ïðîöåññàçâåçäîîáðàçîâàíèÿ ñîðìóëèðóåì ââèäå [9℄:
T < 10 4 K ▽ · ~ u < 0 ρ > 1.64 M ⊙
pc −3
òîãäà ñêîðîñòüçâåçäîîáðàçîâàíèÿìîæåò áûòüçàïèñàíà ââèäå:
D = dn
dt = C ρ
τ dyn = C ρ 3 / 2 s 32G
3π
ãäå
C = 0.034
êîýèöèåíò ýåêòèâíîñòè çâåçäîîáðàçîâàíèÿ. Ñêîðîñòü îáðàçîâàíèÿ ñâåðõíîâûõçâåçäçàïèñûâàåòñÿ ââèäå [10℄:S = ∆ SN = dρ
dt = β C n
τ dyn = β C n 3 / 2 s 32G
3π
ãäå
β = 0.1
êîýèöèåíò âçðûâà ìîëîäûõ çâåçä. Ïðè âçðûâå îäíîé çâåçäû ñîëíå÷íîéìàññû âûäåëÿåòñÿ ýíåðãèÿ
10 51
ýðã, â ýòîì ñëó÷àå óíêöèÿ íàãðåâàíèÿ çà ñ÷åò âçðûâàñâåðõíîâûõìîæåò áûòüçàïèñàíà ââèäå:
Γ = 10 51 M SN M ⊙
erg
ãäå
M SN
ìàññàñâåðõíîâûõ çâåçäâ ëîêàëüíîì îáúåìå.2.3.3. Ôóíêöèÿõ îõëàæäåíèÿ
àëàêòè÷åñêèé ãàç, ðàçîãðåòûé çà ñ÷åò ñòîëêíîâåíèÿ äî òåìïåðàòóðû
∼ 10 4 − 10 8 K
îïèñûâàåòñÿóíêöèåéîõëàæäåíèÿ[11℄:
Λ ≃ 10 − 22 n 2 H cm − 3 erg
ãäå
n H
êîíöåíòðàöèÿàòîìàðíîãî âîäîðîäà.3. ×èñëåííûé ìåòîä ðåøåíèÿ
Äëÿ ðåøåíèÿ ãèäðîäèíàìè÷åñêèõ óðàâíåíèé áûë èñïîëüçîâàí îðèãèíàëüíûé ÷èñëåí-
íûé ìåòîä,îñíîâàííûé íà êîìáèíàöèè operatorsplitting ïîäõîäà, ìåòîäà îäóíîâà, ñõåìû
îåèêóñî÷íî-ïàðàáîëè÷åñêîéðåêîíñòðóêöèèíàëîêàëüíîìøàáëîíå. Òàêîéìåòîäîáúåäè-
íÿåò âñå äîñòîèíñòâà ïåðå÷èñëåííûõ ìåòîäîâ è îáëàäàåò âûñîêîé ñòåïåíüþ ïàðàëëåëèçà-
öèè,íà÷åìîñòàíîâèìñÿïîäðîáíååâñëåäóþùåìðàçäåëå.ÄëÿðåøåíèÿóðàâíåíèÿÏóàññîíà
èñïîëüçóåòñÿ ìåòîä,îñíîâàííûéíàáûñòðîì ïðåîáðàçîâàíèèÔóðüå.
3.1. Ýéëåðîâ ýòàï ðåøåíèÿ ãèäðîäèíàìè÷åñêèõ óðàâíåíèé
Íà ýéëåðîâîì ýòàïå ÷èñëåííîãî ìåòîäà ðåøàþòñÿ óðàâíåíèÿ áåç ó÷åòà àäâåêòèâíûõ
÷ëåíîâèóíêöèé,ìîäåëèðóþùèõïîäñåòî÷íûå ïðîöåññû.Äëÿàïïðîêñèìàöèèïðîñòðàí-
ñòâåííûõïðîèçâîäíûõèñïîëüçóåòñÿðåøåíèåçàäà÷èëèíåàðèçîâàííîéçàäà÷èèìàíà.Äëÿ
ýòîãîíàâñåõèíòåðåéñàõìåæäóÿ÷åéêàìè(Lîáîçíà÷åíèåëåâîéÿ÷åéêè,Rîáîçíà÷åíèå
ïðàâîéÿ÷åéêè)ïðîèñõîäèò îñðåäíåíèåñêîðîñòèè äàâëåíèÿïîîðìóëàì:
ρ = ρ 3 L / 2 + ρ 3 R / 2
√ ρ L + √ ρ R n = n 3 L / 2 + n 3 R / 2
√ n L + √ n R p = p L √ ρ L + p R √ ρ R
√ ρ L + √ ρ R Π = Π L √ n L + Π R √ n R
√ n L + √ n R
Òàêîé ñïîñîá îñðåäíåíèÿ ñâÿçàí ñ íåîáõîäèìîñòüþ áîëåå àêêóðàòíîãî ó÷åòà ãðàíèöû ãàç-
âàêóóì,÷åì êàê ýòîñäåëàíîâ îðèãèíàëüíîéñõåìå îå [12℄.Ïðèýòîì áóäåìïðåäïîëàãàòü,
÷òîâðàññìàòðèâàåìûõÿ÷åéêàõðåøåíèåÿâëÿåòñÿêóñî÷íî-ïàðàáîëè÷åñêîéóíêöèåé.Ïðî-
öåäóðà ïîñòðîåíèÿ ëîêàëüíûõ ïàðàáîë ïîäðîáíî îïèñàíà â êëàññè÷åñêîé ðàáîòå [13℄. Ýòà
ïðîöåäóðàíàìïîíàäîáèòñÿòàêæåâñëåäóþùåìýòàïå ìåòîäà.Îïóñòèâïîäðîáíûåâûêëàä-
êè äëÿ âûâîäà ñõåìû ñ ïîìîùüþ ìåòîäà îäóíîâà (÷èñëåííûé ìåòîä è åãî âåðèèêàöèÿ
ïîäðîáíîîïèñàíûâðàáîòå[14℄)ðåøåíèåçàäà÷èèìàíàäëÿýéëåðîâàýòàïàðåøåíèÿóðàâ-
íåíèé äëÿãàçîâîéêîìïîíåíòû çàïèñûâàþòñÿââèäå:
U = u L ( − λt) + u R (λt)
2 + p L ( − λt) − p R (λt) 2
v u u t
( √ ρ L + √ ρ R ) 2
γ L + γ R
2 (ρ 3 L / 2 + ρ 3 R / 2 )(p L √ ρ L + p R √ ρ R )
P = p L ( − λt) + p R (λt)
2 + u L ( − λt) − u R (λt) 2
v u u t
γ L + γ R
2 (ρ 3 L / 2 + ρ 3 R / 2 )(p L √ ρ L + p R √ ρ R ) ( √ ρ L + √ ρ R ) 2 ,
äëÿ áåññòîëêíîâèòåëüíîéêîìïîíåíòû ñóòü:
V = v L ( − µt) + v R (µt)
2 + Π L ( − µt) − Π R (µt) 2
v u u t
( √ n L + √ n R ) 2
3(n 3 L / 2 + n 3 R / 2 )(Π L √ n L + Π R √ n R )
Π = Π L ( − µt) + Π R (µt)
2 + v L ( − µt) − v R (µt) 2
v u u
t 3(n 3 L / 2 + n 3 R / 2 )(Π L √ n L + Π R √ n R )
( √ n L + √ n R ) 2
ãäå
λ = v u u t
γ L + γ R
2 (p L √ ρ L + p R √ ρ R )
ρ 3 L / 2 + ρ 3 R / 2 µ = v u u t
3(Π L √ n L + Π R √ n R ) n 3 L / 2 + n 3 R / 2 q L ( − νt) = q R i − νt
2h
△ q i − q 6 i
1 − 2νt 3h
q R (νt) = q i L + νt 2h
△ q i + q i 6
1 − 2νt 3h
Ïðîöåäóðàïîñòðîåíèÿïàðàáîëûèïàðàìåòðîâ
q i R
,q i L
,△ q i
,q 6 i
ïîäðîáíîïðèâåäåíàâðàáîòå[13℄, à òàêæå â ðàáîòàõ[15, 16℄, ãäå êóñî÷íî-ïàðàáîëè÷åñêèé ìåòîä íà ëîêàëüíîì øàáëîíå
áûë èñïîëüçîâàí äëÿ ïîëíîé ñèñòåìû óðàâíåíèé ãàçîâîé äèíàìèêè è ìàãíèòíîé ãàçîâîé
äèíàìèêè.
3.2. Ëàãðàíæåâ ýòàï ðåøåíèÿ ãèäðîäèíàìè÷åñêèõ óðàâíåíèé
Íà ëàãðàíæåâîì ýòàïå ïðîèñõîäèò àäâåêòèâíûé ïåðåíîñãèäðîäèíàìè÷åñêèõ ïàðàìåò-
ðîâ èâñå óðàâíåíèÿíà ëàãðàíæåâîì ýòàïå èìåþò âèä:
∂f
∂t + ∇ · (f~v) = 0
ãäå
f
ìîæåò áûòü ïëîòíîñòüþρ, n
, èìïóëüñîìρ~u, n~v
, ïëîòíîñòüþ îáùåé ìåõàíè÷åñêîéρE, nW
èëè âíóòðåííåéρǫ, Π ξξ
ýíåðãèé ãàçà èëè áåññòîëêíîâèòåëüíîé êîìïîíåíòû. Äëÿ ðåøåíèÿóðàâíåíèé èñïîëüçóåòñÿàíàëîãè÷íûé, ÷òî èíà ýéëåðîâîìýòàïå ïîäõîä.Äëÿâû-÷èñëåíèÿïîòîêà
F = f~v
ïðèλ = | ~v |
èñïîëüçóåòñÿîðìóëà:F = v ×
f L ( − λt), v ≥ 0 f R (λt), v < 0
ãäå
f L ( − λt)
èf R (λt)
êóñî÷íî-ïàðàáîëè÷åñêèå óíêöèè äëÿ âåëè÷èíûf
è ñêîðîñòü íàèíòåðåéñå ìåæäóÿ÷åéêàìèâû÷èñëÿåòñÿïîóñðåäíåíèþîå:
v = v L √ ρ L + v R √ ρ R
√ ρ L + √ ρ R
Äëÿ ïîñòðîåíèÿ êóñî÷íî-ïàðàáîëè÷åñêîãî ðåøåíèÿ èñïîëüçóåòñÿ àíàëîãè÷íàÿ ýéëåðîâîìó
ýòàïó ïðîöåäóðà. Íà çàêëþ÷èòåëüíîì ýòàïå ïðîèñõîäèò ó÷åò ïîäñåòî÷íûõ ïðîöåññîâ ñ
ïîìîùüþìåòîäà Ýéëåðà.
3.3. Èñïîëüçîâàíèå ïåðåîïðåäåëííîé ñèñòåìû óðàâíåíèé
Íà çàâåðøàþùåì ýòàïå ðåøåíèÿãèäðîäèíàìè÷åñêèõ óðàâíåíèé ïðîèñõîäèò êîððåêòè-
ðîâêà ðåøåíèÿ.Âñëó÷àå ãðàíèöû ãàç-âàêóóì ñ èñïîëüçîâàíèåìîðìóëû[17℄:
| ~v | = q 2(E − ǫ), (E − ~v 2 /2)/E ≥ 10 − 3
âîñòàëüíîéîáëàñòèïðîèñõîäèòêîððåêòèðîâêà,êîòîðàÿãàðàíòèðóåòíåóáûâàíèåýíòðîïèè
[18℄:
| ρǫ | = ρE − ρ~v 2 2
!
, (E − ~v 2 /2)/E < 10 − 3
Òàêàÿ ìîäèèêàöèÿ îáåñïå÷èâàåò äåòàëüíûé áàëàíñ ýíåðãèé è ãàðàíòèðóåò íåóáûâàíèå
ýíòðîïèè.
3.4. åøåíèå óðàâíåíèÿ Ïóàññîíà
Ïîñëå ðåøåíèÿãèäðîäèíàìè÷åñêèõ óðàâíåíèé íåîáõîäèìî âîññòàíîâèòü ãðàâèòàöèîí-
íûé ïîòåíöèàë ïî ïëîòíîñòÿì ãàçîâîé è áåññòîëêíîâèòåëüíîé êîìïîíåíò. Äëÿ ýòîãî ìû
áóäåìèñïîëüçîâàòü27-òî÷å÷íûéøàáëîíäëÿàïïðîêñèìàöèèóðàâíåíèÿÏóàññîíà.Ýòîñâÿ-
çàíîñîáåñïå÷åíèåììàêñèìàëüíîéèíâàðèàíòíîñòèðåøåíèÿîòíîñèòåëüíî ïîâîðîòà.Àëãî-
ðèòì ðåøåíèÿóðàâíåíèÿÏóàññîíàáóäåò ñîñòîÿòü èç íåñêîëüêèõýòàïîâ.
Ýòàï 1. Ïîñòàíîâêà ãðàíè÷íûõ óñëîâèé äëÿ óðàâíåíèÿ Ïóàññîíà. Äëÿïîñòà-
íîâêè ãðàíè÷íûõ óñëîâèé äëÿ ãðàâèòàöèîííîãî ïîòåíöèàëà íà ãðàíèöå îáëàñòè
D
áóäåìèñïîëüçîâàòü ïåðâûå ÷ëåíû ìóëüòèïîëüíîãî ðàçëîæåíèÿ ñòàòè÷åñêèé, îñåâîé è öåíòðî-
áåæíûéìîìåíòûèíåðöèè.Êîãäàçíà÷åíèÿ ïîòåíöèàëàíàãðàíèöåîáëàñòèîïðåäåëåíû,òî
îíèïîäñòàâëÿþòñÿâ27-òî÷å÷íûéøàáëîíäëÿîïðåäåëåíèÿèòîãîâîéïëîòíîñòè
ρ i,k,l + n i,k,l
íà ãðàíèöå
D
.Ýòàï 2. Ïðåîáðàçîâàíèå ïëîòíîñòè â ïðîñòðàíñòâå ãàðìîíèê. Èòîãîâàÿ ïëîò-
íîñòüïðåäñòàâëÿåòñÿââèäåñóïåðïîçèöèèïîñîáñòâåííûìóíêöèÿìîïåðàòîðàËàïëàññà:
ρ i,k,l + n i,k,l = X
jmn
σ jmn exp ~iπij
I + ~iπkm
K + ~iπln L
!
ãäå
I, K, L
÷èñëîÿ÷ååêïîêàæäîéêîîðäèíàòå,~i
ìíèìàÿåäèíèöà.Äëÿýòîãîèñïîëüçóåòñÿ áûñòðîåïðåîáðàçîâàíèå Ôóðüå.Ýòàï 3. åøåíèå óðàâíåíèÿ Ïóàññîíà â ïðîñòðàíñòâå ãàðìîíèê. Ìû ïðåäïî-
ëàãàåì,÷òî ïîòåíöèàë òàêæåïðåäñòàâëåíââèäå ñóïåðïîçèöèèïî ñîáñòâåííûì óíêöèÿì
îïåðàòîðàËàïëàññàèìîæåòáûòüâû÷èñëåíïîñëåäóþùåéîðìóëåèçàìïëèòóäãàðìîíèê
ïëîòíîñòè:
φ jmn =
2
3 πh 2 σ jmn 1 −
1 − 2sin 2
πj I
3
1 − 2sin 2
πm K
3 1 − 2sin 2
πn L
3
Ïîñëå ÷åãî íåîáõîäèìî ïðîäåëàòü îáðàòíîå áûñòðîå ïðåîáðàçîâàíèå Ôóðüå ãàðìîíèê ïî-
òåíöèàëà âóíêöèîíàëüíîå ïðîñòðàíñòâî ãàðìîíèê.
4. Îïèñàíèå ïàðàëëåëüíîé ðåàëèçàöèè
Ñî-äèçàéí [5℄ èçèêî-ìàòåìàòè÷åñêîé ìîäåëè, ÷èñëåííîãî ìåòîäà è ñòðóêòóð äàííûõ
ïîçâîëÿåò èñïîëüçîâàòü ãåîìåòðè÷åñêóþ äåêîìïîçèöèþ ðàñ÷åòíîé îáëàñòè ñ îäíèì ñëîåì
ïåðåêðûòèÿ ïîäîáëàñòåé. Òàêóþ âîçìîæíîñòü ìû èìååì çà ñ÷¼ò ïîñòðîåíèÿ ïàðàáîë íà
ïðåäûäóùåì øàãå, ÷òî òðåáóåò òîëüêî ëîêàëüíîãî âçàèìîäåéñòâèÿ ìåæäó ÿ÷åéêàìè. Íà
ðèñóíêå 1 ïðèâåäåíû ïðîöåíòíûå ñîîòíîøåíèÿ ìåæäó ýòàïàìè Äëÿ ðåøåíèÿ óðàâíåíèÿ
Ïóàññîíà, â îñíîâå êîòîðîãî áûñòðîå ïðåîáðàçîâàíèå Ôóðüå äëÿ ñóïåðÝÂÌ ñ ðàñïðåäå-
ëåííîéïàìÿòüþ, áûëàèñïîëüçîâàíà áèáëèîòåêàFFTW [19℄. Âîñíîâåýòîéáèáëèîòåêè ëå-
æèòïðîöåäóðàALLTOALL,êîòîðàÿòðàíñïîíèðóåòòðåõìåðíûéìàññèâ,ïåðåðàñïðåäåëÿÿ
çíà÷èòåëüíûå îáúåìû ïàìÿòè ìåæäó âñåìè ïðîöåññàìè. Áåçóñëîâíî, ýòî äîðîãàÿ ñåòåâàÿ
îïåðàöèÿ, êîòîðàÿ òðåáóåò îòêàçà îò âñåãî àëãîðèòìà â ñëó÷àå èñïîëüçîâàíèè ñêîëü ëèáî
çíà÷èòåëüíîãî êîëè÷åñòâà âû÷èñëèòåëåé. Îäíàêî, ýòà ïðîöåäóðà â ñëó÷àå èñïîëüçîâàíèÿ
ñåòåâîé èíðàñòðóêòóðûInniBand íå çàíèìàåò êðèòè÷åñêîå âðåìÿ è,ïî âñåéâèäèìîñòè,
îïòèìèçèðîâàíà íàíèçêîì ñåòåâîìóðîâíå [20℄.
Îñíîâíûìèýòàïàìèâû÷èñëèòåëüíîéñõåìûÿâëÿþòñÿýéëåðîâèëàãðàíæåâýòàïû. Ìû
ñîñðåäîòî÷èìñÿèìåííî íàýòèõýòàïàõ, êàêíàíàèáîëåå çàòðàòíûõ.Îòäåëüíîîñòàíîâèìñÿ
íà ïðîöåäóðå âû÷èñëåíèÿ øàãà ïî âðåìåíè, èñõîäÿ èç óñëîâèÿ Êóðàíòà. Âñëó÷àå èñïîëü-
çîâàíèÿ ãðàè÷åñêèõ óñêîðèòåëåéäàííàÿ ïðîöåäóðà áûëà ðåàëèçîâàíà òîëüêî íà CPU[7℄
(òàêæåáûëîñäåëàíîèâêîäåGAMER[21℄).Ïðè÷èíàýòîãîîòñóòñòâèåýåêòèâíîéíèç-
êîóðîâíåâîéðåàëèçàöèèðåäóöèðóþùåéîïåðàöèåé
min
âòåõíîëîãèèCUDA.Âòîâðåìÿêàê(4%)
(6%)
(10%) (80%)
(a)
èñ. 1.Ïðîöåíòíîåñîîòíîøåíèåìåæäóýòàïàìè ÷èñëåííîãîìåòîäà
âOpenMPòàêàÿîïåðàöèÿýåêòèâíîðåàëèçîâàíà.Ñòîèìîñòüýòîéïðîöåäóðûñîñòàâëÿåò
ïîðÿäêà îäíîãî ïðîöåíòà îò îáùåãî âðåìåíè âû÷èñëåíèé è ïðàêòè÷åñêè íå âëèÿåò íà ý-
åêòèâíîñòü ïàðàëëåëüíîé ðåàëèçàöèè.Îäíàêî,ïðè óâåëè÷åíèè êîëè÷åñòâà ãðàè÷åñêèõ
ÿäåð äî íåñêîëüêèõ òûñÿ÷ è ñòîêðàòíîãî óñêîðåíèÿ â ðàìêàõ îäíîãî ãðàè÷åñêîãî ïðî-
öåññîðà ñóììàðíîâñåõ îñòàëüíûõïðîöåäóð, ìîæåò âîçíèêíóòüêóðü¼çíàÿ ñèòóàöèÿ,êîãäà
ïðîöåäóðà âû÷èñëåíèÿ øàãà ïî âðåìåíè áóäåò âûïîëíÿòüñÿ äîëüøå âñåõ îñòàëüíûõ. Ïðè
òîì, ÷òî àâòîðàìè óæå áûëî äîñòèãíóòî 55-êðàòíîåóñêîðåíèå â ðàìêàõ îäíîãîGPU [7℄ è
êîëè÷åñòâîãðàè÷åñêèõÿäåðâîäíîìóñêîðèòåëåóâåëè÷èâàåòñÿ,òîòàêàÿñèòóàöèÿìîæåò
áûòü äîñòèãíóòà â áëèæàéøèå ïàðó ëåò. Ñòîèò îòìåòèòü, ÷òî òàêàÿ ïðîáëåìà â ïðèíöèïå
íåâîçìîæíàíàóñêîðèòåëÿõ Intel XeonPhi.
Èñïîëüçîâàíèå ðàâíîìåðíîé ñåòêèâ äåêàðòîâûõ êîîðäèíàòàõäëÿ ðåøåíèÿóðàâíåíèé
ãèäðîäèíàìèêè ïîçâîëÿåò èñïîëüçîâàòüïðîèçâîëüíóþäåêàðòîâó òîïîëîãèþ äëÿäåêîìïî-
çèöèè ðàñ÷åòíîé îáëàñòè. Òàêàÿ îðãàíèçàöèÿ âû÷èñëåíèé èìååò ïîòåíöèàëüíî áåñêîíå÷-
íóþ ìàñøòàáèðóåìîñòü. Â íàøåé ïðîãðàììíîé ðåàëèçàöèè èñïîëüçóåòñÿ ìíîãîóðîâíåâàÿ
îäíîìåðíàÿ äåêîìïîçèöèÿ ðàñ÷åòíîé îáëàñòè. Ïî îäíîé êîîðäèíàòå âíåøíåå îäíîìåðíîå
ðàçðåçàíèå ïðîèñõîäèò ñðåäñòâàìè òåõíîëîãèè MPI, âíóòðè êàæäîé ïîäîáëàñòè ðàçðåçà-
íèåïðîèñõîäèòñðåäñòâàìèOpenMP,àäàïòèðîâàííîãîäëÿMIC-àðõèòåêòóð(ðèñ.2).Òàêîé
ïîäõîä èñïîëüçîâàëñÿ òàêæå è â ïåðâîéâåðñèè ïðîãðàììíîãî êîäà AstroPhi [22℄ ó÷åòîì
èñïîëüçîâàíèÿ ooad ðåæèìà. Òàêàÿ äåêîìïîçèöèÿ ñâÿçàíà ñ òîïîëîãèåé è àðõèòåêòóðîé
#pragma omp parallel for ...
{ ...
}
MPI
èñ.2.Äåêîìïîçèöèÿðàñ÷åòíîéîáëàñòè.Ñåðûìöâåòîìîáîçíà÷åíûñëîèïåðåêðûòèÿïîäîáëàñòåé
ãèáðèäíîãî ñóïåðÝÂÌ RSC PetaStream, êîòîðûé áûë èñïîëüçîâàí äëÿ âû÷èñëèòåëüíûõ
ýêñïåðèìåíòîâ. Äàëåå ïðèâåäåì èññëåäîâàíèÿ ïðîèçâîäèòåëüíîñòè ïðîãðàììíîé ðåàëèçà-
öèè.
4.1. Èññëåäîâàíèå óñêîðåíèÿ êîäà â ðàìêàõ îäíîãî MIC
Ïðîâîäèëîñüèññëåäîâàíèåóñêîðåíèÿïðîãðàììíîéðåàëèçàöèèíàñåòêå
512 3
.Äëÿýòîãî çàìåðÿëîñü âðåìÿ êàæäîãî ýòàïà (Euler, Lagrange)÷èñëåííîãî ìåòîäà, âñåêóíäàõ, à çàòåìâû÷èñëÿëàñü èõñóììàïðè ðàçëè÷íîì ÷èñëåèñïîëüçóåìûõ ëîãè÷åñêèõÿäåð. Óñêîðåíèå
P
âû÷èñëÿëîñüïî îðìóëå 1:
P = T otal 1
T otal K
(1)ãäå
T otal 1
âðåìÿâû÷èñëåíèéíàîäíîìëîãè÷åñêîìÿäðå,T otal K
âðåìÿâû÷èñëåíèéïðèèñïîëüçîâàíèè
K
ëîãè÷åñêèõÿäåð.Òàêæåáûëàñäåëàíàîöåíêàðåàëüíîéïðîèçâîäèòåëüíîñòè(
S real
).Äëÿîöåíêèïèêîâîéïðîèçâîäèòåëüíîñòè
S peak
óñêîðèòåëÿ IntelXeonPhiáûëà èñïîëüçîâàíà îðìóëà 2:S peak = U nits × F requency × Cores = 74 , 28 GF LOP S
(2)ãäå
U nits = 1 F LOP
÷èñëî ñêàëÿðíûõ óñòðîéñòâ,F requency = 1 , 238 GHz
òàêòîâàÿ÷àñòîòà,
Cores = 60
÷èñëîèçè÷åñêèõ ÿäåð.Äëÿíàõîæäåíèÿ ýåêòèâíîñòèèñïîëüçî- âàíèÿ óñêîðèòåëÿS ratio
áóäåìèñïîëüçîâàòüîðìóëó 3S ratio = S real
S peak
(3)Ýòàõàðàêòåðèñòèêà ïðîèçâîäèòåëüíîñòèñ îäíîéñòîðîíû íåñêîëüêîñòðàííàÿ, òàêêàêïè-
êîâûéóðîâåíüïðîèçâîäèòåëüíîñòèIntelXeonPhiýòîâåëè÷èíàïîðÿäêàîäíîãîòåðàëîïñà.
Îäíàêî,òàêàÿïðîèçâîäèòåëüíîñòüâîñíîâíîìäîñòèãàåòñÿâåêòîðíûìðàñøèðåíèåì,êîòî-
ðûå ìûïîêà íå èñïîëüçóåì. åçóëüòàòû èññëåäîâàíèÿóñêîðåíèÿ íà ñåòêå
512 3
ïðèâåäåíû íà ðèñóíêå 3. Òàêèì îáðàçîì, áûëî ïîëó÷åíî 134-êðàòíîå óñêîðåíèå (ìàñøòàáèðóåìîñòüâñèëüíîì ñìûñëå)â ðàìêàõ îäíîãî óñêîðèòåëÿ Intel XeonPhi. Êîëè÷åñòâîãèãàëîïñîâ ñî-
îòíîñÿòñÿ ñ ðåçóëüòàòàìè, ïîëó÷åííûìè â êíèãå [23℄. À ðåçóëüòàò ïîðÿäêà ÷óòü ìåíåå 29
GFLOPS ýòîïðèìåðíî 40%îò ïèêîâîé,÷òî äîñòàòî÷íîâûñîêèé ðåçóëüòàò.
4.2. Èññëåäîâàíèå ìàñøòàáèðóåìîñòè êîäà
ÏðîâîäèëîñüèññëåäîâàíèåìàñøòàáèðóåìîñòèêîäàAstroPhiíàðàñ÷åòíîéñåòêå
512 p × 512 × 512
ïðè èñïîëüçîâàíèè 4 ëîãè÷åñêèõÿäåð íà êàæäûé óñêîðèòåëü, ãäåp
÷èñëî èñ-ïîëüçóåìûõóñêîðèòåëåé.Òàêèìîáðàçîì,íàêàæäûéóñêîðèòåëüïðèõîäèòñÿðàçìåðïîäîá-
ëàñòè
512 3
. Äëÿèññëåäîâàíèÿ ìàñøòàáèðóåìîñòè çàìåðÿëîñü âðåìÿ êàæäîãî ýòàïà (Euler, Lagrange) ÷èñëåííîãî ìåòîäà, â ñåêóíäàõ, à çàòåì âû÷èñëÿëàñü èõ ñóììà ïðè ðàçëè÷íîì1 2 4 8 16 32 64 128 256 0
20 40 60 80 100 120 140
Speed-Up
The logical cores
(a)
1 2 4 8 16 32 64 0,0
0,2 0,4 0,6 0,8 1,0 1,2
Scalability
MICs
èñ. 3.Óñêîðåíèå(a)è ìàñøòàáèðóåìîñòü(b)ïðîãðàììíîé ðåàëèçàöèè
-2 -1 0 1 2 -2
-1 0 1 2
x 10 kpc
x10kpc
0
200
400
600
800
1000
(a)
-2 -1 0 1 2
-2 -1 0 1 2
x 10 kpc
x10kpc
0
100
200
300
400
500
(b)
-2 -1 0 1 2
-2 -1 0 1 2
x 10 kpc
x10kpc
0
2
4
6
8
10
()
-2 -1 0 1 2
-2 -1 0 1 2
x 10 kpc
x10kpc
0
2
4
6
8
10
(d)
èñ. 4. åçóëüòàòû ìîäåëèðîâàíèÿíà ìîìåíò âðåìåíè
t = 120
ìèëëèîíîâ ëåò. Ñòîëáöåâàÿ ïëîò- íîñòü âM ⊙ pc − 2
ãàçîâîé (a) è áåññòîëêíîâèòåëüíîé (b) êîìïîíåíò, ìîëåêóëÿðíîãî âîäîðîäà (). ÑðåäíÿÿñêîðîñòüïðîöåññàçâåçäîîáðàçîâàíèÿâM ⊙ pc − 2 myr − 1
(d)÷èñëåèñïîëüçóåìûõóñêîðèòåëåéIntelXeonPhi.Ìàñøòàáèðóåìîñòü
T
âû÷èñëÿëîñüïîîð- ìóëåT = T otal 1
T otal p
(4)ãäå
T otal 1
âðåìÿ âû÷èñëåíèé íà îäíîì óñêîðèòåëå ïðè èñïîëüçîâàíèè îäíîãî óñêîðèòå- ëÿ,T otal p
âðåìÿ âû÷èñëåíèé íà îäíîì óñêîðèòåëåé ïðè èñïîëüçîâàíèèp
óñêîðèòåëåé.åçóëüòàòûèññëåäîâàíèéìàñøòàáèðóåìîñòèïðèâåäåíûíàðèñóíêå3.Òàêèìîáðàçîì,ïðè
èñïîëüçîâàíèè âñåõ 64 óñêîðèòåëåéáûëà ïîëó÷åíà 92 %ýåêòèâíîñòü, ÷òî ÿâëÿåòñÿ äî-
ñòàòî÷íîâûñîêèì ðåçóëüòàòîì.
5. åçóëüòàòû âû÷èñëèòåëüíûõ ýêñïåðèìåíòîâ
Áóäåì ìîäåëèðîâàòü ñòîëêíîâåíèå äâóõãàëàêòèêñ ìàññîé
M = 10 13 M ⊙
,äâèãàþùèõñÿ íàâñòðå÷ó äðóã äðóãó ñî ñêîðîñòüþv cr = 800
êì/ñåê, êàæäàÿ èç êîòîðûõ çàäàíà äâóìÿ ñàìîãðàâèòèðóþùèìè ñåðè÷åñêèìè îáëàêàìè äëÿ îïèñàíèÿãàçîâîé è áåññòîëêíîâèòåëü-íîé êîìïîíåíòû ñ ðàâíîâåñíûì íà÷àëüíûì ðàñïðåäåëåíèåì ïëîòíîñòè, äàâëåíèÿ/òåíçîðà
äèñïåðñèèñêîðîñòåé.àëàêòèêèâðàùàþòñÿâïðîòèâîïîëîæåííûåñòîðîíûñäèåðåíöè-
àëüíûì âðàùåíèåì:
v φ = s
r ∂ Φ
∂r
Íà÷àëüíîåðàññòîÿíèåìåæäó öåíòðàìè
30 kpc
.Íàðèñóíêå4ïðåäñòàâëåíûðåçóëüòàòûðàñ- ïðåäåëåíèÿñòîëáöåâîéïëîòíîñòèãàçîâîéèçâåçäíîéêîìïîíåíòû.Ïîñëåïðîöåññàñòîëêíî-âåíèÿ çà ðîíòîìóäàðíîé âîëíû ïðîèñõîäèò àêòèâíûé ðîñò ñêîðîñòèçâåçäîîáðàçîâàíèÿ,
à òàêæåíà ìåñòåáóäóùåéãàëàêòèêèîáðàçóåòñÿìîëåêóëÿðíûé âîäîðîä.
6. Çàêëþ÷åíèå
Ïðîâåäåíû âû÷èñëèòåëüíûå ýêñïåðèìåíòû ïî èññëåäîâàíèþïðîèçâîäèòåëüíîñòè ïðî-
ãðàììíîé ðåàëèçàöèè íà êëàñòåðå RSC PetaStream. Áûëî ïîëó÷åíî 134-êðàòíîå óñêîðå-
íèå (ìàñøòàáèðóåìîñòü â ñèëüíîì ñìûñëå) â ðàìêàõ îäíîãî óñêîðèòåëÿ Intel Xeon Phi,
92-ïðîöåíòíàÿ ýåêòèâíîñòü (ìàñøòàáèðóåìîñòü â ñëàáîì ñìûñëå) ïðè èñïîëüçîâàíèè
64 óñêîðèòåëåé Intel Xeon Phi, è 40-ïðîöåíòíàÿ ïðîèçâîäèòåëüíîñòü îò ïèêîâîé ïðè èñ-
ïîëüçîâàíèè ñêàëÿðíûõ âû÷èñëåíèéíà óñêîðèòåëåIntel Xeon Phi. Ïðèâåäåíû ðåçóëüòàòû
ñóïåðêîìïüþòåðíûõâû÷èñëèòåëüíûõýêñïåðèìåíòîâïî ñòîëêíîâåíèþäèñêîâûõãàëàêòèê.
Ñìîäåëèðîâàíû îáëàñòè ñ àêòèâíûì çâåçäîîáðàçîâàíèåì è îáðàçîâàíèåì ìîëåêóëÿðíîãî
âîäîðîäà,êîòîðûå ïîëó÷àþòñÿâðåçóëüòàòåñòîëêíîâåíèÿãàëàêòèê.
Àâòîðû âûðàæàþò áëàãîäàðíîñòü êîëëåãàìèç îðãàíèçàöèé Intel (ÍèêîëàþÌåñòåðó è
ÄìèòðèþÏåòóíèíó)èRSCGroup(ÀëåêñàíäðóÌîñêîâñêîìó,ÀëåêñåþØìåëåâó,Âëàäèìè-
ðóÌèðîíîâó,ÏàâëóËàâðåíêîèÁîðèñóàãàðèíîâó),çàïðåäîñòàâëåíèåäîñòóïàêêëàñòåðó
RSCPetaStream èïîäðîáíûå êîíñóëüòàöèè ïîåãîèñïîëüçîâàíèþ.
Ëèòåðàòóðà
1. TutukovA., Lazareva G.,KulikovI. GasDynamis ofa Central Collisionof Two Galaxies:
Merger,Disruption, Passage, andtheFormation ofaNew Galaxy//AstronomyReports.
2011. V.55, I.9. P.770-783.
2. CombesF.,Melhior A.Chemodynamialevolutionofinterating galaxies// Astrophysis
and SpaeSiene. 2002. V.281, I.1-2. P.383-387.
3. ShweizerF. Merger-InduedStarbursts // Astrophysisand SpaeSieneLibrary.
2005. V.329. P.143-152.
4. SolAlonso M., LambasD., TisseraP.,Coldwell G. Ative galati nuleiand galaxy
interations// MonthlyNoties oftheRoyalAstronomialSoiety. 2007. V.375,I. 3.
P.1017-1024.
5. Glinskiy B.,KulikovI., SnytnikovA., Romanenko A., ChernykhI., VshivkovV.Co-design
of Parallel NumerialMethodsfor Plasma Physis andAstrophysis // Superomputing
frontiers andinnovations. 2014. V.1, I.3. P.88-98.
6. Mithell N.,VorobyovE., HenslerG. CollisionlessStellar HydrodynamisasanEient
Alternative to N-bodyMethods // Monthly Notiesof theRoyalAstronomialSoiety.
2013. V.428. P.2674-2687.
7. KulikovI. GPUPEGAS:ANew GPU-aelerated Hydrodynami Code for Numerial
Simulations ofInteratingGalaxies // TheAstrophysial JournalSupplementsSeries.
2014. V.214, I. 12.P.1-12.
8. BerginE., HartmannL., RaymondJ., Ballesteros-Paredes J.Moleular CloudFormation
behindShokWaves//The Astrophysial Journal. 2004. V.612. P.921-939.
9. Katz N., WeinbergD., HernquistL.Cosmologial simulationswithTreeSPH// The
Astrophysial JournalSupplement Series. 1996. V.105. P.19-35.
10. Springel V.,HernquistL.Cosmologial smoothedpartile hydrodynamissimulations:a
hybridmultiphase modelfor starformation // Monthly Noties oftheRoyalAstronomial
Soiety. 2003. V.339,I. 2. P.289-311.
11. SutherlandR., DopitaM. Cooling funtionsfor low-densityastrophysial plasmas // The
Astrophysial JournalSupplement Series. 1993. V.88.P.253-327.
12. RoeP.Approximate Riemannsolvers,parameter vetors, anddierene solvers // Journal
of ComputationalPhysis.1997. V.135. P.250-258.
13. Collela P., WoodwardP.The PieewiseParaboli Method(PPM)Gas-Dynamial
simulations // JournalofComputationalPhysis. 1984. V.54. P.174-201.
14. KulikovI.,VorobyovE.Using thePPMLapproah foronstruting alow-dissipation,
operator-splittingsheme for numerialsimulations ofhydrodynamiows// New
Astronomy. 2015.(in progress)
15. PopovM., UstyugovS.Pieewiseparaboli methodon loalstenil for gasdynami
simulations // ComputationalMathematisand MathematialPhysis.2007. V. 47,I.
12. P.1970-1989.
16. PopovM., UstyugovS.Pieewiseparaboli methodon aloalstenil for ideal
magnetohydrodynamis//ComputationalMathematisandMathematial Physis. 2008.
V.48,I. 3. P.477-499.
17. VshivkovV.,Lazareva G.,SnytnikovA.,KulikovI.,TutukovA.Computationalmethods
for ill-posedproblems ofgravitational gasodynamis// JournalofInverse andIll-posed
Problems. 2011. V.19,I. 1. P.151-166
18. GodunovS., Kulikov I.Computation ofDisontinuousSolutions ofFluidDynamis
Equations withEntropyNonderease Guarantee// ComputationalMathematisand
Mathematial Physis. 2014. V.54, I.6. P.1012-1024.
19. FrigoM.,Johnson S.TheDesign andImplementation ofFFTW3 //Proeedings ofthe
IEEE. 2005. V.93,I. 2. P.216-231.
20. Kalinkin A.,Laevsky Y.,GololobovS.2DFastPoissonSolverfor High-Performane
Computing// LetureNotes inComputerSiene. 2009. V.5698. P.112-120.
21. ShiveH., TsaiY., ChiuehT.GAMER: a GPU-aeleratedAdaptive-Mesh-Renement
Codefor Astrophysis // TheAstrophysial Journal. 2010. V.186. P.457-484.
22. KulikovI.M., ChernykhI.G., SnytnikovA.V.,Glinskiy B.M.,TutukovA.V.AstroPhi:A
ode for omplexsimulation ofdynamisof astrophysialobjetsusing hybrid
superomputers //Computer PhysisCommuniations. 2015. V.186.P.71-80.
23. Jeers J., ReindersJ.IntelXeonPhiCoproessorHigh Performane Programming.
Elsevier. 2013.409 p.