• Aucun résultat trouvé

Стреловидная декомпозиция для блочно-трехдиагональной СЛАУ (Arrowhead decomposition for a block-tridiagonal system of linear equations)

N/A
N/A
Protected

Academic year: 2022

Partager "Стреловидная декомпозиция для блочно-трехдиагональной СЛАУ (Arrowhead decomposition for a block-tridiagonal system of linear equations)"

Copied!
7
0
0

Texte intégral

(1)

Ñòðåëîâèäíàÿ äåêîìïîçèöèÿ äëÿ áëî÷íî-òðåõäèàãîíàëüíîé ÑËÀÓ

Ï.À. Áåëîâ, Å.Ð. Íóãóìàíîâ, Ñ.Ë. ßêîâëåâ Ñàíêò-Ïåòåðáóðãñêèé Ãîñóäàðñòâåííûé Óíèâåðñèòåò

Ïðåäñòàâëåí ìåòîä ñòðåëîâèäíîé äåêîìïîçèöèè îáåñïå÷èâàþùèé ýôôåêòèâíîå ïàðàëëåëüíîå ðåøåíèå áëî÷íî-òðåõäèàãîíàëüíîé ñèñòåìû ëèíåéíûõ àëãåáðàè-

÷åñêèõ óðàâíåíèé (ÑËÀÓ). Âû÷èñëèòåëüíîå óñêîðåíèå ïî îòíîøåíèþ ê ìåòîäó ìàòðè÷íîé ïðîãîíêè îöåíåíî àíàëèòè÷åñêè ïóòåì ó÷åòà ÷èñëà ýëåìåíòàðíûõ îïå- ðàöèé óìíîæåíèÿ äëÿ ïàðàëëåëüíûõ è ïîñëåäîâàòåëüíûõ ÷àñòåé ìåòîäà äåêîì- ïîçèöèè. Ïîêàçàíî, ÷òî ìàêñèìàëüíîå óñêîðåíèå äîñòèãàåòñÿ ïðè êîíå÷íîì ÷èñ- ëå ïàðàëëåëüíûõ ïðîöåññîðîâ. Äëÿ çàäàííîãî ðàçìåðà èñõîäíîé ÑËÀÓ ïîëó÷å- íû ïàðàìåòðû âû÷èñëèòåëüíîé ñèñòåìû, ïðè êîòîðûõ äîñòèãàåòñÿ ìàêñèìàëüíîå óñêîðåíèå. Âû÷èñëèòåëüíûå ýêñïåðèìåíòû ïîäòâåðæäàþò àíàëèòè÷åñêèå îöåíêè âû÷èñëèòåëüíîãî óñêîðåíèÿ.

1. Ââåäåíèå

Ìíîãèå çàäà÷è ïðèêëàäíîé ìàòåìàòèêè è âû÷èñëèòåëüíîé ôèçèêè ïîñëå äèñêðåòèçà- öèè ñâîäÿòñÿ ê ñèñòåìå ëèíåéíûõ àëãåáðàè÷åñêèõ óðàâíåíèé (ÑËÀÓ) ñ áëî÷íî-òðåõäèàãî- íàëüíîé ìàòðèöåé.  êà÷åñòâå ïðîñòåéøåãî ïðèìåðà ìîæíî ïðèâåñòè êîíå÷íî-ðàçíîñòíóþ àïïðîêñèìàöèþ ãðàíè÷íîé çàäà÷è äëÿ äâóìåðíîãî óðàâíåíèÿ Ïóàññîíà íà ðàâíîìåðíûõ ñåòêàõ ïî êàæäîé êîîðäèíàòå.  ýòîì ñëó÷àå áëîêè ìàòðèöû áóäóò ðàçðåæåííûìè.  äåé- ñòâèòåëüíîñòè, ìàòðèöà ÑËÀÓ áóäåò ïÿòèäèàãîíàëüíîé, íî ñ áëî÷íîé ñòðóêòóðîé. Ëåí- òî÷íûå èëè çàïîëíåííûå áëîêè â òàêîé çàäà÷å ìîæíî ïîëó÷èòü, åñëè ñîâìåñòèòü êîíå÷íî- ðàçíîñòíóþ äèñêðåòèçàöèþ ïî îäíîé êîîðäèíàòå ñ ðàçëîæåíèåì ïî ñïëàéíàì èëè èíûì ôóíêöèÿì ïî äðóãîé êîîðäèíàòå.

Ñðàâíèòåëüíî áûñòðûì àëãîðèòìîì äëÿ ðåøåíèÿ áëî÷íî-òðåõäèàãîíàëüíûõ ñèñòåì ÿâ- ëÿåòñÿ ìåòîä ìàòðè÷íîé ïðîãîíêè [1]. Äàííûé ìåòîä îáëàäàåò ñâîéñòâîì óñòîé÷èâîñòè äëÿ áëîêîâ ñ äèàãîíàëüíûì ïðåîáëàäàíèåì. Íåäîñòàòêîì ìåòîäà ÿâëÿåòñÿ åãî ïîñëåäîâàòåëü- íîñòü è òðóäíîñòü ïàðàëëåëèçàöèè íà ñîâðåìåííûõ ñóïåðêîìïüþòåðíûõ è êëàñòåðíûõ ñè- ñòåìàõ. Åäèíñòâåííîé âîçìîæíîñòüþ ïàðàëëåëèçàöèè ÿâëÿåòñÿ ïàðàëëåëèçàöèÿ íà óðîâíå ìàòðè÷íûõ îïåðàöèé óìíîæåíèÿ è îáðàùåíèÿ áëîêîâ.

Ïàðàëëåëüíûå ìåòîäû ðåøåíèÿ òðåõäèàãîíàëüíûõ ÑËÀÓ ðàçðàáàòûâàëèñü ñ êîíöà øå- ñòèäåñÿòûõ ãîäîâ äâàäöàòîãî âåêà. Ê äàííûì ìåòîäàì ìîæíî îòíåñòè øèðîêî èçâåñòíûé ìåòîä öèêëè÷åñêîé ðåäóêöèè [2], à òàêæå ìåòîä Âîíãà [3], êîòîðûé óñïåøíî ïðèìåíÿåòñÿ â ñîâðåìåííûõ ðàñ÷åòàõ [4]. Äàííûå ìåòîäû áûëè îáîáùåíû äëÿ èñïîëüçîâàíèÿ ñ ëåíòî÷- íûìè è áëî÷íî-òðåõäèàãîíàëüíûìè ìàòðèöàìè [5]. Ïîìèìî ïåðå÷èñëåííûõ ìåòîäîâ, äëÿ ðåøåíèÿ òðåõäèàãîíàëüíûõ ÑËÀÓ áûë ðàçðàáîòàí áîëåå ýôôåêòèâíûé àëãîðèòì, îñíîâàí- íûé íà èíæåíåðíîì ìåòîäå äåêîìïîçèöèè îáëàñòè. Ïðèìåíèòåëüíî ê ÑËÀÓ, îí çàêëþ÷àåòñÿ â ðàçäåëåíèè ïîëíîé ìàòðèöû íà íåçàâèñèìûå áëîêè è îäèí ñâÿçóþùèé áëîê [6]. Ñõîäíûå èäåè èñïîëüçóþòñÿ â ñîâðåìåííûõ ðàáîòàõ, íàïðèìåð [7], à äëÿ ëåíòî÷íûõ ìàòðèö ïðè- âîäÿòñÿ â òåõíè÷åñêîì îò÷åòå ScaLAPACK [8].

 äàííîé ðàáîòå ìû îáîáùèëè ìåòîä äåêîìïîçèöèè íà ñëó÷àé áëî÷íî-òðåõäèàãîíàëü- íîé ÑËÀÓ. Ìåòîä ñâîäèòñÿ ê ïðåîáðàçîâàíèþ èñõîäíîé áëî÷íî-òðåõäèàãîíàëüíîé ÑËÀÓ ê ýêâèâàëåíòíîìó âèäó ñî ñòðåëîâèäíîé ôîðìîé íîâîé ìàòðèöû ïóòåì ïåðåñòàíîâêè áëî÷íûõ ñòðîê è ñòîëáöîâ èñõîäíîé ñèñòåìû.  ðåçóëüòàòå, ìàòðèöà ñèñòåìû ëîãè÷åñêè ïðèâîäèòñÿ ê ñòðåëîâèäíîé ôîðìå, âêëþ÷àþùåé êðóïíûå ñóïåð-áëîêè íà äèàãîíàëè, äîïóñêàþùèå ïà-

Ðàáîòà ïîääåðæàíà ãðàíòîì ÐÔÔÈ  14-02-00326 è ãðàíòîì ÑÏáÃÓ â ðàìêàõ ïðîåòà  11.38.241.2015.

Âñå âû÷èñëåíèÿ áûëè ïðîâåäåíû íà îáîðóäîâàíèè ðåñóðñíîãî öåíòðà Âû÷èñëèòåëüíûé öåíòð ÑÏáÃÓ.

447

(2)

ðàëëåëüíîå îáðàùåíèå, ðàçðåæåííûå ñóïåð-áëîêè âäîëü ïðàâîé è íèæíåé ñòîðîí ìàòðèöû è îäíîãî ñâÿçóþùåãî ñóïåð-áëîêà â ïðàâîì íèæíåì óãëó ìàòðèöû. Ìû ïîêàçûâàåì, ÷òî äàííàÿ ñòðåëîâèäíàÿ ñèñòåìà äîïóñêàåò ýôôåêòèâíîå ïàðàëëåëüíîå ðåøåíèå [9]. Äëÿ îáðà- ùåíèÿ ñóïåð-áëîêîâ èñïîëüçóåòñÿ ìåòîä ìàòðè÷íîé ïðîãîíêè. Ìû àíàëèòè÷åñêè ïîëó÷àåì âû÷èñëèòåëüíîå óñêîðåíèå ìåòîäà îòíîñèòåëüíî ìåòîäà ìàòðè÷íîé ïðîãîíêè. Îíî âû÷èñ- ëÿåòñÿ èç îöåíêè ÷èñëà ìóëüòèïëèêàòèâíûõ îïåðàöèé íà êàæäîì øàãå ìåòîäà è ñðàâíåíèè åãî ñ àíàëîãè÷íûìè õàðàêòåðèñòèêàìè ìåòîäà ìàòðè÷íîé ïðîãîíêè. Ðåàëèçîâàííûé ìåòîä ñòðåëîâèäíîé äåêîìïîçèöèè ïîçâîëèë íàì ïðîâåñòè âû÷èñëèòåëüíûå ýêñïåðèìåíòû è ïîä- òâåðäèòü àíàëèòè÷åñêèå îöåíêè íà ïðàêòèêå.  êà÷åñòâå òåñòîâîé ÑËÀÓ èñïîëüçîâàëàñü äèñêðåòèçàöèÿ ãðàíè÷íîé çàäà÷è äëÿ äâóìåðíîãî èíòåãðî-äèôôåðåíöèàëüíîãî óðàâíåíèÿ Ôàääååâà [10].

2. Ìåòîä ñòðåëîâèäíîé äåêîìïîçèöèè

 ðàáîòå èçó÷àåòñÿ áëî÷íî-òðåõäèàãîíàëüíàÿ ÑËÀÓ, îïèñûâàåìàÿ óðàâíåíèåì

AiXi1+CiXi+BiXi+1=Fi, A1 =BN = 0, (1) ãäå Ai,Ci, Bi, i= 1, . . . , N áëîêè ðàçìåðà n×n ìàòðèöû ñèñòåìû. Ýëåìåíòû Fi, ðàçìåðà n×l, ãäål≥1, â ïðàâîé ÷àñòè ýòî áëîêè ñóïåðâåêòîðàF~. Èñêîìûé ñóïåðâåêòîðX~ ñîñòîèò èç áëîêîâ Xi. Ïîëíûé ðàçìåð ìàòðèöû (nN)×(nN).

Íà ðèñóíêå 1 ïîêàçàí ïðèìåð ïðåîáðàçîâàíèÿ èñõîäíîé ìàòðèöû ÑËÀÓ äëÿ N = 15 ê ýêâèâàëåíòíîìó âèäó ñî ñòðåëîâèäíîé ôîðìîé íîâîé ìàòðèöû. Ïðåîáðàçîâàíèå ÑËÀÓ îñóùåñòâëÿåòñÿ ïåðåñòàíîâêîé áëî÷íûõ-ñòðîê è ñòîëáöîâ. Ïåðåñòàíîâêà ñòðîê ïðèâîäèò ê ïåðåñòàíîâêå áëîêîâ â ïðàâîé ÷àñòè ñèñòåìû, à ïåðåñòàíîâêà ñòîëáöîâ ê ïåðåñòàíîâêå áëîêîâ èñêîìîãî ðåøåíèÿ. Ïîëó÷åííàÿ ñòðåëîâèäíàÿ ÑËÀÓ ìîæåò áûòü êðàòêî çàïèñàíà â âèäå áëî÷íîé ìàòðèöû 2×2

 S WR WL H



s h

=

Fs Fh

. (2)

Çäåñü S ñîñòîèò èç íåçàâèñèìûõ ñóïåð-áëîêîâ íà äèàãîíàëè Sk,k = 1, . . . , M. Îñòàëüíûå ýëåìåíòû îáîçíà÷åíû íà ðèñóíêå 1. Ðåøåíèå ñèñòåìû (2) äàåòñÿ ôîðìóëàìè





s = S−1FsS−1WRh h = HWLS1WR1

FhWLS1Fs

. (3)

Óðàâíåíèå (3) â ïðàâîé ÷àñòè ñîäåðæèò ìàòðè÷íûå óìíîæåíèÿ è îáðàùåíèÿ, êîòîðûå, ïðè äåòàëüíîì ðàññìîòðåíèè, îáëàäàþò áîëüøîé ñòåïåíüþ ïàðàëëåëèçìà. Íåçàâèñèìîñòü ñóïåð- áëîêîâ Sk ïîçâîëÿåò âû÷èñëÿòü ïðîèçâåäåíèÿ S1Fs, S1WR ïàðàëëåëüíî. Áîëåå òîãî, âìåñòî âû÷èñëåíèé îáðàòíûõ ìàòðèö, äîñòàòî÷íî ïðîñòî ðåøèòü íàáîð ÑËÀÓ çíà÷èòåëü- íî ìåíüøåãî ðàçìåðà. Ââèäó ðàçðåæåííîé ñòðóêòóðû ñóïåð-áëîêîâ WL, WR êîëè÷åñòâî óìíîæåíèé òàêæå ìíîãîêðàòíî ñîêðàùàåòñÿ.  ñèñòåìå (3) ñíà÷àëà âû÷èñëÿåòñÿ ðåøåíèå h = (h1, . . . , hM1)T, à ïîòîì îñòàëüíàÿ ÷àñòü s ïîëíîãî ðåøåíèÿ X~ = (s, h)T íàõîäèòñÿ ïàðàëëåëüíî ïî ôîðìóëå

sk=zk−Zkhk1−Zkhk, k= 1, . . . , M, (4) ãäå h0 =hM = 0.

 íàøåé ñòàòüå, äëÿ ðåøåíèÿ ñèñòåìû ñ ñóïåð-áëîêîì Sk è äëÿ âû÷èñëåíèÿ ÷àñòè ðå- øåíèÿhìû èñïîëüçóåì ìåòîä ìàòðè÷íîé ïðîãîíêè, õîòÿ äëÿ ýòèõ îïåðàöèé ìîæíî èñïîëü- çîâàòü ìåòîä ñòðåëîâèäíîé äåêîìïîçèöèè âëîæåííî.

448

(3)

Ðèñ. 1. Ïðèìåð ïðåîáðàçîâàíèÿ ÑËÀÓ â ìåòîäå ñòðåëîâèäíîé äåêîìïîçèöèè. Âåðõíèé ðèñóíîê:

èñõîäíàÿ ÑËÀÓ ñ âûäåëåíèåì áëîêîâ, êîòîðûå áóäóò ïåðåñòàâëåíû. Ñðåäíèé ðèñóíîê: ïðåîáðàçî- âàííàÿ ÑËÀÓ ñ ìàòðèöåé ñòðåëîâèäíîé ôîðìû. Íèæíèé ðèñóíîê: îáîçíà÷åíèÿ ýëåìåíòîâ ïðåîáðà- çîâàííîé ÑËÀÓ ñ ìàòðèöåé ñòðåëîâèäíîé ôîðìû, ïðèíÿòûå â äàííîé ñòàòüå.

449

(4)

3. Âû÷èñëèòåëüíîå óñêîðåíèå

Âû÷èñëèòåëüíîå óñêîðåíèå ìåòîäà ñòðåëîâèäíîé äåêîìïîçèöèè ïî îòíîøåíèþ ê ïîñëå- äîâàòåëüíîìó ìåòîäó ìàòðè÷íîé ïðîãîíêè ìîæíî îöåíèòü êàê îòíîøåíèå âðåìåíè âû÷èñëå- íèÿ ðåøåíèÿ ÑËÀÓ ìåòîäîì ïðîãîíêè ê ñîîòâåòñòâóþùåìó âðåìåíè ìåòîäà äåêîìïîçèöèè.

Âðåìÿ âû÷èñëåíèÿ íåïîñðåäñòâåííî ñâÿçàíî ñ ÷èñëîì ïîñëåäîâàòåëüíûõ îïåðàöèé êàæäîãî ìåòîäà. Ìû ñîñ÷èòàëè êîëè÷åñòâî ïîñëåäîâàòåëüíûõ ìóëüòèïëèêàòèâíûõ îïåðàöèé ìåòîäà ñòðåëîâèäíîé äåêîìïîçèöèè è, êàê ñëåäñòâèå, îöåíèëè âû÷èñëèòåëüíîå óñêîðåíèå. Àääè- òèâíûå îïåðàöèè íå ó÷èòûâàëèñü, ñ÷èòàÿ, ÷òî îíè òðåáóþò çíà÷èòåëüíî ìåíüøåãî êîìïüþ- òåðíîãî âðåìåíè.

 ñîîòâåòñòâèè ñ [11], ìû ñ÷èòàåì, ÷òî êàê äëÿ óìíîæåíèÿ, òàê è äëÿ îáðàùåíèÿ áëîêà ðàçìåðàn×nòðåáóåòñÿ ðîâíî n3 ìóëüòèïëèêàòèâíûõ îïåðàöèé. Äàííàÿ îöåíêà ñîãëàñóåò- ñÿ ñ îöåíêîé êîëè÷åñòâà îïåðàöèé â îáùåïðèíÿòîé ðåàëèçàöèè äàííûõ àëãîðèòìîâ [12], à èìåííî LAPACK. Òîãäà êîëè÷åñòâî ìóëüòèïëèêàòèâíûõ îïåðàöèé äëÿ ïîñëåäîâàòåëüíîãî àëãîðèòìà ìàòðè÷íîé ïðîãîíêè

OM S = (3N 2)(n3+n2l).

Äëÿ ìåòîäà äåêîìïîçèöèè ïîëíîå êîëè÷åñòâî îïåðàöèé óìíîæåíèÿ äàåòñÿ ôîðìóëîé

(4N11)n3+ (4N11)n2l

+

(6NM 3)n3+ (4NM 1)n2l

+ +

MX1 k=2

7Nkn3+ 5Nkn2l

+ (3M5)(n3+n2l),

(5)

ãäå Nk,k= 1, . . . , M îáîçíà÷àåò êîëè÷åñòâî áëîêîâ íà äèàãîíàëè ñóïåð-áëîêà Sk.  îáùåì ñëó÷àå Nk ìîãóò áûòü ðàçëè÷íûìè, íî óäîâëåòâîðÿþò ñîîòíîøåíèþ

XM k=1

Nk =N (M 1).

Ìàêñèìàëüíîå óñêîðåíèå äîñòèãàåòñÿ, êîãäà âðåìÿ ðàáîòû êàæäîãî ïðîöåññîðà ïî ðåøå- íèþ ñèñòåìû ñ ñóïåð-áëîêîì Sk îäèíàêîâî.  ýòîì ñëó÷àå Nk ðàâíû äðóã äðóãó ïðè k = 2, . . . , M 1, à N1 è NM íåñêîëüêî áîëüøå îñòàëüíûõ. Âûèãðûø â ïðîèçâîäèòåëüíîñòè ïî ñðàâíåíèþ ñî ñëó÷àåì, êîãäà âñå Nk, k = 1, . . . , M ðàâíû äðóã äðóãó äîñòàòî÷íî ìàë.

Ïîýòîìó, äëÿ ïðîñòîòû, ìû ðàññìîòðèì ñëó÷àé ðàâåíñòâà âñåõ Nk äðóã äðóãó.  ýòîì ñëó-

÷àå, åñëè êîëè÷åñòâî ïàðàëëåëüíûõ ïðîöåññîðîâ ðàâíî ÷èñëó áëîêîâ M íà äèàãîíàëè, òî ìàêñèìàëüíîå ÷èñëî ïîñëåäîâàòåëüíûõ îïåðàöèé ìåòîäà ñòðåëîâèäíîé äåêîìïîçèöèè áóäåò

OAD= ((N −M+ 1)/M) (7n3+ 5n2l) + (3M−5)(n3+n2l)

Òîãäà âû÷èñëèòåëüíîå óñêîðåíèå, êàê îòíîøåíèå S =OM S/OAD, ëåãêî ïîëó÷àåòñÿ:

S = 3N 2

3M5 +

5 +1+l/n2

NM+1 M

. (6)

Ëåãêî âèäåòü, ÷òî çàâèñèìîñòü âû÷èñëèòåëüíîãî óñêîðåíèÿ îò îòíîøåíèÿ l/n äîñòàòî÷íî ñëàáàÿ. Êàê ôóíêöèÿ M, óñêîðåíèå âåäåò ñåáÿ ëèíåéíî S = 3M/7 ïðè M N. Ãðàôè- êè âû÷èñëèòåëüíîãî óñêîðåíèÿ ïðè l = 1 è ðàçëè÷íûõ N è n ïðèâåäåíû íà ðèñóíêå 2.

Ïðè äàëüíåéøåì óâåëè÷åíèè M, óñêîðåíèå äîñòèãàåò ñâîåãî ìàêñèìóìà è ïîòîì óáûâàåò.

×èñëî ïàðàëëåëüíûõ ïðîöåññîðîâ, ïðè êîòîðîì âû÷èñëèòåëüíîå óñêîðåíèå äîñòèãàåò ñâîåãî ìàêñèìóìà äàåòñÿ ôîðìóëîé

M= s

(N + 1) 3

5 + 2 1 +l/n

. (7)

450

(5)

Òàêèì îáðàçîì, ïðè èñïîëüçîâàíèè ìåòîäà ñòðåëîâèäíîé äåêîìïîçèöèè âñå ñóïåð-áëîêè íà äèàãîíàëè ìîæíî âûáèðàòü îäèíàêîâîãî ðàçìåðà, ò.ê. óìåíüøåíèå óñêîðåíèÿ â ýòîì ñëó÷àå íåçíà÷èòåëüíî. Îïòèìàëüíîå êîëè÷åñòâî ïàðàëëåëüíûõ ïðîöåññîðîâ M äëÿ ôèêñèðîâàí- íîãî êîëè÷åñòâà áëîêîâ N íà äèàãîíàëè ÑËÀÓ äàåòñÿ ôîðìóëîé (7).

4. Ïðàêòè÷åñêèå ðåçóëüòàòû

Àíàëèòè÷åñêèå îöåíêè, ïðèâåäåííûå â ïðåäûäóùåé ãëàâå, áûëè ïðîâåðåíû â ïðàêòè-

÷åñêèõ ðàñ÷åòàõ. Ìåòîä ñòðåëîâèäíîé äåêîìïîçèöèè áûë ðåàëèçîâàí íà ÿçûêå C/C++ ñ èñïîëüçîâàíèåì âûçîâîâ ïîäïðîãðàìì äëÿ ðàáîòû ñ ìàòðèöàìè áèáëèîòåêè LAPACK 3.5.0.

Ïàðàëëåëèçì áûë ðåàëèçîâàí ñ ïîìîùüþ òåõíîëîãèè OpenMP. ×èñëåííûå ýêñïåðèìåíòû ïðîâîäèëèñü íà âû÷èñëèòåëüíîé ñèñòåìå ñ îáùåé ïàìÿòüþ (SMP) è 64 ïðîöåññîðàìè Intel Xeon.

2 4 6 8 10 12 14

0 10 20 30 40 50 60 70 80

Computational speedup (S)

Number of parallel processors (M) N=1535

Analytical estimation Practical result, n=300

Practical result, n=500 2

4 6 8 10 12 14 16 18 20

0 20 40 60 80 100 120

Computational speedup (S)

Number of parallel processors (M) N=3071

Analytical estimation Practical result, n=100 Practical result, n=400

Ðèñ. 2. Àíàëèòè÷åñêàÿ îöåíêà âû÷èñëèòåëüíîãî óñêîðåíèÿ ìåòîäà ñòðåëîâèäíîé äåêîìïîçèöèè ïî îòíîøåíèþ ê ìåòîäó ìàòðè÷íîé ïðîãîíêè (ñïëîøíàÿ êðèâàÿ) è ïðàêòè÷åñêèå ðåçóëüòàòû èçìåðåíèÿ óñêîðåíèÿ (êâàäðàòèêè è êðóæî÷êè) â çàâèñèìîñòè îò ÷èñëà ïàðàëëåëüíûõ ïðîöåññîðîâ. Ðàçìåðû áëîêîâ n = 100500, ÷èñëî áëîêîâ íà äèàãîíàëè N = 1535 (ëåâûé ãðàôèê), N = 3071 (ïðàâûé ãðàôèê), èñïîëüçóåòñÿ îäèí âåêòîð â ïðàâîé ÷àñòè.

 êà÷åñòâå òåñòîâîé çàäà÷è ðåøàëàñü ÑËÀÓ, ïîëó÷åííàÿ èç äèñêðåòèçàöèè ãðàíè÷íîé çàäà÷è äëÿ èíòåãðî-äèôôåðåíöèàëüíîãî óðàâíåíèÿ Ôàääååâà [13]. Äëÿ ÷èñòîòû ýêñïåðè- ìåíòà, íóëåâûå ýëåìåíòû áëîêîâ çàìåíÿëèñü îòíîñèòåëüíî ìàëûìè ñëó÷àéíûìè ÷èñëàìè.

Ïðàâàÿ ÷àñòü ñîñòîÿëà èç îäíîãî âåêòîðà, ò.å.l= 1.

Ïîëó÷åííûå ðåçóëüòàòû äëÿ N = 1535, 3071 è n= 100500ïîêàçàíû íà ðèñóíêå 2.

Âèäíî, ÷òî ïðàêòè÷åñêèå ðåçóëüòàòû õîðîøî ñîãëàñóþòñÿ ñ àíàëèòè÷åñêèìè îöåíêàìè. Ïðè M N íàáëþäàåòñÿ ëèíåéíîå óñêîðåíèå è îáùèé òðåíä ïîëó÷åííîãî âû÷èñëèòåëüíîãî óñêîðåíèÿ òàêæå ñîîòâåòñòâóåò àíàëèòè÷åñêèì ðåçóëüòàòàì. Íåáîëüøèå îòêëîíåíèÿ ìîãóò áûòü âûçâàíû íåèäåàëüíîñòüþ ðàáîòû ñ ïàìÿòüþ, à òàêæå ñèñòåìíûìè ïðîöåññàìè, âûïîë- íÿþùèìèñÿ ïàðàëëåëüíî. Ïîñëåäíåå áîëüøå âñåãî ïðîÿâëÿåòñÿ ïðèM = 64. Òåì íå ìåíåå, àíàëèòè÷åñêèå îöåíêè äëÿ äàííîãî ñëó÷àÿì â öåëîì âåðíû è ñëàáî çàâèñÿò îò n.

Ìåòîä ñòðåëîâèäíîé äåêîìïîçèöèè èñïîëüçîâàëñÿ äëÿ ÷èñëåííîãî ðåøåíèÿ áëî÷íî- òðåõäèàãîíàëüíîé ÑËÀÓ, ïîëó÷åííîé èç ãðàíè÷íîé çàäà÷è â ïîëÿðíûõ êîîðäèíàòàõ{ρ, θ}

äëÿ èíòåãðî-äèôôåðåíöèàëüíîãî óðàâíåíèÿ Ôàääååâà. Áëî÷íî-òðåõäèàãîíàëüíàÿ ìàòðèöà âîçíèêàåò èç êîíå÷íî-ðàçíîñòíîé àïïðîêñèìàöèè âòîðîé ïðîèçâîäíîé ïî ïåðåìåííîé ρ è ðàçëîæåíèÿ èñêîìîãî ðåøåíèÿ ãðàíè÷íîé çàäà÷è ïî áàçèñó Ýðìèòîâûõ êóáè÷åñêèõ ñïëàé- íîâ ïî ïåðåìåííîé θ [13]. Èç ðåøåíèÿ ãðàíè÷íîé çàäà÷è îïðåäåëÿëèñü àìïëèòóäû ðàññåÿ- íèÿ, ñîîòâåòñòâóþùèå çàäà÷å íåéòðîí-äåéòðîííîãî ðàññåÿíèÿ. Ïðèìåíåíèå äàííîãî ìåòîäà ïîçâîëèëî óìåíüøèòü îáùåå âðåìÿ ðåøåíèÿ, âêëþ÷àþùåå ôîðìèðîâàíèå ÑËÀÓ, ðåøåíèå451

(6)

ñèñòåìû, ïîëó÷åíèå àìïëèòóä ðàññåÿíèÿ, â 7-10 ðàç.

5. Çàêëþ÷åíèå

 ðàáîòå ïðåäñòàâëåí ìåòîä ñòðåëîâèäíîé äåêîìïîçèöèè äëÿ áëî÷íî-òðåõäèàãîíàëüíîé ÑËÀÓ. Ïîêàçàíî, ÷òî äàííûé ìåòîä äîïóñêàåò ýôôåêòèâíîå ïàðàëëåëüíîå ðåøåíèå ÑËÀÓ äàííîãî òèïà. Àíàëèòè÷åñêè ïîëó÷åíî âû÷èñëèòåëüíîå óñêîðåíèå ìåòîäà ïî îòíîøåíèþ ê ìåòîäó ìàòðè÷íîé ïðîãîíêè. Ðåàëèçîâàííûé ìåòîä ñòðåëîâèäíîé äåêîìïîçèöèè ïîçâî- ëèë íàì ïðîâåñòè âû÷èñëèòåëüíûå ýêñïåðèìåíòû è ïîäòâåðäèòü àíàëèòè÷åñêèå îöåíêè íà ïðàêòèêå. Èñïîëüçîâàíèå ìåòîäà ñòðåëîâèäíîé äåêîìïîçèöèè ïîçâîëèëî óìåíüøèòü âðå- ìÿ ðåøåíèÿ ÑËÀÓ â íåñêîëüêî äåñÿòêîâ ðàç, ÷òî ïðèâåëî ê ñîêðàùåíèþ îáùåãî âðåìåíè âû÷èñëåíèÿ èñêîìûõ âåëè÷èí ïðèêëàäíîé çàäà÷è â ñåìü-äåñÿòü ðàç.

Ëèòåðàòóðà

1. Ñàìàðñêèé À.À, Íèêîëàåâ Å.Ñ. Ìåòîäû ðåøåíèÿ ñåòî÷íûõ óðàâíåíèé, Ì.: Íàóêà, 1978.

2. Hockney R.W. The potential calculation and some applications // Meth. Comput. Phys.

1970. Vol. 9. P. 136211.

3. Wang H.H. A parallel method for tridiagonal equations // ACM Trans. Math. Software.

1981. Vol. 7. P. 170183.

4. Âîëîõîâà À.Â., Çåìëÿíàÿ Å.Â., Ðèõâèöêèé Â.Ñ. Ïàðàëëåëüíàÿ îïòèìèçàöèÿ ìåòîäà ðåøåíèÿ ñèñòåìû óðàâíåíèé ïîëÿðîíà ñ èñïîëüçîâàíèåì àëãîðèòìà ðàçáèåíèé //

Âû÷èñëèòåëüíûå ìåòîäû è ïðîãðàììèðîâàíèå. 2015. Ò. 16. C. 281289.

5. Johnsson S.L. Solving narrow banded systems on ensemble architectures // ACM Trans.

Math. Software. 1985. Vol. 11. P. 271288.

6. Ortega J.M. Introduction to parallel and vector solution of linear systems, New York: Springer, 1988.

7. Àêèìîâà Å.Í., Áåëîóñîâ Ä.Â., Ìèñèëîâ Â.Å. Àëãîðèòìû ðåøåíèÿ îáðàòíûõ

ãåîôèçè÷åñêèõ çàäà÷ íà ìíîãîïðîöåññîðíûõ âû÷èñëèòåëüíûõ ñèñòåìàõ // Ñèá. æóðí.

âû÷èñë. ìàòåì. 2013. Ò. 16. C. 107121.

8. Cleary A., Dongarra J.J. Implementation in ScaLAPACK of divide-and-conquer algorithms for banded and tridiagonal linear systems // Report No. UT-CS-97-358 (LAWN No. 125), University of Tennessee, 1997.

9. Belov P.A., Nugumanov E.R., Yakovlev S.L. Computing binary scattering and breakup in three-body system // Nuclear Theory in the Supercomputing Era, 2012, Khabarovsk, Russia. Proceedings. Pacic National University. 2013. P. 121134

10. Faddeev L.D., Merkuriev S.P. Quantum scattering theory for several particle systems, Dordrecht: Springer, 1993.

11. Ñàìàðñêèé À.À., Ãóëèí À.Â. ×èñëåííûå ìåòîäû, Ì.: Íàóêà, 1989.

12. Dongarra J.J., et al. LINPACK Users' Guide, Philadelphia:SIAM, 1979.

13. Belov P.A., Yakovlev S.L. Asymptotic method for determining the amplitude for

three-particle breakup: Neutron-deuteron scattering // Phys. Atom. Nucl. 2013. Vol. 76.

P. 126138. 452

(7)

Arrowhead decomposition for a block-tridiagonal system of linear equations

Pavel Belov, Eduard Nugumanov and Sergey Yakovlev

Keywords: arrowhead decomposition method, block-tridiagonal matrix, system of linear

equations, parallel solution, computational speedup, matrix sweeping algorithm

The arrowhead decomposition method which allows efficient parallel solution of a block-

tridiagonal system of linear equations is presented. The computational speedup with respect to

the matrix sweeping algorithm is analytically estimated by taking into account the number of

elementary operations of multiplication for the parallel and serial parts of the decomposition

method. It is shown that the maximal speedup is achieved for the finite number of parallel

processors. For a given size of the initial system of linear equations, the parameters of the

computational system which give the maximal speedup are obtained. Computational

experiments confirm the analytical estimations of the computational speedup.

Références

Documents relatifs

A trust level is a useful method to identify the required hardware and software security protection mechanisms that a system must include to protect the data confiden-

Key Words: Boussinesq-type equations, finite differences scheme, transparent boundary condi- tions, domain decomposition, interface conditions, Schwarz alternating method.. AMS

Àëãîðèòì ALT ñîñòîèò èç äâóõ ôàç: íà ïåðâîé ôàçå âûïîëíÿåòñÿ ðàññòàíîâêà îðèåíòèðîâ è âû÷èñëå- íèå ïîòåíöèàëüíûõ ôóíêöèé, íà âòîðîé ôàçå îñóùåñòâëÿåòñÿ ïîèñê

ALBERA, “On the behavior of current second order blind source separation methods for first and second order cyclostationary sources — Ap- plication to CPFSK sources,” in ICASSP 02,

L’id´ ee est de d´ ecomposer la fonction de distribution f en une Maxwellienne adapt´ ee au bord (qui n’est pas la Maxwellienne usuelle associ´ ee ` a f) et une partie cin´

[r]

Output Feedback Control of a cascade system of linear Korteweg-de Vries equations.. Constantinos Kitsos, Eduardo Cerpa, Gildas Besancon,

The final Section 7 is not closely related to the rest of the paper. It describes a construction of a sequence of integral polytopes associated to a symmetric integral matrix