Method of Finding the Remainder
VladimirVoronkin
vl.voronkinraxperi.om
Andrey Malikov
malikovnstu.ru
Elmira Azarova
elmira.azarovamail.ru
AlekseyShhegolev
a.wegolevgmail.om
North-CauasusFederalUniversity,
Institute ofInformationTehnologiesandTeleommuniations,
RussialFederation
Abstrat
Thispaperdesribesanewmethodofthetree-likestruturerepresen-
tationindatabases.Thedevelopedmethodofndingrelationshipsbe-
tweennodesisbasedontheuniqueprimefatorizationtheoremwhih
statesthateverynaturalnumberanbewrittenasaprodutofprime
numbers. ThedevelopedmethodisimplementedusingNVIDIAgraph-
isproessingunitparallelomputingandNVIDIACUDAtehnology.
Thepaperanalyzestheproessoftreeonstrutionwiththedeveloped
methodapplied, representsthealgorithmsofGPU-basedproessingof
treesodedusingthedevelopedmethod,andalsotheresultsofperfor-
manesnapshot.
1 Introdution
Computersystemsaredevelopingrapidly,proessorarhiteturesarehanging,themainanddisk memorysizes
are inreasing. Nowadaysthe rapid inrease of powerand data volumes allows for the development of high-
performanedatamanagementandproessingsystems.
Alargeamountofdataintheworldin oneformoranotherisurrentlyrepresentedasahierarhyormutual
dependenywheresomepiees ofinformationdepend ontheothers.
What ismore,datamanagementsystemsarebeginningto migratefrom thedisk-orientedtehnologyto the
memory-oriented. The development of DBMS has led to integrating in-memory tehnology support by many
databasemanagementsystems. [GBE13℄
Data proessing in themain memory enablesaonsiderableinreaseof thedata proessing ratein favorof
safety. ModernDBMS, supportingin-memorytehnologyproessdatainthememorymirroringhangesto the
disk.
Theappliationofmemory-orienteddataproessingtehnologiestogetherwiththenewmethodsofrepresent-
ingtreesusingGPUomputinganinreasethedataproessingrate.
This researh aims to develop a new method of hierarhy-representation in databases that exhibits high
performaneintreemanipulation operations.
Copyright
bythepaper'sauthors. Copyingpermittedforprivateandaademipurposes.In: A. Editor, B. Coeditor (eds.): Proeedings of the XYZ Workshop, Loation, Country, DD-MMM-YYYY, published at
http://eur-ws.org
Copyright c 2017 by the paper’s authors. Copying permitted for private and academic purposes.
In: S. H¨ olldobler, A. Malikov, C. Wernhard (eds.): YSIP2 – Proceedings of the Second Young Scientist’s International Workshop
on Trends in Information Processing, Dombai, Russian Federation, May 16–20, 2017, published at http://ceur-ws.org.
Thisworkisdevotedtothedesriptionofthedevelopedmethodofrepresentingtreesindatabases. Thedevelop-
mentofthemethodwasinitially-orientedtowardsthemaximumparallelizationoftreeproessingandmaximum
independene ofnodeodefromothernodeodes. Thepaperfousesonaradiallynewmethodofrepresenting
trees inrelationalandnonrelationaldatabases,andalso demonstratesthattheGPU omputingofhierarhial
relationshipsanbemoreeetivethantheCPU.Thedevelopedmethodofrepresentingtreesisorientedtowards
theparallelproessingmodewiththeuseofaGPU.
2.1 Methods of RepresentingTrees
All themethodsofrepresentingtreesin databasesanbeput intotwoategories:
methodsofrepresentingtreesin non-graphdatabases;
methodsofrepresentingtreesin graphandhierarhialdatabases.
Thefollowingmethodsbelongtotherstategory:
Theadjaenylistmethod ofrelationalhierarhialmodeling.
Theadjaenylistmethodisbasedonthestorageofdiretparent-hildrelationships. Eahhildinatreelike
hierarhyrelatestoitsparentwhihisatonelevelhigher. Usingthishierarhyproperty,atable-levelrelational
dependene anbereated.
Thematerializedpathmethod.
Amaterializedpathisastringeldthatonsistsofelementnamesseparatedbyaseparatorharater[Tro03℄.
Parents'namesofanodeforwhihamaterializedpathisbuiltareused asthematerializedpathelements.
TheNested Setsmethod.
The Nested Set method implies that there are twoadditionalelds for the node desription: atree oding
using thealgorithmforthenestedsetsodingintroduedbyJoeCelko[Cel12℄[Cel℄[Cel04℄, isdoneasfollows:
movingdowntheleftsideofatree,it'sneessarytotravelthroughallthesubtrees beginningwiththeleftmost
totherightmostandassigneahnodeanautoinrementvalue. Whenmovingdownthetree,anautoinrement
valueisassignedtothevariableresponsiblefortheintervalstartingpoint(½leftbound); whenmovingup-to
thevariableresponsiblefortheintervalend(½rightbound).
Thenestedintervalsmethod.
The nestedintervalsmethod, introdued in [Tro05℄, is basedon thematerialized path oding using anite
ontinued fration
[q 1, q 2, . . .
, q n ]
, where [q 1, q 2, . . .
, q n ]
, where q 1, q 2, . . .
, q n are the stepsof a materialized
path. Rationalnumbersa/b
,wherea >= b >= 1
andGCD(a, b) = 1
areused astreeelementodes.
. . .
,q n ]
, where[q 1, q 2, . . .
, q n ]
, where q 1, q 2, . . .
, q n are the stepsof a materialized
path. Rationalnumbersa/b
,wherea >= b >= 1
andGCD(a, b) = 1
areused astreeelementodes.
. . .
,q n ]
, whereq 1, q 2, . . .
, q n are the stepsof a materialized
path. Rationalnumbersa/b
,wherea >= b >= 1
andGCD(a, b) = 1
areused astreeelementodes.
. . .
,q n are the stepsof a materialized
path. Rationalnumbersa/b
,wherea >= b >= 1
andGCD(a, b) = 1
areused astreeelementodes.
Asanexample,onsiderhowtheodeforamaterializedpath½
3.2.2
isreatedusingaontinuedfration:3 + 1
2 + 1 2 = 17 5
Totransform a materializedpath to arational expression the onvergene prinipleanbe used, while for
the inversetransformation the algorithm forthe gradual trunation should beused, that in ase of arational
expressionisthesameasEulid'salgorithmforndingGCD[Gai99℄.
Modiationsandombinationsoftheafore-namedmethods.
Therearealsomodiationsofthegivenmethodsthatanxtheinherentproblems[Kol09℄[vor14℄[MT11℄.
For example, in [MT11℄ the improvement of the nested intervals method is introdued through the interval
odingintheresiduenumbersystem(RNS).UsingRNSenablesreduingthelengthofnumbersusedbystoring
anumberasashort lengthremainder. Representingnode odesasnumbersexpressedin RNS enablesparallel
proessing of not only dierent tree nodes but also one and the samenode remainder. Thus, the maximum
parallelization of data proessing is inreased. In [vor14℄ the method of storing nestedintervals expressed in
RNS isintroduedthatsolvessomeproblemsofsortingnumbersexpressedin RNSandtheindexonstrution.
Suh methods as the MaterializedPath, nestedintervals and modiationsof thebothinlude the data on
hildrenand/orparentsinthekeywhileodingandanbeproessedinparallelwithsuienteetiveness.
Themethodsofrepresentingtreesingraphandhierarhialdatabasesarediulttolassifyasmostofgraph
databasesusetheirownformatsandstruturesofdatarepresentations.
Insomeasespointers areusedforlinking neighbornodes. Thepointersarebound tothedatathat appear
in ars. Initsstruturethishierarhymodelissimilartotheadjaenylists.
Other [Kup86℄[KV93℄graphdatabasesuseatablemodelforstoringrelationshipars.
AnotherformatofstoringgraphdataisXMLandXML-basedformats[MS11℄[BELP℄.
W ORKS Name:Larry
Page
W ORKS Person1
Company1
Person2
Company2
Person3 Sine1998
Sine2001
Sine2010
Name:Joshua
Bloh
Name:Google
Name:Orale
Name:Brian
Goetz WORKS
Traverserelation Traverserelation
a)Graphrepresentation
IndexlookuponPerson.id
Person
Worksin
Company
Id Name
PersonId
CompanyId Sinse 1
2
3
N
JoshuaBloh
BrianGoetz
...
LarryPage
1
2
3 1
2 1
IndexlookuponCompanyId
1998
2001
2010
Id Name
1
2
...
N Orale
...
...
Google IndexlookuponCompany.Name
b)Relationalrepresentation
Figure1: Representationtypesofdependentdata
Fig. 1band1a illustratethe aforesaid. Fig. 1billustrates arelationalrepresentation ofhierarhialdata in
DB, whileFig. 1aillustratesdatarepresentationintheformofagraph.
Theillustrationdataexemplifythestorageofhierarhydataindatabases.
One of the distintive features of graph databases is the base hosen for database objets representation,
relationsbetweenthedata,omplexobjetsandtheirattributes.
All graphdatabasesarebasedonamathematialdesriptionofagraphthatdeterminesgraphmanipulation
operations. A graph model depends on the base, for example: a direted or undireted graph, marked or
unmarkedgraphnodes,weightedandunweightedars.
SomemodelssuhasGOOD,GMOD,G-Log,Gramrepresentboththeshemeandentityasanameddigraph.
LDM is an exeption whose shemes are digraphs with leaves representingdata and internal nodes represent
strutured data. LDM onsists of atable with two-olumns,eah ofwhih assoiatesentities with data types
(primitive,tuple orset). Amoredetailed desriptionofgraphmodelsisrepresentedin [AG08℄.
2.2 GPU Data Proessing
Graphisproessingunitseemstobeanadvanedhighperformaneplatformforomputing. Thebasidistin-
tivefeatureenablingGPU'shighperformaneisitsarhiteturehavingagreatnumberofALU. TheGPUand
CPUarhiteturesareomparedin[PT11℄. Anumberofworks[BS10℄[HKOO11℄pointtotheadvantageofGPU
proessingofdataoverCPU.On average,thequerieswereproessed530 timesfasterin relationaldatabases.
Someworks[PSHL10℄[BMCN09℄showtheinreasedrateofsortingdatausingGPU.Asfarashierarhialand
graph strutures are onerned, the works [HN07℄ [HKOO11℄ [WO10℄ are worth mentioning. in whih GPU
graphomputationsaremadeandwhihdesribethespeedupofdataproessingusingGPUomparedtoCPU
(whenproessinggraphstrutures).
Hereafter,mentioningGPUomputingmeanstheusageofNVIDIACUDAtehnology.
CUDA (Compute Unied Devie Arhiteture) is a NVIDIA tehnology designed for the development of
appliationsformassiveparallelomputingunits. Due toagreatnumberofALU.GPUimplementsthethread
modelofomputation: thereareinputandoutputdatathatonsistofthesameelementswhihanbeproessed
independently. Theelementsproessingisperformedbythekernel.
A kernel is a funtion exeuted on GPU and alled from CPU. A kernelis omposed of agrid of multiple
threads groupedintoahierarhy.
The highestlevel grid orrespondsto the kerneland groupsall thekernelthreads. A grid isaone-or
two-dimensionalarrayofbloks. Eahblokisaone/two/three-dimensionalarrayofthreads. Furthermore,eah
blokisanentirelyindependentset ofinteratingthreads.
3 Priniples of Tree Constrution
By ½tree wewill understand a onnetedayli graphomposed of aset of nodes
n = (k, d)
, wherek
is thenodeode,
d
isdataassoiatedwiththenode:T = { n } = { (k, d) }
Letusdeneforanode
n
operationskey(n) = k
aesstothenodeode,data(n) = d
aesstodataassoiatedwiththenode.
Denote by the node identier aprime numberunique within node Ids. Let us introdue aset ofprimes
I
,representingprimesin inreasingorderof theirvalues:
I = s 1 , s 2 , . . . , s m , . . .
LetusallaprodutofthenodeparentidentiersbythenodeIdthe node ode.
Theodeofatreerootisthe rootidentier.
Letusintroduetheoperationofgettingthefollowingprimenumberfromaset pfprimes
I
:next() = s i , i = i + 1
Primesareneededfortreeonstrution,beausetreedeompositionshould beunique.
Theusageofoneandthesameidentierformorethanonenodeisimpossible.
A treeisformed asfollows:
1.Aprimeis hosenasarootnode.
2. Eah node added to thetree has aodeequal to theprodutof the parent ode by theidentier of the
nodeadded.
Eahtreenodeintheodeenodestheentirepathstartingfromtheroot. Thismethodofnodeodereation
issimilarto theMaterializedPathmethod.
4 Tree Proessing Operations
Letusshowadmissibletreeoperations. Letusintroduethefuntion ofndingtheremainderondividing
a
byb
:mod(a, b) = a − b · ⌊ a b ⌋
Thefollowingtreeoperationsareadmissible:
1.
add(T, d, n parent )
,wheren parent isparentnode, d
isdataaddedto thetreeoperationof addinganode
to thetree.
Preondition:
exists(T, n parent )
Realization:
T := T ∪ { n(key(n parent ) · next(), d) }
(1)Prediate
exists(T, n)
determinesifthenoden
existing inthetreeT
:exists(T, n) := ∃ n ∈ T
2.Operationofsearhinganodewiththeode
k
in thetreenode(T, k)
.node(T, k) := { x ∈ T | key(x) = k }
(2)3.Operationofsearhingnodeparentsin thetree
parents(T, k)
.parents(T, k) := { x ∈ T | mod(k, key(x)) = 0 ∧ k 6 = key(x) }
(3)4.Operationofsearhingadiretparent
parent(T, k)
.parent(T, k) := M AX(parents(T, k))
(4)5.Operationofsearhingasubtree
subtree(T, k)
.subtree(T, k) := { x ∈ T | mod(key(x), k) = 0 }
(5)6.
children(T, k)
operationofsearhingdiret hildren.children(T, k) := { x ∈ T | k = key(parent(T, key(x))) }
(6)7.Operationofsubtreeremoval
remove(T, k)
.T := { x ∈ T | mod(key(x), k) 6 = 0 }
(7)8.Operationoftransferringasubtreerooted
n old toanewparentn new move(T, n old , n new )
.
Preondition:
exists(T, n old ) ∧ exists(T, n new ) ∧ mod(key(n new ), (n old )) 6 = 0
Realization:
T := T ∪ { x ∈ subtree(T, key(n old )) |
| n(recalcKey(T, key(x), key(n old ), key(n new )), data(x)) }
(8)where
recalcKey(T, k, k old , k new )
isanoperationofnodeoderealulation:recalcKey(T, k, k old , k new ) := k
k old ∗ k new
Consideringthepossibilityofexeutingparalleloperationswithatreeasaset, thefollowingfeaturesshould
benoted:
1.ThepossibilityofparalleltreeproessingusingGPU.
Treeoperations(2,3,4,5,6)that donotmodifythetreehavenoneedforinterationwithothernodes. All
suh operationsanbeproessedin parallel orsimultaneously. Parallelization oftree manipulation operations
uptoonethreadpernodeispossible.
2.Theorderofreordsin treerepresentationinthememoryinuenesonlytheorderofreordsintheresult
set anddoesnotinuenethenumberofreordsin it(the resultset).
3.Theusageofprimes inatreeinthesequentialorder isoptional.
5 GPU Computation of Trees
Eah node representsapair: {node ode: pointer tonode properties}. Let usallthis pairareord. Sine the
nodepropertiesanrepresentasequeneofdynamially-sizedfreeelds,eahnodeodeisassoiatedwiththe
pointertopropertiesloationin theleratherthanthepropertiesthemselves.
Intheledataisstoredasasequeneofreords. Whendownloading,reordsaredividedasfollows: keysare
reordedinto GRAM,pointers totheproperties arereordedintoRAM. Downloadeddata(keysandpointers)
is stored asa sequene. For suh form of representation it is important that the order of reords in RAM is
stritlythesameasinGRAM.WhenswappingsomenodesinGRAM,itisneessarytoswapnodesinRAMin
asimilarway.
When exeuting aquery, say, alookup query, all the reordsin thetree are reviewed. Thequery resultis
omputedon-the-y,omputingthedesiredresultwhileexeutingthequery. Thusthesearh isnotnarrowed.
Tosavethetime oftheproessingresulttransfertoRAM,theGPUmakesabitmapof thequeryresult,in
whihthebitsetdenotesanodeorrespondingtothequery,theremoveddenotesanon-orrespondingnode,Fig.
2). Thebit mapis formedin kernels(a kernelisafuntion exeutedonthegraphisard)of queryexeution;
asaresult,onemapbit orrespondstoeahnode. Forexample,witha4-bytekeythesize ofdatatransmitted
throughthebus(betweenGPUandCPU)isdereasedby32times.
Toget ertain nodes in the resultingseletion, it is neessaryto nd set bits in the bit map. The ordinal
numberofasetbit isanumberof astorageellinRAM(andGRAM) in whih thedataaddressof thefound
node(thefoundnodeode)isstored.
Let us onsider theproess of exeutingsometree operations. Eah operationis exeutedfor eah node in
theset.
ThewholetreeproessingisrealizedonGPU.
Theresultofomplexqueryproessing,forexampleaommonparentsearh,isformedasaresultofuniting
bit mapsofeahqueryresult,thus avoidinglongoperationsofeahnodesearhing.
Forexample
commonP arents(A, k, r)
operationofsearhingommonparentsfortwonodeswiththeodesk
and
r
lookslikethis:commonP arents(A, k, r) := parents(A, k) \
parent(A, r).
Forndingommonparents,themostrational(withregardtothetimeofndingasolution)methodisexeuting
twoqueriesforparentsearhforeahnodeandunitingbitmapsofthequeriesresultsthroughthebitwiseAND
operator (Fig. 2). Uniting of bit maps as well asalulations take plae in GPU. The GPU exeution of the
bitwiseANDoperatorfora4-bytewholerequiresfourmultiproessoryleswithoutonsideringmemoryaess
delay(4 ylesforthesharedmemory).
6 The Experiment Results
Tohekthedevelopedmethod of representingtrees, theDBMS prototypewasdesigned supporting thebasi
operationsof data manipulation: insertion,deletion, transfer,searh. Searhand transferqueriesare exeuted
on GPU, insertion and deletion are exeuted onCPU. The developed DBMS prototype does not ahe data.
Reading isperformeddiretlyfrom thedisk. Queriesarenotahedeither. Theresultofsearhqueryisabit
map returnedtothelientintheform ofaursor. Atthenextursorreordquery,thesearhforthenextset
bit inthemap takesplae,thenreadingofthedoumentandkeyorrespondingto thebitloationinthemap.
The performane test of the developed method of representing trees was arried out in omparison with
DBMS MongoDB.MongoDBwashosenasthepopularandhighperformaneDBMS.Asanindex, MongoDB
usesB+tree. ForhierarhyrepresentationinMongoDBtheMaterializedPathwashosenasahierarhyreord
type.
6.1 Test Set
A tree-like databaseis used asaset test. As aDB base, theAll-Russian Classierof Addresses(KLADR) is
used. [kla℄
Datais representedasatreewith2.5 millionnodes. Eah nodehasadierentnumberofhildren. Thetree
isnotbalaned. Theheightofthetreeis5levels.
ForrunningexperimentstheKLADRreferenewasextended. Theoriginalrefereneguidehasunusedelds
andoupiesapproximately400MBofthediskspae.
EahtreenodehastheeldsrepresentedinTable1.
It is neessary to disriminate between a geographial division and a tree node to whih the geographial
division isattahed.
The elds represented in Table 1 are ommon elds for the both DBMSs in the test. MongoDB uses the
Materialized Path (eld ½matPath) to store thehierarhydata. In the eld ½matPath the DBMS prototype
storesthenodeodewhihistheprodutofparentsIdsbythenodeId.
ThedatasetsarethesameforthebothDBMSstested.
Tohektheperformaneofthemethodofhierarhyrepresentationin omparisontotheanalogues,aseries
of testswereonduted.
Nodename Datatype Desription
name har[32℄ nodename
matPath varhar[128℄ nodematerializedpath
objName har[16℄ nameofobjetassoiatedwithnode
objData varhar[10000℄ dataattahedtoobjet
gpsX oat objetoordinateXinGPS oordinatesystem
gpsY oat objetoordinateYinGPS oordinatesystem
level int objetnestinglevelinaordanewithKLADR
objType har[8℄ objettypeinaordanewithKLADR
objCode har[16℄ objetodeinaordanewithKLADR
A ColdStartup,withallthetestsbeingdone,wasperformedforeahDBMS½warm-up. Afterthat, allthe
tests werearriedout. BeforetestingofeahDBMStheomputerwasrebooted.
Eahtest wasrepeatedthree times. Thetestresultsrepresenttheaveragereeivedvalue.
Thepowersupplyplanwassetto½peakperformane andalsoallthepower-savingfuntionswereturnedo.
For eah DBMS the following tests were performed: reord addition, subtree seletion, seletion of all the
nodeparents,treetraversal,nodeodesearh.
Testing of node addition time wasperformed asfollows: the time of all the nodes addition wastaken into
onsideration. Afterthat,theresultwasdividedbythetotalnumberofnodes.
The seletionof all the node hildren assumesobtaininga ursor forthe node hildren, with reading them
from thedisk. 3500querieswereexeutedduringthistest.
Theseletionof alltheparentsalsoassumesobtainingaursor fortheresultset,with readingreordsfrom
thedisk. Similarly,3500querieswereexeutedduringthis test.
Treetraversalassumesreadingthewholetree.
The seletion of threads with reading assumes exeution of 10,000 queries. After obtaining the ursor for
hildren,allthehildrenarereadoutfromthedisk. ThistestallowsdeterminationoftherealDBMSperformane
asMongoDB,whenobtainingaursor,returnonlytherstresultingreords. Therestofthereordsarefound
in theindexonlywhenreadingallthepreviouslyfound.
Thetotalsizeofthedatabasetestedis24.4GBwithoutonsideringthenormativedouments.
Twodierent experimentswerearried outusingtheDB tested. Therstoneinludes thewhole database,
theseond oneinludes only100000 nodes(thenumberofqueriesand datasetsareleftunhanged). Theneed
for doingseveralexperimentswith onedataset isaused bytheneessityof determiningthedegreeof impat
of thereordsnumberin thetreeonitsproessingperformane.
Inadditiontothedatabasetested,anexperimentwasset upin whih 100000treenodeswereusedasatest
set amaterializedpath wasattahed to eah nodeas properties. Theperformane of this experimentenables
understanding thedegreeofimpatof thereordsizeonthetreerepresentationmethod performane.
6.2 Hardware and Software
A omputerwithIntel
R Corei5TM-4590proessorrunningWindows8.1operatingsystemwasused asaom- puting platform. Theproessorisa3.30GHz64bit quad-ore,withmaximumthroughputof32GB/se. Themahinehas8gigabytesofmemory. ThegraphisardusedisanNVIDIAGeForeGTX760. with1152CUDA
ores. 4GBofglobalmemory,andsupportsamaximumthroughputof192.2 GB/se.
CUDA 6.5 driver is used on the omputer. As for the software. MongoDB 2.6 is used. When testing,
exeutableDBMSleswereusedthatweredownloadedfromtheoialsite. ThedevelopedDBMSwasreated
usingMSVS Studio2013andoptimizationag O3.
6.3 Test Results
Table2representsthetestresults. Allthevaluesareaverageresultsforasetofqueries.
Theresultsin Table2onrmapossibilityofthejustiedappliation ofthedevelopedmethod ofhierarhy
representation in databases. Suh aresult is notied with largedata volumes and agreat number of reords.
If areordissmall, thismethod performane beomesinferiorto aommontree index(B+treeforMongoDB)
Query Representedmethod MongoDB
(averagetime,se/test)
Nodeaddition 768,44 14304,02
Subtreesearh 162,01 283,25
Parentsearh 126,08 163,00
Readingthewhole tree 490,74 8436,37
Obtainingarbitrarynodes 159,90 8880,27
Table3: Performaneomparisonoftestswithdierentdatasets
Database / test Node addition Subtree searh Parentsearh Node searh
2,5 millionreords,10KB/reord
queries/se
MongoDB 174,78 12,36 21,47 1,13
Developed method 3253,34 21,60 27,76 62,54
100000reords,10Kb/reord
queries/se
MongoDB 162,70 23,33 93,97 5,77
Developed method 3245,34 21,09 30,02 57,22
100000reords,20b/reord
queries/se
MongoDB 8673,03 6603,77 350000 2554,74
Developed method 62500 30,04 108,97 1988,64
(Table3). Thediereneisausedbytherestritiontothemaximumnumberofqueriesperseondonthepart
of hardware. Theminimum timeof ongurationand startupofthe emptykernel(fortheard tested) in the
synhronousmodeisapproximately0.10mse: onsequently. thegraphisardannotexeutemorethan1015
thousandkernelsperseond.
Figure 3showstherelationofthequeryproessingtime.
Disk
70%
GPU
9.9996%
CPU
20%
Datatransfer
0.0004%
a)Childrensearh
Disk
69%
GPU
15%
CPU
15%
Datatransfer
1%
b)Parentsearh
Figure 3: Timedistributionwhenexeutingqueries
Reading datafrom thedisk takesmostofthetime forthebothqueries. Theexeutionofkernelsrequiresa
littletime. Onaverage,thekernelexeutiontakes4.4msethatprovidesthemaximum227queriesperseond
without onsidering CPU operation time (for 10 KB/reord). When testing DB with a small reord size (20
byte/reord), the kernel exeution time is almost the sameas thetesting time for largereords. This results
in the method beingpredisposed to building hierarhies basedon nding the remainder, dealingwith a large
amountofdata,theCPUproessingofwhih isalong-runningoperation.
7.1 Threaded Exeution
So far the multithreaded software produt has not beendeveloped (we mean CPU multithreading, not GPU
multithreading). Multithreadedsoftwareisanimportantaspetof futuredevelopments. Ifaqueryisproessed
usingmultithreading,themaximumspeedupwill beahievedwhenexeutingdatamodiationqueries(reord
addition,transfer). TheappliationofNoSQL-stylenonlokingreordfuntionswillenableperformingdatabase
modiationoperationsintheasynhronousmode,thusimprovingtheperformane.
7.2 MultipleGPU ComputingSupport
Currentlyonegraphiardperformsallalulations. Additionofseveralgraphiardswillallowinreasingthe
terminal apaity of data bank orthe queryexeution speedthrough proessinghunks oneah graphi ard
whenstoringthesameditiondierentgraphiards.
Whenusingseveralgraphiards,thelow-speedCPU-GPUbustransferappearsto bearestrition. Despite
the fat that PCI 3.0 xl6bus hasa theoreti duel-sided exhange rate 128GB/se and 8 GT/s (GigaTrans-
ation/s), it is restrited by the ore memory performane. In atual pratie the bus reprodution speed is
approximately39GB/se(ontestedPC).
Whatismore,manymotherboardsespeiallyinthemoderateprierangehaveonlyonePCIslotper16data
transmissionlineswhereastheseond oneandthesueedinghavex8 andx4ingeneral.
Eventhoughusinglesslinesdereasesmulti-ardongurationperformaneby515%onaverage,andusing
multi-ard ongurationsin most asesannot reah its full potential due to the RAM restrited speed, the
benetfromusingsuhongurationsexeedstheexpenses.
7.3 Hardware Restritions
CurrentlyanimportantGPUrestritionislakofthread synhronizationresouresonthewhole graphiard.
Thread synhronization is possible only inside bloks. Blok synhronization on GPU is impossible. User
synhronizationmethodsarebasedonusingags. Hardwaresynhronizationofthebloksismoreimportantas
theperformaneinreasesomparedto thesoftwaresynhronization.
Another GPU nuane is SIMT arhiteture. SIMT arhiteture allows appliation of one and the same
ommand to dierent data. Thedrawbakof the arhiteture is that all the threads go througheah branh
of theode whenbranhing isusedevenifthethread doesnotexeuteanyinstrutions. Inotherwords,some
threads areinoperation,theothersareidle.
It is important to note a small numberof registersavailable for eah thread. This problem beomesaute
with alargenumberofthreads and aomplex ode(largedatatype). With 100,000threads,16 byte sizedata
type and 512threads perblok, eah blok uses60 registersin 63theoretially possible(the data taken from
theCUDAproler). Thisresultsinalessnumberofthreadsbeingexeutedinparallel. Forexample,whenthe
numberofregistersusedin athreaddroppedto13,theproessingspeedofthesameamountofdatainreased
byalmost10times.
Whenrunningtheexperiment,theredutioninthenumberofthreadsperblokdidnotresultintheexpeted
performaneimprovement.
7.4 Key Compression
The tree proessingmethod desribed in the artile assumesusing plural multiplying for dening key values.
Thekeysobtainedbyprimesmultipliationareexposedtoarapidinreaseresultinginlarge-sizenumbers. With
100thousandreordsin atreeitisneessarytousekeyswhosesize ismorethan128bit.
Themostfrequentarithmetioperationsperformedwithkeysinatreearedivision,multipliationandtaking
the divisionremainder. Hardwaresupport ofsuhoperationsisimpossibledue to CPU andGPU arhiteture
harateristis. Toperformarithmetioperationswithsuhkeys,itisimportanttousedierentalgorithmsthat
arelesseientthanhardwarerealizationofsuhoperations.
Usingnonpositionalnotationswillallowthesizeredutionoftheusednumbersbystoringanumberasasmall-
size remainder. When performing non-modular operations, the lak of need for onsidering the dependenies
betweentheremaindersopensupanopportunityfortherealization ofeientparallelism. Thistogetherwith
the advaned featuresof Keplerarhiteture(bit ontrolwithin warp),will allowtheeientrealization ofbit
mappinganddeningzeroremainderfornonpositionalnotations.
Thepaperrepresentsamethodofstoringtree-likestruturesindatabasesorientedtowardsmultithreadproessing
withregardtoCUDAparallelization.
ThisprojetdemonstratesusingGPUforhierarhialdataproessing.
Theresultof theartileasafollows: anewmethod ofrepresentingtreesin databasesis developed,and the
realizationof ahierarhialdatabaseprototypewithGPUdataproessing.
Onaverage,theimplementedsoftwareruns by19timesfasteromparedto thematerializedpathproessing
in DBMSMongoDBonCPU.Despitethequeryresultvariations,theminimumspeedupwas1.29times.
Referenes
[AG08℄ Renzo Angles and Claudio Gutierrez. Survey of graph database models. ACM Comput. Surv.,
40(1):1:11:39,2008.
[BELP℄ U. Brandes,M.Eiglsperger,J.Lerner,andC.Pih. Graphmarkuplanguage(graphml).
[BMCN09℄ Ranieri Baraglia, Via G Moruzzi, Gabriele Capannini, and Frano Maria Nardini. Sorting using
BItoni netwoRkwIthCUDA. LSDS-IRWorkshop,(July),2009.
[BS10℄ P. Bakkumand K. Skadron. Aelerating sql database operationson a gpu with uda: Extended
results. Tehnialreport, Universityof VirginiaDepartmentofComputerSiene, 2010.
[Cel℄ JoeCelko. Treesin SQL.
[Cel04℄ JoeCelko. HierarhialSQL,2004.
[Cel12℄ J.Celko. JoeCelko's TreesandHierarhiesin SQLfor Smarties. JoeCelko'sTreesandHierarhies
in SQLforSmarties.Elsevier/MorganKaufmann,2012.
[Gai99℄ A. T. Gainov. Number Theory. Part I. Resoure book. Novosibirsk State University. Faulty of
MehanisandMathematis.,1999.
[GBE13℄ MarelGrandpierre,GeorgBuss,andRalfEsser. In-MemoryComputingtehnology-Theholygrail
ofanalytis? Deloitte &Touhe GmbHWirtshaftsprüfungsgesellshaf, (07),2013.
[HKOO11℄ SungpakHong,SangKyun Kim,TayoOguntebi, andKunleOlukotun. AeleratingCUDAgraph
algorithms atmaximumwarp. Proeedings of the 16thACM symposiumon Priniples andpratie
of parallel programming- PPoPP'11,page267,2011.
[HN07℄ PawanHarishand P.J.Narayanan. Aeleratinglargegraphalgorithms onthegpu usinguda. In
Proeedings of the 14th International Confereneon High PerformaneComputing, HiPC'07,pages
197208,Berlin,Heidelberg,2007. Springer-Verlag.
[kla℄ KLADR-theAll-RussianClassierofAddresses.
[Kol09℄ M.A. Kolosovskiy. Data struture for representing a graph?: ombination of linked list and hash
table. CoRR, 2009.
[Kup86℄ Gabriel Mark Kuper. The Logial Data Model: A New Approah to Database Logi. PhD thesis,
Stanford, CA,USA,1986. UMIorder no.GAX86-08173.
[KV93℄ G. M.Kuperand M.Y. Vardi. Thelogialdata model. ACM Transations on DatabaseSystems,
18(3):379413,1993.
[MS11℄ AndersM
ø
llerandMihaelShwartzbah. Xmlgraphsinprogramanalysis. Si.Comput.Program., 76(6):492515,2011.[MT11℄ A. Malikov and A. Turyev. Nested intervals tree enoding with system of residual lasses. IJCA
SpeialIssueonEletronis, InformationandCommuniation Engineering,ICEICE(2):1921,2011.
basedonbitonisort bitonisort. InParallel Proessing andAppliedMathematis, pages403410.
2010.
[PT11℄ Jonathan PalaiosandJosh Triska. A Comparison of ModernGPU and CPU Arhitetures: And
theCommonConvergeneofBoth. pages120, 2011.
[Tro03℄ Vadim Tropashko. TreesinSQL: NestedSetsandMaterializedPath. 2003.
[Tro05℄ Vadim Tropashko. Nested intervalstreeenodingin sql. SIGMODReord,34(2),June2005.
[vor14℄ DBMSIndexforHierarhialDataUsingNestedIntervalsandResidueClasses.YoungSi.Int.Work.
TrendsInf.Proess., 2014.
[WO10℄ YangzihaoWang and JohnOwens. Large-sale graphproessingalgorithms onthegpu. Tehnial
ReportJanuary2011,UCDavis,2010.