2 Our approach - XIV Spanish Meeting on Computational Geometry

In CUDA, the parallelizable parts of an algorithm are executed by a collection of threads, grouped into blocks of user-defined size, that run in parallel. The code executed by each thread is written in a kernel, where different types of memory can be used: registers (local memory of a thread), global memory (accessible to every thread) and shared memory (accessible to every thread of a block). Atomic operations are used to operate on a memory position without allowing any other access to that position during the operation.
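To illustrate the kind of atomic operation meant here, the following minimal sketch uses a compare-and-swap so that only the first of several competing writers stores its value in a memory slot. It is CPU-side C++, with `std::atomic` standing in for CUDA's `atomicCAS`; the names `tryClaim` and `kEmpty` are hypothetical and do not come from the implementation described in this paper.

```cpp
#include <atomic>

// Hypothetical sentinel meaning "no value stored yet".
constexpr int kEmpty = -1;

// Tries to store `value` in `slot`, succeeding only if the slot is still empty.
// The read-compare-write happens as one indivisible step, so when several
// threads race for the same slot exactly one of them wins.
// CUDA analogue: atomicCAS(&slot, kEmpty, value) == kEmpty.
bool tryClaim(std::atomic<int>& slot, int value) {
    int expected = kEmpty;
    return slot.compare_exchange_strong(expected, value);
}
```

A second call on an already claimed slot returns false and leaves the stored value untouched.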

After the initialization phase, we follow an iterative process that finishes when all input points have been inserted into the Delaunay triangulation. In each iteration we insert as many points as possible, under the condition that at most one point is inserted into any single triangle. Each iteration is divided into three steps: location, where the triangle containing each non-inserted point is determined; insertion, where at most one point is inserted in each triangle; and swapping, where non-Delaunay edges are swapped while avoiding conflicts between them.
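The control flow of this iterative process can be sketched as a plain host-side loop. This is only an outline under stated assumptions: the callbacks in the hypothetical `IterationSteps` struct stand in for the GPU kernels, and the inner loop reflects the statement in Section 2.4 that the swapping kernels are repeated until no edge can be selected.

```cpp
#include <functional>

// Hypothetical bundle of callbacks standing in for the GPU kernels.
struct IterationSteps {
    std::function<bool()> allInserted;  // true once every input point is in the mesh
    std::function<void()> locate;       // location step
    std::function<void()> insert;       // insertion step, at most one point per triangle
    std::function<bool()> swapPass;     // one swapping pass; true if an edge was selected
};

// Runs iterations until every point is inserted and returns how many were needed.
int runIterations(const IterationSteps& steps) {
    int iterations = 0;
    while (!steps.allInserted()) {
        steps.locate();
        steps.insert();
        while (steps.swapPass()) {
            // Repeat the swapping pass until no edge can be selected.
        }
        ++iterations;
    }
    return iterations;
}
```

The returned count corresponds to the number of iterations reported in the results.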

2.1 Initialization phase

To use the GPU's resources as efficiently as possible, the following data structures are used. Let n be the number of vertices. Vertices is an array of size n + 3 where each element contains a position (x, y) in 2D. Its first three positions correspond to the three vertices of a large auxiliary triangle that contains all points. Triangles is an array of indices into Vertices with room for 2n + 1 triangles, where each three consecutive indices correspond to a triangle. Position zero of this array corresponds to the auxiliary triangle. The array Neighbours contains, for each triangle, the indices of its neighbouring and future neighbouring triangles.

Each six consecutive indices are related to a triangle. The first three correspond to the neighbours of the triangle and the last three to the future neighbours of the triangle before a swap is executed. Other arrays storing the results of intermediate steps are needed to facilitate the general process. For each vertex, the Inserted array contains a flag indicating whether the vertex has been inserted, and the ContainingTriangle array indicates which triangle contains the vertex. Initially, all the vertices are contained in the auxiliary triangle. For each triangle, the VertexToInsert and EdgeToSwap arrays record, respectively, the vertex to be inserted in the triangle and the edge of the triangle to be swapped. All arrays are allocated in global memory.
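The layout of these arrays can be sketched on the CPU as follows. This is an illustration rather than the actual GPU allocation: `MeshBuffers`, `makeBuffers`, the `kNone` marker, the use of `double` coordinates and the choice to flag the three auxiliary vertices as already inserted are all assumptions made for the example.

```cpp
#include <cstddef>
#include <vector>

struct Point2D { double x, y; };

constexpr int kNone = -1;  // hypothetical "no entry" marker

struct MeshBuffers {
    std::vector<Point2D> vertices;        // n + 3 positions, first 3 = auxiliary triangle
    std::vector<int> triangles;           // 3 vertex indices per triangle
    std::vector<int> neighbours;          // 6 per triangle: 3 current, then 3 future
    std::vector<int> inserted;            // per vertex flag (0 or 1)
    std::vector<int> containingTriangle;  // per vertex
    std::vector<int> vertexToInsert;      // per triangle
    std::vector<int> edgeToSwap;          // per triangle
};

// Allocates all arrays for n input points; coordinates are left to the caller.
MeshBuffers makeBuffers(std::size_t n) {
    const std::size_t numVertices = n + 3;
    const std::size_t numTriangles = 2 * n + 1;
    MeshBuffers m;
    m.vertices.resize(numVertices);
    m.triangles.assign(3 * numTriangles, kNone);
    m.neighbours.assign(6 * numTriangles, kNone);
    m.inserted.assign(numVertices, 0);
    m.containingTriangle.assign(numVertices, 0);  // everything starts in triangle 0
    m.vertexToInsert.assign(numTriangles, kNone);
    m.edgeToSwap.assign(numTriangles, kNone);
    // Triangle 0 is the auxiliary triangle built on vertices 0, 1 and 2.
    m.triangles[0] = 0;
    m.triangles[1] = 1;
    m.triangles[2] = 2;
    // Assumption: the auxiliary vertices count as already inserted.
    m.inserted[0] = m.inserted[1] = m.inserted[2] = 1;
    return m;
}
```

Flat index arrays of this kind map directly onto global memory buffers, which is presumably why the paper prefers them to pointer-based structures.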

2.2 Location step

This step updates ContainingTriangle and VertexToInsert by means of a kernel. Each thread operates on a vertex of index i. If vertex v = Vertices[i] has not yet been inserted into the triangulation (Inserted[i] = 0) and it is not contained in its assigned triangle t = ContainingTriangle[i], a walking process is launched along direction cv, where c is the centroid of t, until the triangle t′ containing v is reached. If v lies on an edge, it is assigned to the triangle of lower index incident to the edge. Then, ContainingTriangle[i] is updated with t′ and VertexToInsert[t′] is updated with v by means of an atomic operation. In this manner, VertexToInsert[t′] contains the first vertex arriving at t′.

2.3 Insertion step

This step inserts the vertices stored in VertexToInsert into the triangulation by means of three kernels. Each thread of the kernels operates on a vertex of index i. Let t = ContainingTriangle[i]. If i = VertexToInsert[t], triangle t will be subdivided into three triangles of indices t, 2i + 1 and 2i + 2. The first kernel checks, for each neighbour t′ of t, whether the neighbour t of t′ will become 2i + 1 or 2i + 2 after the subdivision of t. If so, this information is stored in the future-neighbours part of Neighbours[t′]. The second kernel performs the actual insertion of i into t, using its future neighbours.
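The index bookkeeping of the subdivision can be sketched as below, ignoring the neighbour updates. Two points are assumptions rather than statements from the paper: i is taken to count the input points from zero, so that 2i + 1 and 2i + 2 stay within the 2n + 1 available triangles and the point itself sits at position i + 3 of Vertices, and the way the three edges are distributed among the new triangles is one possible choice. The function name `subdivide` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Splits triangle t around input point i. `triangles` holds 3 vertex indices
// per triangle. Triangle t is overwritten; triangles 2i+1 and 2i+2 are created.
void subdivide(std::vector<int>& triangles, int t, int i) {
    const int p = i + 3;       // assumption: points follow the 3 auxiliary vertices
    const int t1 = 2 * i + 1;
    const int t2 = 2 * i + 2;
    assert(t >= 0 && i >= 0);
    assert(static_cast<std::size_t>(3 * t2 + 2) < triangles.size());

    const int a = triangles[3 * t];
    const int b = triangles[3 * t + 1];
    const int c = triangles[3 * t + 2];

    // Each new triangle keeps one edge of the old one and gains p as third
    // vertex, which preserves the orientation of (a, b, c).
    triangles[3 * t] = a;
    triangles[3 * t + 1] = b;
    triangles[3 * t + 2] = p;

    triangles[3 * t1] = b;
    triangles[3 * t1 + 1] = c;
    triangles[3 * t1 + 2] = p;

    triangles[3 * t2] = c;
    triangles[3 * t2 + 1] = a;
    triangles[3 * t2 + 2] = p;
}
```

Because every point owns the two slots 2i + 1 and 2i + 2, threads inserting different points never write to the same new triangle.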

2.4 Swapping step

This step swaps edges and is split into three kernels, whose threads each operate on a triangle of index t. The first kernel selects, if one exists, an edge of t to be swapped and stores it in EdgeToSwap. An edge is a candidate for swapping if it does not fulfil the Delaunay criterion and it is the only edge of the adjacent neighbouring triangle that is a candidate for swapping. Three cases then arise: (1) only one edge is a candidate, in which case it is selected for swapping if t is lower than the index of the adjacent neighbour; (2) two edges are candidates, in which case one of them is chosen for swapping; and (3) three edges are candidates, in which case again one of them is chosen for swapping. If an edge has been selected, let t′ be the neighbouring triangle of t sharing the edge. Then all triangles t″ adjacent to the quadrilateral determined by t and t′ are updated in the following way: if, after the edge swap, the neighbour t of t″ will be replaced by t′ or vice versa, this information is stored in the future-neighbours part of Neighbours[t″]. The second kernel performs the actual swap of the selected edge, while the third kernel updates the neighbours with the information stored in the future neighbours. These three kernels are executed sequentially until no edge can be selected.
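The Delaunay criterion used to detect candidate edges is commonly evaluated with the standard in-circle determinant, and the swap itself only rewires four vertex indices. The sketch below shows both under stated assumptions: the names `inCircle`, `violatesDelaunay`, `flipEdge` and `FlippedPair` are hypothetical, the triangles are taken to be counter-clockwise, and plain double arithmetic is used, so nearly degenerate inputs may be misclassified. The paper does not say how its kernels evaluate the criterion.

```cpp
#include <array>

struct Point2D { double x, y; };

// In-circle determinant: positive when d lies strictly inside the circumcircle
// of the counter-clockwise triangle (a, b, c), zero when d lies on the circle.
double inCircle(const Point2D& a, const Point2D& b,
                const Point2D& c, const Point2D& d) {
    const double ax = a.x - d.x, ay = a.y - d.y;
    const double bx = b.x - d.x, by = b.y - d.y;
    const double cx = c.x - d.x, cy = c.y - d.y;
    const double a2 = ax * ax + ay * ay;
    const double b2 = bx * bx + by * by;
    const double c2 = cx * cx + cy * cy;
    return ax * (by * c2 - b2 * cy)
         - ay * (bx * c2 - b2 * cx)
         + a2 * (bx * cy - by * cx);
}

// The edge a-b shared by (a, b, c) and (b, a, d) fails the Delaunay criterion
// when the opposite vertex d is inside the circumcircle of (a, b, c).
bool violatesDelaunay(const Point2D& a, const Point2D& b,
                      const Point2D& c, const Point2D& d) {
    return inCircle(a, b, c, d) > 0.0;
}

struct FlippedPair { std::array<int, 3> first, second; };

// Replaces edge a-b by edge c-d: triangles (a, b, c) and (b, a, d) become
// (a, d, c) and (d, b, c), both still counter-clockwise.
FlippedPair flipEdge(int a, int b, int c, int d) {
    FlippedPair r;
    r.first = {a, d, c};
    r.second = {d, b, c};
    return r;
}
```

For the triangle (0,0), (1,0), (0,1), a vertex at (0.5, -0.1) lies inside the circumcircle, so the shared edge would be a swap candidate, whereas a vertex at (0.5, -2) lies outside.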

172 Parallel Delaunay triangulation based on Lawson’s incremental insertion

3 Results

The algorithm has been executed ten times on each of five different sets of random points.

Table 1 shows the mean running times (the input set of vertices is previously loaded in memory) and mean number of iterations for computing the Delaunay triangulations.

These results were obtained on a computer equipped with an Intel(R) Pentium(R) D CPU at 3.00 GHz, 3.5 GB of RAM and an NVidia GeForce GTX 580/PCI/SSE2 GPU, whose cached global memory reduces both the problems and the time associated with global memory accesses.

n                  25600    52100    256000    521000    1000000
Mean time (s)      0.088    0.146     0.665     1.283      2.373
Mean iterations       18       19        21        23         24

Table 1. Behaviour of the proposed algorithm.

As pointed out in the abstract, this paper presents ongoing research. Future work will consist of studying the behaviour of our approach on different point distributions and comparing its performance with current parallel implementations.


XIV Spanish Meeting on Computational Geometry, 27–30 June 2011
