MapGraph - Graph processing at 30 billion edges per second on NVIDIA GPUs

Zhisong Fu SYSTAP, LLC fuzhisong@systap.com

Harish Kumar Dasari University of Utah hdasari@sci.utah.edu

Bradley Bebee SYSTAP, LLC beebs@systap.com

Martin Berzins University of Utah mb@sci.utah.edu

Bryan Thompson SYSTAP, LLC bryan@systap.com (Presenting)

MapGraph is a disruptive technology that delivers extreme performance for graph problems on many-core hardware. MapGraph can be run on a laptop, on EC2 HPC GPU compute nodes, and on large GPU compute clusters. With processing speeds of up to 3 billion edges per second on a single GPU, MapGraph changes what is possible with your data.

Many-core computing is the future. CPU architectures are not getting any faster. Continued performance gains must come from many-core technologies such as GPUs or the Intel Xeon Phi. GPUs are widely known for their role in games and high-performance computing, and for their high FLOPS/watt ratio. However, graph algorithms are data-intensive, not compute-intensive, and have degree-dependent parallelism. As a consequence, graph algorithms place an extreme burden on the memory bus and communications network.

SYSTAP and the Scientific Computing and Imaging Institute have developed a capability for extreme performance parallel graph algorithms on GPUs from laptops to large GPU clusters. MapGraph provides a scalable technology for data-intensive workloads that addresses the data-dependent parallelism, memory, and communication bottlenecks. I will review this research and present our roadmap for this technology.

Learn more at http://mapgraph.io.

This work was (partially) funded by the DARPA XDATA program under AFRL Contract #FA8750-13-C-0002. This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. D14PC00029. The authors would like to thank Dr. White, NVIDIA, and the MVAPICH group at Ohio State University for their support of this work.
