• Aucun résultat trouvé

Adaptive Cache-Oblivious All-to-All Operation

N/A
N/A
Protected

Academic year: 2021

Partager "Adaptive Cache-Oblivious All-to-All Operation"

Copied!
1
0
0

Texte intégral

(1)

Adaptive Cache-Oblivious All-to-All Operation

Shin Yee Chung and Wen-Jing Hsu

Division of Computer Science, Nanyang Technological University, Singapore-639798

Abstract—Modern processors rely on cache memories to

reduce the latency of data accesses. Extensive cache misses would thus compromise the usefulness of the scheme. Cache-aware algorithms make use of the knowledge about the cache, such as the cache line size, L, and cache size, Z, to be cache efficient. However, careful tuning of these parameters for these algorithms is needed for different hardware platforms. Cache-oblivious (CO) algorithms were first introduced by Leiserson to work without the knowledge of the cache parameters mentioned earlier, but still achieve optimal work complexity and optimal cache complexity. Here we present CO algorithms for all-to-all operations (analogous to the cross-product operation). Its applications include Convolution, Polynomial Arithmetic, Multiple

Sequence Alignment, N-Body Simulation, etc. Given two lists

each with n elements, a naive implementation of all-to-all operation incurs O(n2/L) cache misses. Our CO version

incurs only O(n2/L2 √Z ) cache misses. Preliminary

experiments on Opteron 1.4GHz and MIPS 250MHz show that the CO implementation achieves two times faster. The profiling tool further confirms that the amount of cache misses is significantly lower. We also consider various situations where (a) the elements have non-uniform sizes, (b) an element cannot fit into the cache, (c) the lengths of the lists vary, and (d) an element is linked list. In addition, we study the extension to K-lists All-to-All Operation and its application. Finally, we will present the empirical results and compare with cache-aware algorithms.

Références

Documents relatifs

In this work, we study the following problem: given a set of parallel applications, a multi-core processor with a shared last-level cache LLC, how can we best partition the LLC

1998 ACM Subject Classification Algorithms and data structures, E.1 Data Structures Keywords and phrases working-set property, dictionary, implicit, cache-oblivious,

To summarize, all heuristics based on dominant partitions are very efficient, especially when compared to the classical heuristics Fair (which share the cache fairly

Retrouve les objets dans l’image et

Retrouve les objets dans l’image et

Retrouve les objets dans l’image et

Retrouve les objets dans l’image et

The isotherms and pore size distributions between tension wood sections S0 and the isolated G-layer extracted by 15 min of sonication G15 were compared (Fig.. The isolated