Scalable NUMA-Aware Wilson-Dirac on Supercomputers

Abstract : We revisit the Wilson-Dirac operator, also referred to as Dslash, on NUMA many-core vector machines and thereby seek an efficient supercomputing implementation. Quantum ChromoDynamics (QCD) is the theory of the strong nuclear force, and its discrete formalism is the so-called Lattice Quantum ChromoDynamics (LQCD). Wilson-Dirac is the major computing kernel in LQCD, where special attention is paid to large-scale simulations. The corresponding computing demand is tremendous at various levels, from storage to floating-point operations, hence the crucial need for powerful supercomputers. Designing efficient LQCD codes on modern (mostly hybrid) supercomputers requires efficiently exploiting all available levels of parallelism, including accelerators. Since Wilson-Dirac is a coarse-grain stencil computation performed on a huge volume of data, any investigation of performance and scalability should carefully address memory accesses and interprocessor communication overheads. In order to lower the latter, explicit shared-memory implementations should be considered at the level of a compute node, since this leads to a less complex data communication graph and thus (at least intuitively) reduces the overall communication latency. We focus on this aspect and propose a novel, efficient NUMA-aware scheduling, together with a combination of the major HPC strategies for large-scale LQCD. We reach nearly optimal performance on a single core and a significant scalability improvement on several NUMA nodes. Then, using a classical domain decomposition approach, we extend our scheduling to a large cluster of many-core nodes, thus illustrating the global efficiency of our hybrid implementation.
Document type : Conference papers

Cited literature [15 references]
Contributor : Claire Medrala
Submitted on : Tuesday, May 30, 2017 - 2:30:52 PM
Last modification on : Thursday, September 24, 2020 - 4:36:02 PM
Long-term archiving on : Wednesday, September 6, 2017 - 1:53:24 PM


Files produced by the author(s)


  • HAL Id : hal-01529268, version 1


Claude Tadonki. Scalable NUMA-Aware Wilson-Dirac on Supercomputers. The 2017 International Conference on High Performance Computing & Simulation (HPCS 2017), Jul 2017, Genoa, Italy. pp.315-324. ⟨hal-01529268⟩


