Improving GNU Compiler Collection infrastructure for streamization
Abstract
The GNU Compiler Collection (GCC) needs a strategy to support future multicore architectures, which will probably include heterogeneous accelerator-like designs with explicit management of scratchpad memories. Some of these designs impose further restrictions; for example, SIMD designs have limited synchronization capabilities. Some platforms will probably also offer hardware support for streaming, transactions, and speculation. This paper surveys and evaluates several automatic and manual techniques for improving GCC's support for such targets. We focus on translating sequential code for such platforms, i.e., translating it into task graphs together with their communication and memory access operations. The paper evaluates the communication library support on an AMD Phenom™ X4 9550 quad-core processor. We use these experiments to tune the automatic task-partitioning algorithm implemented in GCC. The paper concludes with recommendations for strategic developments of GCC to support a stream programming language and to improve the automatic generation of streamized tasks.