H. Sutter, The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software, Dr. Dobb's Journal, vol.30, issue.3, pp.202-210, 2005.

R. H. Netzer and B. P. Miller, What are race conditions?: Some issues and formalizations, ACM Letters on Programming Languages and Systems, vol.1, issue.1, pp.74-88, 1992.
DOI : 10.1145/130616.130623

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.145.1099

W. J. Bolosky and M. L. Scott, False Sharing and Its Effect on Shared Memory Performance, USENIX Symposium on Experiences with Distributed and Multiprocessor Systems, pp.3-3, 1993.

M. Wolfe, Techniques for Improving the Inherent Parallelism in Programs, 1978.

C. Dave, H. Bae, S. Min, S. Lee, R. Eigenmann et al., Cetus: A Source-to-Source Compiler Infrastructure for Multicores, Computer, vol.42, issue.12, pp.36-42, 2009.
DOI : 10.1109/MC.2009.385

U. Bondhugula, PLUTO - An automatic parallelizer and locality optimizer for affine loop nests
DOI : 10.1145/1379022.1375595

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.151.5126

U. Bondhugula, Compiling Affine Loop Nests for Distributed-memory Parallel Architectures, Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, ser. SC '13, 2013.
DOI : 10.1145/2503210.2503289

P. Labs, PPCG - Polyhedral Parallel Code Generator, Available: https://www.openhub.net/p/ppcg

S. Verdoolaege, J. C. Juega, A. Cohen, J. I. Gómez et al., Polyhedral Parallel Code Generation for CUDA, ACM Trans. Archit. Code Optim., vol.9, issue.4, pp.54:1-54:23, 2013.

XFOR: A Programming Structure to Ease the Formulation of Efficient Loop Optimizations & A Polyhedral Language

I. Fassi and P. Clauss, XFOR: Filling the Gap between Automatic Loop Optimization and Peak Performance, 2015 14th International Symposium on Parallel and Distributed Computing, 2015.
DOI : 10.1109/ISPDC.2015.19

URL : https://hal.archives-ouvertes.fr/hal-01155144

R. Habel, High Performance Programming for Hybrid Architectures, Theses, Ecole Nationale Supérieure des Mines de Paris, 2014.
URL : https://hal.archives-ouvertes.fr/tel-01101782

LLVM Team, clang: a C language family frontend for LLVM

Y. Kwok and I. Ahmad, Benchmarking and Comparison of the Task Graph Scheduling Algorithms, Journal of Parallel and Distributed Computing, vol.59, issue.3, pp.381-422, 1999.
DOI : 10.1006/jpdc.1999.1578

S. Jin, G. Schiavone, and D. Turgut, A performance study of multiprocessor task scheduling algorithms, The Journal of Supercomputing, vol.5, issue.1, pp.77-97, 2008.
DOI : 10.1007/s11227-007-0139-z

OpenMP, Available: http://openmp

Cilk Plus, Available: https://www.cilkplus.org/

Oak Ridge National Laboratory, PVM: Parallel Virtual Machine

V. S. Sunderam, PVM: A framework for parallel distributed computing, Concurrency: Practice and Experience, vol.2, issue.4, pp.315-339, 1990.
DOI : 10.1002/cpe.4330020404

U. Bondhugula, A. Acharya, and A. Cohen, The Pluto+ Algorithm, ACM Transactions on Programming Languages and Systems, vol.38, issue.3, pp.12:1-12:32, 2016.
DOI : 10.1145/2896389

URL : https://hal.archives-ouvertes.fr/hal-01425546

C. Yang and K. Lai, A directive-based MPI code generator for Linux PC clusters, The Journal of Supercomputing, vol.9, issue.4, pp.177-207, 2009.
DOI : 10.1007/s11227-008-0258-1

M. Y. Wu and D. D. Gajski, Hypertool: a programming aid for message-passing systems, IEEE Transactions on Parallel and Distributed Systems, vol.1, issue.3, pp.330-343, 1990.
DOI : 10.1109/71.80160

T. Yang and A. Gerasoulis, PYRROS: Static Task Scheduling and Code Generation for Message Passing Multiprocessors, Proceedings of the 6th International Conference on Supercomputing, ser. ICS '92, pp.428-437, 1992.

P. Bellens, J. M. Perez, R. M. Badia, and J. Labarta, CellSs: a Programming Model for the Cell BE Architecture, ACM/IEEE SC 2006 Conference (SC'06), 2006.
DOI : 10.1109/SC.2006.17

D. Millot, A. Muller, C. Parrot, and F. Silber-Chaussumier, From OpenMP to MPI: first experiments of the STEP source-to-source transformation tool, Parallel Computing: From Multicores and GPU's to Petascale, Proceedings of the conference ParCo, pp.669-676, 2009.
URL : https://hal.archives-ouvertes.fr/hal-01368936

M. Amini, C. Ancourt, F. Coelho, B. Creusillet, S. Guelton et al., PIPS Is not (just) Polyhedral Software: Adding GPU Code Generation in PIPS, First International Workshop on Polyhedral Compilation Techniques (IMPACT 2011) in conjunction with CGO 2011, p.6, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00744312

D. Khaldi, P. Jouvelot, and C. Ancourt, Parallelizing with BDSC, a resource-constrained scheduling algorithm for shared and distributed memory systems, Parallel Computing, vol.41, pp.66-89, 2015.
DOI : 10.1016/j.parco.2014.11.004

URL : https://hal.archives-ouvertes.fr/hal-01097328

B. Creusillet and F. Irigoin, Interprocedural Array Region Analyses, International Journal of Parallel Programming, vol.2, issue.3, pp.513-546, 1996.
DOI : 10.1007/BF03356758

URL : https://hal.archives-ouvertes.fr/hal-00752611

C. Harris and M. Stephens, A Combined Corner and Edge Detector, Proceedings of the Alvey Vision Conference 1988, pp.147-151, 1988.
DOI : 10.5244/C.2.23

M. Klemm and C. Terboven, Full Throttle: OpenMP 4.0, The Parallel Universe Magazine, pp.6-16, 2013.

M. Tillenius, E. Larsson, R. M. Badia, and X. Martorell, Resource-Aware Task Scheduling, ACM Transactions on Embedded Computing Systems, vol.14, issue.1, pp.5:1-5:25, 2015.
DOI : 10.1145/2638554

URL : http://hdl.handle.net/2117/28025

C. Ancourt and T. V. Nguyen, Array resizing for scientific code debugging, maintenance and reuse, Proceedings of the 2001 ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering, PASTE '01, pp.32-37, 2001.
DOI : 10.1145/379605.379656

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.76.9157

Open MPI, Available: https://www.open-mpi

PolyBench, Available: https://sourceforge