NumPy, package for scientific computing with Python, 2017.

M. Abadi and A. Agarwal, TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2015.

M. S. Alnaes, A. Logg, K. B. Olgaard, M. E. Rognes, and G. N. Wells, Unified Form Language: A Domain-specific Language for Weak Formulations of Partial Differential Equations, ACM Trans. Math. Softw., vol.40, issue.9, 2014.

L. Bagnères, O. Zinenko, S. Huot, and C. Bastoul, Opening Polyhedral Compiler's Black Box, Proceedings of the 2016 International Symposium on Code Generation and Optimization (CGO '16), pp.128-138, 2016.

C. Bastoul and P. Feautrier, More Legal Transformations for Locality, Euro-Par 2004 Parallel Processing, 10th International Euro-Par Conference, 2004.
DOI : 10.1007/978-3-540-27866-5_36

URL : https://hal.archives-ouvertes.fr/inria-00001056

G. Baumgartner, A. Auer, D. E. Bernholdt, A. Bibireata, V. Choppella et al., Synthesis of High-Performance Parallel Programs for a Class of ab Initio Quantum Chemistry Models, Proc. IEEE, vol.93, pp.276-292, 2005.

J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu et al., Theano: a CPU and GPU Math Expression Compiler, Proceedings of the Python for Scientific Computing Conference (SciPy), 2010.

U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, A Practical Automatic Polyhedral Program Optimization System, ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2008.

C. Chen, J. Chame, and M. Hall, CHiLL: A framework for composing high-level loop transformations, 2008.

T. Chen, T. Moreau, Z. Jiang, H. Shen, E. Q. Yan et al., TVM: End-to-End Optimization Stack for Deep Learning, 2018.
DOI : 10.1145/3149166.3149174

C. Chiw, G. Kindlmann, J. Reppy, L. Samuels, and N. Seltzer, Diderot: A Parallel DSL for Image Analysis and Visualization, Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '12), pp.111-120, 2012.

A. Cohen, M. Sigler, S. Girbal, and O. Temam, Facilitating the Search for Compositions of Program Transformations, Proceedings of the 19th Annual International Conference on Supercomputing (ICS '05), pp.151-160, 2005.
URL : https://hal.archives-ouvertes.fr/hal-01257296

S. Donadio, J. Brodman, T. Roeder, K. Yotov, D. Barthou et al., A Language for the Compact Representation of Multiple Program Versions, pp.136-151, 2006.
URL : https://hal.archives-ouvertes.fr/hal-00141067

S. Girbal, N. Vasilache, C. Bastoul, A. Cohen, D. Parello et al., Semi-Automatic Composition of Loop Transformations for Deep Parallelism and Memory Hierarchies, International Journal of Parallel Programming, vol.34, issue.3, pp.261-317, 2006.

O. Haggui, C. Tadonki, L. Lacassagne, F. Sayadi, and B. Ouni, Harris corner detection on a NUMA manycore, Future Generation Computer Systems, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01689709

T. Henriksen, N. G. W. Serup, M. Elsman, F. Henglein, and C. E. Oancea, Futhark: Purely Functional GPU-programming with Nested Parallelism and In-place Array Updates, Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp.556-571, 2017.

I. Huismann, J. Stiller, and J. Fröhlich, Fast Static Condensation for the Helmholtz Equation in a Spectral-Element Discretization, pp.371-380, 2016.

F. Kjolstad, S. Kamil, S. Chou, D. Lugato, and S. Amarasinghe, The Tensor Algebra Compiler, Proc. ACM Program. Lang., vol.1, OOPSLA, Article 77, 2017.
DOI : 10.1145/3133901

URL : http://dl.acm.org/ft_gateway.cfm?id=3133901&type=pdf

A. Klöckner, Loo.py: Transformation-based Code Generation for GPUs and CPUs, Proceedings of ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming (ARRAY'14), vol.82, 2014.

F. Luporini, A. L. Varbanescu, F. Rathgeber, G. Bercea, J. Ramanujam et al., Cross-Loop Optimization of Arithmetic Intensity for Finite Element Local Assembly, ACM Trans. Archit. Code Optim., vol.11, p.57, 2015.

R. Müller-Pfefferkorn, W. E. Nagel, and B. Trenkler, Optimizing Cache Access: A Tool for Source-to-Source Transformations and Real-Life Compiler Tests, Euro-Par 2004 Parallel Processing, Springer Berlin Heidelberg, pp.72-81, 2004.

M. Püschel, J. M. Moura, J. R. Johnson, D. Padua, M. M. Veloso et al., SPIRAL: Code Generation for DSP Transforms, Proc. IEEE, vol.93, pp.232-275, 2005.
DOI : 10.1109/jproc.2004.840306

URL : http://spiral.ece.cmu.edu:8080/pub-spiral/pubfile/paper_1.pdf

J. Ragan-Kelley, C. Barnes, A. Adams, S. Paris, F. Durand et al., Halide: A Language and Compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines, Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '13), pp.519-530, 2013.

F. Rathgeber, G. R. Markall, L. Mitchell, N. Loriant, D. A. Ham et al., PyOP2: A High-Level Framework for Performance-Portable Simulations on Unstructured Meshes, Proceedings of the 2012 SC Companion: High Performance Computing, Networking Storage and Analysis (SCC '12), IEEE Computer Society, pp.1116-1123, 2012.

N. A. Rink, Modeling of languages for tensor manipulation, 2018.

N. A. Rink, I. Huismann, A. Susungi, J. Castrillon, J. Stiller et al., CFDlang: High-level Code Generation for High-order Methods in Fluid Dynamics, Proceedings of the Real World Domain Specific Languages Workshop, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01857925


S. Scholz, Single Assignment C: Efficient Support for High-level Array Operations in a Functional Setting, J. Funct. Program., vol.13, pp.1005-1059, 2003.

D. G. Spampinato, D. Fabregat-Traver, P. Bientinesi, and M. Püschel, Program Generation for Small-scale Linear Algebra Applications, Proceedings of the 2018 International Symposium on Code Generation and Optimization, pp.327-339, 2018.
DOI : 10.1145/3179541.3168812

URL : http://arxiv.org/pdf/1805.04775

D. G. Spampinato and M. Püschel, A basic linear algebra compiler for structured matrices, International Symposium on Code Generation and Optimization (CGO), pp.117-127, 2016.
DOI : 10.1145/2854038.2854060

P. Springer and P. Bientinesi, Design of a high-performance GEMM-like Tensor-Tensor Multiplication, 2016.

P. Springer, A. Sankaran, and P. Bientinesi, TTC: A Tensor Transposition Compiler for Multiple Architectures, 2016.
DOI : 10.1145/2935323.2935328

URL : http://arxiv.org/pdf/1607.01249

M. Steuwer, C. Fensch, S. Lindley, and C. Dubach, Generating Performance Portable Code Using Rewrite Rules: From High-level Functional Expressions to High-performance OpenCL Code, Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming, pp.205-217, 2015.
DOI : 10.1145/2858949.2784754

URL : http://eprints.gla.ac.uk/146605/7/146605.pdf

M. Steuwer, T. Remmelg, and C. Dubach, Lift: A Functional Data-parallel IR for High-performance GPU Code Generation, Proceedings of the 2017 International Symposium on Code Generation and Optimization (CGO '17), pp.74-85, 2017.
DOI : 10.1109/cgo.2017.7863730

URL : http://eprints.gla.ac.uk/146596/1/146596.pdf

A. Susungi, A. Cohen, and C. Tadonki, More Data Locality for Static Control Programs on NUMA Architectures, Proceedings of the 7th International Workshop on Polyhedral Compilation Techniques (IMPACT '17), 2017.
URL : https://hal.archives-ouvertes.fr/hal-01529354

A. Susungi, N. A. Rink, J. Castrillón, I. Huismann, A. Cohen et al., Towards Compositional and Generative Tensor Optimizations, Proceedings of the 16th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, pp.169-175, 2017.
DOI : 10.1145/3136040.3136050

URL : https://hal.archives-ouvertes.fr/hal-01666818

M. Valiev, E. J. Bylaska, N. Govind, K. Kowalski, T. P. Straatsma et al., NWChem: A comprehensive and scalable open-source solution for large scale molecular simulations, Computer Physics Communications, vol.181, pp.1477-1489, 2010.
DOI : 10.1016/j.cpc.2010.04.018

URL : https://zenodo.org/record/1258869/files/article.pdf

N. Vasilache, A. Cohen, and L. Pouchet, Automatic Correction of Loop Transformations, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007), pp.292-304, 2007.
DOI : 10.1109/pact.2007.4336220

URL : https://hal.archives-ouvertes.fr/hal-01257283

N. Vasilache, O. Zinenko, T. Theodoridis, P. Goyal, Z. Devito et al., Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions, 2018.

Q. Yi, K. Seymour, H. You, R. Vuduc, and D. Quinlan, POET: Parameterized Optimizations for Empirical Tuning, IEEE International Parallel and Distributed Processing Symposium, pp.1-8, 2007.
DOI : 10.1109/ipdps.2007.370637

URL : http://vuduc.org/pubs/yi2007-poet.pdf