NumPy, package for scientific computing with Python, 2017.
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2015.
Unified Form Language: A Domain-specific Language for Weak Formulations of Partial Differential Equations, ACM Trans. Math. Softw., vol.40, issue.9, 2014. ,
Opening Polyhedral Compiler's Black Box, Proceedings of the 2016 International Symposium on Code Generation and Optimization (CGO '16), pp.128-138, 2016. ,
More Legal Transformations for Locality, Euro-Par 2004 Parallel Processing, 10th International Euro-Par Conference, 2004. ,
DOI : 10.1007/978-3-540-27866-5_36
URL : https://hal.archives-ouvertes.fr/inria-00001056
Synthesis of High-Performance Parallel Programs for a Class of ab Initio Quantum Chemistry Models, Proc. IEEE, vol.93, pp.276-292, 2005. ,
Theano: a CPU and GPU Math Expression Compiler, Proceedings of the Python for Scientific Computing Conference (SciPy), 2010. ,
A Practical Automatic Polyhedral Program Optimization System, ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2008. ,
CHiLL: A framework for composing high-level loop transformations, 2008. ,
TVM: End-to-End Optimization Stack for Deep Learning, 2018. ,
DOI : 10.1145/3149166.3149174
Diderot: A Parallel DSL for Image Analysis and Visualization, Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '12), pp.111-120, 2012. ,
Facilitating the Search for Compositions of Program Transformations, Proceedings of the 19th Annual International Conference on Supercomputing (ICS '05), pp.151-160, 2005. ,
URL : https://hal.archives-ouvertes.fr/hal-01257296
A Language for the Compact Representation of Multiple Program Versions, pp.136-151, 2006. ,
URL : https://hal.archives-ouvertes.fr/hal-00141067
Semi-Automatic Composition of Loop Transformations for Deep Parallelism and Memory Hierarchies, International Journal of Parallel Programming, vol.34, issue.3, pp.261-317, 2006. ,
Harris corner detection on a NUMA manycore, Future Generation Computer Systems, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01689709
Futhark: Purely Functional GPU-programming with Nested Parallelism and In-place Array Updates, Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp.556-571, 2017. ,
Fast Static Condensation for the Helmholtz Equation in a Spectral-Element Discretization, pp.371-380, 2016. ,
The Tensor Algebra Compiler, Proc. ACM Program. Lang. 1, OOPSLA, Article 77, 2017. ,
DOI : 10.1145/3133901
URL : http://dl.acm.org/ft_gateway.cfm?id=3133901&type=pdf
Loo.py: Transformation-based Code Generation for GPUs and CPUs, Proceedings of ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming (ARRAY'14), vol.82, 2014. ,
Cross-Loop Optimization of Arithmetic Intensity for Finite Element Local Assembly, ACM Trans. Archit. Code Optim., vol.11, p.57, 2015. ,
Optimizing Cache Access: A Tool for Source-to-Source Transformations and Real-Life Compiler Tests, Euro-Par 2004 Parallel Processing, pp.72-81, 2004. ,
SPIRAL: Code Generation for DSP Transforms, Proc. IEEE, vol.93, pp.232-275, 2005. ,
DOI : 10.1109/jproc.2004.840306
URL : http://spiral.ece.cmu.edu:8080/pub-spiral/pubfile/paper_1.pdf
Halide: A Language and Compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines, Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '13), pp.519-530, 2013. ,
PyOP2: A High-Level Framework for Performance-Portable Simulations on Unstructured Meshes, Proceedings of the 2012 SC Companion: High Performance Computing, Networking Storage and Analysis (SCC '12), IEEE Computer Society, pp.1116-1123, 2012. ,
Modeling of languages for tensor manipulation, 2018. ,
CFDlang: High-level Code Generation for High-order Methods in Fluid Dynamics, Proceedings of the Real World Domain Specific Languages Workshop, 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01857925
Single Assignment C: Efficient Support for High-level Array Operations in a Functional Setting, J. Funct. Program., vol.13, pp.1005-1059, 2003. ,
Program Generation for Small-scale Linear Algebra Applications, Proceedings of the 2018 International Symposium on Code Generation and Optimization, pp.327-339, 2018. ,
DOI : 10.1145/3179541.3168812
URL : http://arxiv.org/pdf/1805.04775
A basic linear algebra compiler for structured matrices, International Symposium on Code Generation and Optimization (CGO), pp.117-127, 2016. ,
DOI : 10.1145/2854038.2854060
Design of a high-performance GEMM-like Tensor-Tensor Multiplication, 2016. ,
TTC: A Tensor Transposition Compiler for Multiple Architectures, 2016. ,
DOI : 10.1145/2935323.2935328
URL : http://arxiv.org/pdf/1607.01249
Generating Performance Portable Code Using Rewrite Rules: From High-level Functional Expressions to High-performance OpenCL Code, Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming, pp.205-217, 2015. ,
DOI : 10.1145/2858949.2784754
URL : http://eprints.gla.ac.uk/146605/7/146605.pdf
Lift: A Functional Data-parallel IR for High-performance GPU Code Generation, Proceedings of the 2017 International Symposium on Code Generation and Optimization (CGO '17), pp.74-85, 2017. ,
DOI : 10.1109/cgo.2017.7863730
URL : http://eprints.gla.ac.uk/146596/1/146596.pdf
More Data Locality for Static Control Programs on NUMA Architectures, Proceedings of the 7th International Workshop on Polyhedral Compilation Techniques (IMPACT '17), 2017. ,
URL : https://hal.archives-ouvertes.fr/hal-01529354
Towards Compositional and Generative Tensor Optimizations, Proceedings of the 16th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, pp.169-175, 2017. ,
DOI : 10.1145/3136040.3136050
URL : https://hal.archives-ouvertes.fr/hal-01666818
NWChem: A comprehensive and scalable open-source solution for large scale molecular simulations, Computer Physics Communications, vol.181, pp.1477-1489, 2010. ,
DOI : 10.1016/j.cpc.2010.04.018
URL : https://zenodo.org/record/1258869/files/article.pdf
Automatic Correction of Loop Transformations, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007), pp.292-304, 2007. ,
DOI : 10.1109/pact.2007.4336220
URL : https://hal.archives-ouvertes.fr/hal-01257283
Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions, 2018. ,
POET: Parameterized Optimizations for Empirical Tuning, IEEE International Parallel and Distributed Processing Symposium, pp.1-8, 2007. ,
DOI : 10.1109/ipdps.2007.370637
URL : http://vuduc.org/pubs/yi2007-poet.pdf