Alexander Heinecke

Cited by

	All	Since 2019
Citations	4256	3001
h-index	31	25
i10-index	83	65

Citations per year
Year	Citations
2010	15
2011	13
2012	46
2013	130
2014	120
2015	194
2016	206
2017	198
2018	257
2019	326
2020	391
2021	506
2022	583
2023	742
2024	447

Public access
10 articles available
3 articles not available
Based on funding mandates


Senior Principal Engineer at Intel Labs

Verified email at intel.com

HPC and Parallel Computing


Title	Cited by	Year

A study of BFLOAT16 for deep learning training	322	2019
D Kalamkar, D Mudigere, N Mellempudi, D Das, K Banerjee, S Avancha, ...
arXiv preprint arXiv:1905.12322, 2019

The ELPA library: scalable parallel eigenvalue solutions for electronic structure theory and computational science	313	2014
A Marek, V Blum, R Johanni, V Havu, B Lang, T Auckenthaler, A Heinecke, ...
Journal of Physics: Condensed Matter 26 (21), 213201, 2014

Design and implementation of the Linpack benchmark for single and multi-node systems based on Intel® Xeon Phi coprocessor	216	2013
A Heinecke, K Vaidyanathan, M Smelyanskiy, A Kobotov, R Dubtsov, ...
2013 IEEE 27th International Symposium on Parallel and Distributed …, 2013

LIBXSMM: accelerating small matrix multiplications by runtime code generation	204	2016
A Heinecke, G Henry, M Hutchinson, H Pabst
SC'16: Proceedings of the International Conference for High Performance …, 2016

Mixed precision training of convolutional neural networks using integer operations	191	2018
D Das, N Mellempudi, D Mudigere, D Kalamkar, S Avancha, K Banerjee, ...
arXiv preprint arXiv:1802.00930, 2018

Petascale high order dynamic rupture earthquake simulations on heterogeneous supercomputers	168	2014
A Heinecke, A Breuer, S Rettenberger, M Bader, AA Gabriel, C Pelties, ...
SC'14: Proceedings of the International Conference for High Performance …, 2014

ls1 mardyn: The Massively Parallel Molecular Dynamics Code for Large Systems	162	2014
C Niethammer, S Becker, M Bernreuther, M Buchholz, W Eckhardt, ...
Journal of Chemical Theory and Computation 10 (10), 4455-4464, 2014

Anatomy of high-performance deep learning convolutions on SIMD architectures	128	2018
E Georganas, S Avancha, K Banerjee, D Kalamkar, G Henry, H Pabst, ...
SC18: International Conference for High Performance Computing, Networking …, 2018

DistGNN: Scalable distributed training for large-scale graph neural networks	110	2021
V Md, S Misra, G Ma, R Mohanty, E Georganas, A Heinecke, D Kalamkar, ...
Proceedings of the International Conference for High Performance Computing …, 2021

FP8 formats for deep learning	104	2022
P Micikevicius, D Stosic, N Burgess, M Cornea, P Dubey, R Grisenthwaite, ...
arXiv preprint arXiv:2209.05433, 2022

591 TFLOPS multi-trillion particles simulation on SuperMUC	102	2013
W Eckhardt, A Heinecke, R Bader, M Brehm, N Hammer, H Huber, ...
Supercomputing: 28th International Supercomputing Conference, ISC 2013 …, 2013

From GPGPU to many-core: Nvidia Fermi and Intel Many Integrated Core architecture	89	2012
A Heinecke, M Klemm, HJ Bungartz
Computing in Science & Engineering 14 (2), 78-83, 2012

Sustained petascale performance of seismic simulations with SeisSol on SuperMUC	87	2014
A Breuer, A Heinecke, S Rettenberger, M Bader, AA Gabriel, C Pelties
Supercomputing: 29th International Conference, ISC 2014, Leipzig, Germany …, 2014

Leveraging the bfloat16 artificial intelligence datatype for higher-precision computations	72	2019
G Henry, PTP Tang, A Heinecke
2019 IEEE 26th Symposium on Computer Arithmetic (ARITH), 69-76, 2019

Efficient shared-memory implementation of high-performance conjugate gradient benchmark and its application to unstructured matrices	67	2014
J Park, M Smelyanskiy, K Vaidyanathan, A Heinecke, DD Kalamkar, X Liu, ...
SC'14: Proceedings of the International Conference for High Performance …, 2014

Performance optimizations for scalable implicit RANS calculations with SU2	57	2016
TD Economon, D Mudigere, G Bansal, A Heinecke, F Palacios, J Park, ...
Computers & Fluids 129, 146-158, 2016

Methods and apparatus to detect anomalies of a monitored system	55	2020
M Agerstam, B Sadeghi, J Martin, J Ota, J Gottschlich, M Carranza, ...
US Patent 10,802,942, 2020

Computer processor for higher precision computations using a mixed-precision decomposition of operations	51	2020
G Henry, A Heinecke
US Patent 10,853,067, 2020

Petascale local time stepping for the ADER-DG finite element method	51	2016
A Breuer, A Heinecke, M Bader
2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2016

Optimized compute hardware for machine learning operations	47	2020
D Das, R Gramunt, M Smelyanskiy, J Corbal, D Mudigere, NK Mellempudi, ...
US Patent 10,776,699, 2020
