Mayank Mishra
MIT-IBM Watson Lab
Verified email at ibm.com - Homepage
Title · Cited by · Year
BLOOM: A 176B-parameter open-access multilingual language model
T Le Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow, R Castagné, ...
Cited by 1446 · 2023
StarCoder: may the source be with you!
R Li, LB Allal, Y Zi, N Muennighoff, D Kocetkov, C Mou, M Marone, C Akiki, ...
arXiv preprint arXiv:2305.06161, 2023
Cited by 540 · 2023
SantaCoder: don't reach for the stars!
LB Allal, R Li, D Kocetkov, C Mou, C Akiki, CM Ferrandis, N Muennighoff, ...
arXiv preprint arXiv:2301.03988, 2023
Cited by 165 · 2023
StarCoder 2 and The Stack v2: The Next Generation
A Lozhkov, R Li, LB Allal, F Cassano, J Lamy-Poirier, N Tazi, A Tang, ...
arXiv preprint arXiv:2402.19173, 2024
Cited by 79 · 2024
Granite Code Models: A Family of Open Foundation Models for Code Intelligence
M Mishra, M Stallone, G Zhang, Y Shen, A Prasad, AM Soria, M Merler, ...
arXiv preprint arXiv:2405.04324, 2024
Cited by 16 · 2024
Adversarial approximate inference for speech to electroglottograph conversion
AP Prathosh, V Srivastava, M Mishra
IEEE/ACM Transactions on Audio, Speech, and Language Processing 27 (12 …, 2019
Cited by 7 · 2019
Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models
B Pan, Y Shen, H Liu, M Mishra, G Zhang, A Oliva, C Raffel, R Panda
arXiv preprint arXiv:2404.05567, 2024
Cited by 6 · 2024
Variational Inference with Latent Space Quantization for Adversarial Resilience
V Kyatham, M Mishra, TK Yadav, D Mishra, AP Prathosh
arXiv preprint arXiv:1903.09940, 2019
Cited by 5 · 2019
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
W Brandon, M Mishra, A Nrusimha, R Panda, JR Kelly
arXiv preprint arXiv:2405.12981, 2024
Cited by 3 · 2024
Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization
A Nrusimha, M Mishra, N Wang, D Alistarh, R Panda, Y Kim
arXiv preprint arXiv:2404.03605, 2024
Cited by 3 · 2024
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the US Executive Order
T Nakamura, M Mishra, S Tedeschi, Y Chai, JT Stillerman, F Friedrich, ...
arXiv preprint arXiv:2404.00399, 2024
Cited by 3 · 2024
BRAIn: Bayesian Reward-conditioned Amortized Inference for natural language generation from feedback
G Pandey, Y Nandwani, T Naseem, M Mishra, G Xu, D Raghu, S Joshi, ...
arXiv preprint arXiv:2402.02479, 2024
Cited by 2 · 2024
Variational Learning for Unsupervised Knowledge Grounded Dialogs
M Mishra, D Madan, G Pandey, D Contractor
31st International Joint Conference on Artificial Intelligence (IJCAI 2022), 2021
Cited by 2 · 2021
Enhancing Training Efficiency Using Packing with Flash Attention
A Kundu, RD Lee, L Wynter, RK Ganti, M Mishra
arXiv preprint arXiv:2407.09105, 2024
Cited by 1 · 2024
The infrastructure powering IBM's Gen AI model development
T Gershon, S Seelam, B Belgodere, M Bonilla, L Hoang, D Barnett, ...
arXiv preprint arXiv:2407.05467, 2024
Cited by 1 · 2024
Prompting with Pseudo-Code Instructions
M Mishra, P Kumar, R Bhat, R Murthy V, D Contractor, S Tamilselvam
EMNLP 2023, 2023
Cited by 1 · 2023
Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler
Y Shen, M Stallone, M Mishra, G Zhang, S Tan, A Prasad, AM Soria, ...
arXiv preprint arXiv:2408.13359, 2024
2024
Scaling Granite Code Models to 128K Context
M Stallone, V Saxena, L Karlinsky, B McGinn, T Bula, M Mishra, AM Soria, ...
arXiv preprint arXiv:2407.13739, 2024
2024
Aurora-M: The First Open Source Biden-Harris Executive Order Red teamed Multilingual Language Model
M Mishra
https://huggingface.co/blog/mayank-mishra/aurora, 2024
2024
Saving Memory Using Padding-Free Transformer Layers during Finetuning
M Mishra
https://huggingface.co/blog/mayank-mishra/padding-free-transformer, 2024
2024
Articles 1–20