I am Diego Antognini, a research scientist at IBM Research AI, where I collaborate with the MIT-IBM Watson AI Lab. I have 7 years of research experience in natural language processing, machine learning, and single- and multi-objective recommendation systems. My research is twofold. First, I develop new methods to align and personalize large language models (LLMs) according to user preferences, automatically augment LLM prompts to steer model behavior, and adapt retrieval-augmented LLMs for extractive and generative question-answering systems on scientific documents. Second, I am interested in efficient NLP for resource-constrained training and inference settings. I build models with a size on the order of a few megabytes and a latency of a couple of milliseconds, with performance similar to that of large models and higher throughput. I am also experienced in interpretable models that generate personalized and actionable textual explanations. For a quick overview of my background, you can download my résumé.

Additionally, I am a lecturer and module head at the Lucerne University of Applied Sciences (HSLU) where I teach deep learning for NLP at the M.Sc. level. I have supervised 70+ B.Sc. and M.Sc. projects and assessed 60+ student projects. Finally, I offer consulting services in natural language processing, machine learning, and recommender systems.

I am on the job market. I am looking for exciting opportunities in research! If there is an opening in your organization for which I may be a suitable match, please do get in touch.

I hold a Ph.D. degree in Computer Science from the Swiss Federal Institute of Technology in Lausanne (EPFL), where I conducted research in the Artificial Intelligence Laboratory (LIA) under the supervision of Professor Boi Faltings. My doctoral thesis is titled "Textual Explanations and Critiques in Recommendation Systems" (available here). During my Ph.D., I developed models to infer high-quality explanations from text documents in a scalable and data-driven manner through selective rationalization. Moreover, I designed models to make textual explanations actionable (referred to as critiquing) and explored two important applications in natural language processing and conversational recommendation systems. I also worked on multi-objective recommendation and multi-document summarization.

From time to time, I give talks, for example on my current work on efficient machine learning at the MIT-IBM Watson AI Lab, or at the NLP Meetup in Zürich, where I presented one of my past projects. I have also participated in challenges with students; we won a $10k prize at the IARPA Geopolitical Forecasting Challenge 2018 (press coverage: EPFL News and 24 Heures).

On this website, I present some of the publications and patents I have worked on, along with some of my most exciting projects from before my Ph.D. If you have any questions, feel free to contact me.


Selected Publications

For the full list, you can consult my Google Scholar profile.

23) MC Layer Normalization for Calibrated Uncertainty in Deep Learning
Thomas Frick, Diego Antognini, Ioana Giurgiu, Benjamin Grewe, Cristiano Malossi, Rong Zhu, Mattia Rigotti
2023, CoRR

TL;DR: A drop-in replacement for Layer Normalization to endow neural networks with calibrated prediction uncertainty.
22) pNLP-Mixer: an Efficient all-MLP Architecture for Language
Francesco Fusco, Damian Pascual, Peter Staar, Diego Antognini
2023, ACL

TL;DR: An embedding-free MLP-Mixer model for on-device NLP using a projection layer that relies on MinHash and counting bloom filters. Our model occupies merely one megabyte and achieves 99% of the performance of mBERT.
21) Extracting Text Representations for Terms and Phrases in Technical Domains
Francesco Fusco*, Diego Antognini*
2023, ACL

TL;DR: Meaningful word embeddings can be achieved using character-based models that are 5x smaller and 10x faster than BERT-based counterparts and do not suffer from out-of-distribution problems.
20) Assistive Recipe Editing through Critiquing
Diego Antognini, Shuyang Li, Boi Faltings, Julian McAuley
2023, EACL

TL;DR: A framework for generating recipes and enabling users to edit them using critiques in an iterative manner. The system coherently rewrites recipes to satisfy users’ feedback.
19) Unsupervised Term Extraction for Highly Technical Domains
Francesco Fusco, Peter Staar, Diego Antognini
2022, EMNLP

TL;DR: A fully unsupervised method for term extraction that generalizes across domains. Our setup improves predictive performance and decreases inference latency on both CPUs and GPUs.
18) Active Learning for Imbalanced Civil Infrastructure Data
Thomas Frick, Diego Antognini, Mattia Rigotti, Ioana Giurgiu, Benjamin Grewe, Cristiano Malossi
2022, ECCV Workshop on Computer Vision for Civil and Infrastructure Engineering (CVCIE)

TL;DR: A method capable of operating on datasets suffering from heavy class imbalance, achieved by replacing the traditional active learning acquisition function with an auxiliary binary discriminator.
17) Textual Explanations and Critiques in Recommendation Systems
Diego Antognini
2022, EPFL Ph.D. thesis

TL;DR: This dissertation focuses on two fundamental challenges. The first involves generating explanations: inferring high-quality explanations from text documents in a scalable and data-driven manner. The second challenge consists of making explanations actionable, which we refer to as critiquing. This dissertation examines two important applications in natural language processing and recommendation tasks.
16) Positive & Negative Critiquing for VAE-based Recommenders
Diego Antognini, Boi Faltings
2022, CoRR

TL;DR: Fast negative and positive critiquing generalized for variational autoencoders, resulting in up to a 15% higher success rate compared to state-of-the-art models. The key lies in modeling positive and negative critiques as different modalities and employing a multi-modal VAE with weak supervision.
15) Interlock-Free Multi-Aspect Rationalization for Text Classification
Shuangqi Li, Diego Antognini, Boi Faltings

TL;DR: Addressing the interlocking dynamics of multi-aspect rationalization, utilizing a novel self-supervised contrastive loss and multi-stage training to generate more semantically diverse rationales.
14) Interacting with Explanations through Critiquing (T-RECS)
Diego Antognini, Claudiu Musat, Boi Faltings
2021, IJCAI

TL;DR: How to extract explanations significantly preferred by humans over those produced by state-of-the-art models and how to make them actionable, enabling users to interact with them iteratively to improve the recommendation.
13) Fast Multi-Step Critiquing for VAE-based Recommender Systems (M&Ms-VAE)
Diego Antognini, Boi Faltings
2021, RecSys

TL;DR: Fast multi-step critiquing generalized for variational autoencoders, resulting in speeds up to 26x faster and a success rate 20% higher compared to state-of-the-art models. The key lies in modeling the problem using multi-modal VAE and weak supervision.
12) Multi-Step Critiquing User Interface for Recommender Systems
Diana Petrescu*, Diego Antognini*, Boi Faltings
2021, RecSys Demo

TL;DR: We propose and demonstrate a new way of interacting with recommender systems to help users make decisions and find their ideal items.
11) Rationalization through Concepts (ConRAT)
Diego Antognini, Boi Faltings
2021, ACL Findings

TL;DR: Generalization of MTM: how to extract interpretable multi-faceted concepts (i.e., rationales) for single-task classification problems. It generates concepts that align with human rationalization and outperforms state-of-the-art methods trained on each aspect label independently.
10) Multi-Dimensional Explanation of Target Variables from Documents (MTM)
Diego Antognini, Claudiu Musat, Boi Faltings
2021, AAAI

TL;DR: One model that extracts interpretable, meaningful, and coherent multi-faceted rationales for multi-task text classification problems and performs better than individual rationalization models.
9) Addressing Fairness in Classification with a Model-Agnostic Multi-Objective Algorithm
Kirtan Padh, Diego Antognini, Emma L. Glaude, Boi Faltings, Claudiu Musat
2021, UAI

TL;DR: A model-agnostic multi-objective architecture that optimizes multiple fairness notions and sensitive attributes using a novel differentiable relaxation that approximates fairness notions through the hyperbolic tangent function.
8) Multi-Gradient Descent for Multi-Objective Recommender Systems
Nikola Milojkovic, Diego Antognini, Giancarlo Bergamin, Boi Faltings, Claudiu Musat
2020, AAAI Workshop on Interactive and Conversational Recommendation Systems (WICRS)

TL;DR: An efficient stochastic multi-gradient descent approach for multi-objective recommender systems.
7) HotelRec: a Novel Very Large-Scale Hotel Recommendation Dataset
Diego Antognini, Boi Faltings
2020, LREC

TL;DR: A new dataset with 50 million hotel reviews with meta-attributes, user information, and multi-aspect ratings.
6) Recommending Burgers based on Pizza Preferences: Addressing Data Sparsity with a Product of Experts
Martin Milenkoski, Diego Antognini, Boi Faltings
2021, RecSys Workshop on Cross-Market Recommendation

TL;DR: We address data sparsity and generate recommendations in domains where there is limited knowledge about the user preferences.
5) Modeling Online Behavior in Recommender Systems: The Importance of Temporal Context
Milena Filipovic*, Blagoj Mitrevski*, Diego Antognini, Emma L. Glaude, Boi Faltings, Claudiu Musat
2021, RecSys Workshop on Perspectives on the Evaluation of Recommender Systems

TL;DR: Omitting temporal context while evaluating recommender systems leads to false confidence. We propose an evaluation protocol and a model-agnostic training procedure to incorporate temporal context.
4) Momentum-based Gradient Methods in Multi-objective Recommender Systems
Blagoj Mitrevski*, Milena Filipovic*, Diego Antognini, Emma L. Glaude, Boi Faltings, Claudiu Musat
2021, RecSys Workshop on Multi-Objective Recommender Systems

TL;DR: A coordinated multi-objective optimization method in which each objective is optimized using an algorithm similar to the Adam algorithm.
3) GameWikiSum: a Novel Large Multi-Document Summarization Dataset
Diego Antognini, Boi Faltings
2020, LREC

TL;DR: A non-news domain-specific dataset for multi-document summarization that is 100 times larger than commonly used datasets.
2) Learning to Create Sentence Semantic Relation Graphs for Multi-Document Summarization
Diego Antognini, Boi Faltings
2019, EMNLP Workshop on New Frontiers in Summarization

TL;DR: How to leverage universal and domain-specific sentence embeddings using a graph structure for multi-document summarization.
1) Dataset Construction via Attention for Aspect Term Extraction with Distant Supervision
Athanasios Giannakopoulos*, Diego Antognini*, Claudiu Musat, Andreea Hossmann, Michael Baeriswyl
2017, ICDM Workshop on Sentiment Elicitation from Natural Text for Information Retrieval and Extraction (SENTIRE)

TL;DR: How to utilize large corpora for improved aspect term extraction using distant supervision.

Filed Patents

1-5) To be updated shortly.

Projects (prior to Ph.D.)

From Relation Extraction to Knowledge Graphs - M.Sc. thesis

My master's thesis at Iprova: a system that extracts concepts from large corpora and builds interactive knowledge graphs to provide invention developers with new insights.

NeoBrain - B.Sc. thesis

A research project focused on optimizing the processing of neuronal activity maps using massively parallel technologies.


A scalable, decentralized system that aggregates secondary storage devices in a cluster with the aim of supporting parallel scans of data stored across them.

Optimized flocking algorithm for e-pucks

I implemented, tested, analyzed, and optimized a flocking algorithm for e-pucks. The objective was for the robots to avoid obstacles within the arena while maintaining their collective formation. I worked in a multidisciplinary team.


Implementation of a complete Texas Hold'em Poker game with artificial intelligence.

Starfighter 4K

Shoot 'em up game utilizing motion recognition with Kinect and Wiimotes for spaceship movement, inclination, and shooting.


Several mini-projects are available for learning about GPGPU technologies, primarily CUDA.

Image classification

A classifier that recognizes the object present in an image using advanced models. The objects could be classified as a horse, airplane, car, or something else.

Social Recommendation System

A recommender system for events based on user data and Facebook profiles.

Facial recognition among profiles

Detect whether a person is wearing sunglasses using a collection of profile pictures of different individuals. Each person has pictures taken from different head angles, displaying different emotions, and with or without sunglasses.

Pattern classification and machine learning project 1

Project on regression and classification using linear models. One dataset is provided for each task without any accompanying information.

Recommender System challenge

Third task of the European Semantic Web Conference challenge (ESWC-14 Challenge): top-N recommendation of books.


A movie directory backed by a substantial database built from real IMDb data.


Planetarium software that displays the current view of the sky at the present location.