Computational Antibody Papers


2025-06-24
Benchmark for Antibody Binding Affinity Maturation and Design
- binding prediction
- language models
- generative methods
- databases
- Benchmark of machine learning models for antibody-antigen binding affinity.
- A curated dataset of over 150,000 antibody-antigen complexes with associated experimental affinity values was compiled from the literature.
- The benchmark compares a wide range of model types: language models, inverse folding models, graph-based, and diffusion-based generative models.
- Inverse folding models that are globally structure-aware perform best.
- General protein models like ESM-IF and ProteinMPNN outperform antibody-specific models such as AntiFold, DiffAb, and dyMEAN.
- Surprisingly, ESM-3 underperforms relative to ESM-IF, despite incorporating structural signals and improving upon earlier ESM models.
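Benchmarks of this kind typically score a model by how well its per-variant scores (for example, log-likelihoods) rank the measured affinities. The exact metric used here is not stated above. The sketch below assumes Spearman rank correlation, a common choice, and gives tied values their average rank. The example values are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation (undefined if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical model scores vs. measured affinities (e.g., -log KD).
model_scores = [-12.3, -10.1, -15.0, -9.8]
affinities = [7.1, 8.0, 6.2, 8.4]
rho = spearman(model_scores, affinities)
```

A rank-based metric is used because likelihood-style scores are rarely on the same scale as affinity measurements. Only their ordering is meaningful.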
2025-06-24
NanoBinder: a machine learning assisted nanobody binding prediction tool using Rosetta energy scores
- binding prediction
- nanobodies
- Introduced a novel machine learning method (NanoBinder) to predict the binding probability of nanobody-antigen structural complexes.
- Positive (binding) complexes were sourced from the SAbDab database, which contains experimentally validated nanobody-antigen interactions.
- Negative (non-binding) complexes were generated by structurally aligning nanobodies from different binding complexes (with RMSD < 2 Å) and recombining them with unrelated antigens to create likely non-binding pairs.
- Extracted Rosetta energy features from each complex and trained several machine learning models, including Random Forests, SVMs, AdaBoost, and Decision Trees, to classify binders vs. non-binders. Random Forests showed the best performance.
- They selected antibodies with known antigen targets (e.g., IL-6) and grafted their CDRs onto nanobody scaffolds using Rosetta-based protocols. The resulting nanobody-antigen complexes were evaluated in silico using NanoBinder, and selected candidates were experimentally validated. The predictions showed good correlation with binding outcomes, particularly for identifying non-binders.
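The negative-set construction described above can be sketched as a pairing step. This is a simplified illustration, not NanoBinder's code, and it makes three assumptions:
- Pairwise nanobody RMSDs have already been computed by a structural superposition tool.
- "Unrelated antigen" is approximated as "a different antigen identifier".
- The function name `make_decoy_pairs` and the toy identifiers are hypothetical.

```python
from itertools import permutations


def make_decoy_pairs(complexes, rmsd, cutoff=2.0):
    """Build likely non-binding (nanobody, antigen) pairs.

    complexes: dict mapping nanobody id -> antigen id it binds (positives).
    rmsd: dict mapping frozenset({nb_a, nb_b}) -> precomputed RMSD in Å
          (hypothetical input; computing it requires a structure tool).
    """
    decoys = set()
    for nb_a, nb_b in permutations(complexes, 2):
        if rmsd.get(frozenset((nb_a, nb_b)), float("inf")) >= cutoff:
            continue  # too dissimilar to superimpose nb_a onto nb_b's pose
        if complexes[nb_a] == complexes[nb_b]:
            continue  # same antigen, so the swap would not yield a negative
        # nb_a is placed in nb_b's binding pose against nb_b's antigen.
        decoys.add((nb_a, complexes[nb_b]))
    return sorted(decoys)


complexes = {"nb1": "IL6", "nb2": "HER2", "nb3": "IL6"}
rmsd = {
    frozenset(("nb1", "nb2")): 1.4,
    frozenset(("nb2", "nb3")): 3.1,
    frozenset(("nb1", "nb3")): 0.9,
}
decoys = make_decoy_pairs(complexes, rmsd)
```

In this toy input, nb3 produces no decoys. It is structurally close only to nb1, which binds the same antigen.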
2025-06-05
Learning the language of protein-protein interactions
- language models
- binding prediction
- Novel protein language model (MINT) that natively models protein-protein interactions.
- MINT (Multimeric INteraction Transformer) extends the ESM-2 protein language model by incorporating a cross-chain attention mechanism. This allows it to process multiple protein sequences simultaneously while preserving inter-sequence relationships and contextual information critical for modeling protein-protein interactions.
- MINT was trained on a large, curated subset of the STRING database, consisting of 96 million high-quality physical protein-protein interactions and 16.4 million unique protein sequences. The training employed a masked language modeling objective adapted for multimeric inputs.
- MINT was benchmarked on several general protein interaction tasks, including binary interaction classification, binding affinity prediction (PDBbind) and mutational impact prediction (e.g., SKEMPI and MutationalPPI). It consistently outperformed existing PLMs and achieved state-of-the-art performance on multiple datasets, including a 29% improvement over baselines on SKEMPI.
- MINT outperformed antibody-specific models (e.g., IgBert, IgT5 and AbMap) on the FLAB benchmark and on SARS-CoV-2 antibody mutant binding prediction. It showed a >10% improvement on three FLAB datasets and a 14% gain in a low-data setting (0.5% training data) for SARS-CoV-2 binding prediction.
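MINT's masked-language-modeling objective operates on several chains at once. The sketch below illustrates only the input side of such an objective. It concatenates chains, records a per-token chain index (which a cross-chain attention layer could use to separate intra- and inter-chain pairs) and masks roughly 15% of residues.

Several details are assumptions borrowed from generic BERT/ESM-style training, not taken from MINT's implementation:
- the 15% masking rate;
- the `<mask>` token;
- always replacing selected residues with the mask token;
- flat concatenation of the chains.

```python
import random

MASK = "<mask>"  # hypothetical mask token


def mask_multimer(chains, mask_rate=0.15, seed=0):
    """Concatenate chains and mask a random subset of residues.

    Returns (tokens, chain_ids, labels):
      tokens    - residues, with masked positions replaced by MASK
      chain_ids - index of the chain each token came from
      labels    - original residue at masked positions, None elsewhere
    """
    rng = random.Random(seed)
    tokens, chain_ids = [], []
    for idx, seq in enumerate(chains):
        tokens.extend(seq)
        chain_ids.extend([idx] * len(seq))
    n_mask = max(1, round(mask_rate * len(tokens)))
    labels = [None] * len(tokens)
    for i in rng.sample(range(len(tokens)), n_mask):
        labels[i] = tokens[i]  # the model is trained to recover this residue
        tokens[i] = MASK
    return tokens, chain_ids, labels


tokens, chain_ids, labels = mask_multimer(["MKTAYIAK", "GSHMLE"])
```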
2025-06-05
AbBFN2: A flexible antibody foundation model based on Bayesian Flow Networks
- developability
- generative methods
- protein design
- Novel generative modeling framework (AbBFN2) using Bayesian Flow Networks (BFNs) for antibody sequence optimization.
- Trained on sequences from the Observed Antibody Space (OAS) combined with genetic and biophysical annotations, using a denoising approach for both conditional and unconditional sequence generation. Targets include optimizing Therapeutic Antibody Profiler (TAP) annotations.
- Computationally validated for germline assignment accuracy, species prediction (humanness), and TAP parameter optimization.
- Combines multiple antibody design objectives in a single-step optimization process, unlike existing tools, which are typically specialized for individual tasks.
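Bayesian Flow Networks iteratively refine the parameters of a distribution rather than noisy samples. For discrete data such as amino acids, the core step in the original BFN formulation (Graves et al., 2023) is a closed-form Bayesian update of per-position categorical probabilities.

The sketch below shows that single update for one sequence position. It is not AbBFN2's code. The accuracy values in the usage loop are illustrative, not a schedule from the paper.

```python
import math
import random


def bfn_discrete_update(theta, x, alpha, rng):
    """One Bayesian update of categorical belief `theta` given true class `x`.

    Discrete BFN sender distribution: y ~ N(alpha * (K * e_x - 1), alpha * K * I).
    Posterior: theta'_k proportional to exp(y_k) * theta_k.
    """
    K = len(theta)
    std = math.sqrt(alpha * K)
    y = [rng.gauss(alpha * (K * (k == x) - 1), std) for k in range(K)]
    m = max(y)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(yk - m) * tk for yk, tk in zip(y, theta)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
    rng = random.Random(0)
    theta = [1 / 20] * 20  # uniform prior over the 20 amino acids
    x = AMINO_ACIDS.index("W")
    for alpha in (0.01, 0.05, 0.2):  # illustrative, increasing accuracies
        theta = bfn_discrete_update(theta, x, alpha, rng)
```

Low accuracy (`alpha`) adds little information per step. As `alpha` grows, the belief concentrates on the true residue.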
2025-06-05
Adapting ProteinMPNN for antibody design without retraining
- protein design
- generative methods
- Novel method to bias ProteinMPNN for antibody design, without modifying model weights.
- The per-position logits of the general protein model ProteinMPNN and the antibody-specific language model AbLang are summed and passed through a softmax. The AbLang term is intended to push sampling toward antibody-like sequences.
- In in silico experiments, ProteinMPNN+AbLang outperformed ProteinMPNN alone and rivalled the antibody-specific AbMPNN.
- The authors designed 96 Trastuzumab CDR-H3 variants with each of ProteinMPNN, AbLang and ProteinMPNN+AbLang. AbLang and ProteinMPNN alone produced 1 and 3 successful variants (out of 96), respectively, whereas their combination produced 36.
- None of the variants outperformed wild-type (WT) Trastuzumab.
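The logit-combination step described above can be sketched in a few lines. Summing logits before the softmax is equivalent to multiplying the two models' probability distributions and renormalizing (a product of experts).

The sketch assumes both logit vectors have already been mapped to a common amino-acid order. The two models use their own vocabularies, so that mapping is a caller-side step not shown here. The function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_logits(mpnn_logits, ablang_logits):
    """Sum per-residue logits from two models and renormalize.

    Both inputs must list the same amino acids in the same order.
    """
    if len(mpnn_logits) != len(ablang_logits):
        raise ValueError("logit vectors must share one vocabulary")
    return softmax([a + b for a, b in zip(mpnn_logits, ablang_logits)])


# Two models that disagree equally cancel out to a uniform choice.
probs = combine_logits([2.0, 0.0], [0.0, 2.0])
```

Because the models are combined only at the logit level, neither model's weights are touched. This matches the "without retraining" framing.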
2025-06-05
PSBench: a large-scale benchmark for estimating the accuracy of protein complex structural models
- structure prediction
- PSBench is a large benchmark dataset (>1M models) for training and evaluating methods for estimation of model accuracy (EMA) of protein complex structures, built from CASP15 and CASP16 data.
- Models were generated by AlphaFold2-Multimer and AlphaFold3 under blind prediction conditions and annotated with 10 detailed global, local, and interface quality scores.
- The dataset supports the development of advanced EMA methods such as GATE, which achieved top performance in the blind CASP16 assessment.
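EMA methods are usually evaluated by comparing predicted quality scores with the true quality of each model. Two common metrics are:
- Pearson correlation between predicted and true scores.
- Top-1 ranking loss: the gap between the best true quality and the true quality of the model the method ranks first.

Which metrics PSBench reports exactly is not stated above, so treat this metric set as an assumption. `true_scores` stands for any one of the annotated quality scores.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def ranking_loss(true_scores, pred_scores):
    """Best true quality minus true quality of the top-predicted model (0 is ideal)."""
    top_pred = max(range(len(pred_scores)), key=lambda i: pred_scores[i])
    return max(true_scores) - true_scores[top_pred]


# Hypothetical quality scores for three models of one target.
true_q = [0.9, 0.5, 0.7]
pred_q = [0.2, 0.8, 0.6]
loss = ranking_loss(true_q, pred_q)  # the top-predicted model is not the best one
```

Ranking loss captures what matters when selecting a single model for a target. Correlation captures overall agreement across all models.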
2025-05-08
RIOT
- annotation/numbering
- Fast and reliable numbering tool with a built-in, freely available germline database, unifying the functionality of tools such as IgBLAST and ANARCI.
- It can number both amino acid and nucleotide sequences.
- Rather than statistical models such as HMMs, it uses an MMseqs-like approach for rapid alignment.
- Produces more accurate alignments than existing methods while also being faster, running on a CPU.
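MMseqs-style search gets much of its speed from a cheap k-mer prefilter that shortlists candidate targets before any full alignment. The sketch below illustrates only that idea, applied to germline assignment. It is not RIOT's algorithm. The toy germline sequences, gene names and k=3 are hypothetical. A real tool would follow the prefilter with a proper alignment of the top hits.

```python
from collections import Counter


def kmers(seq, k):
    """Multiset of all overlapping k-mers in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def rank_germlines(query, germlines, k=3):
    """Rank germline genes by the number of k-mers shared with the query.

    germlines: dict mapping gene name -> sequence.
    Returns (gene, shared_kmer_count) pairs, best first.
    """
    q = kmers(query, k)
    scores = {}
    for name, seq in germlines.items():
        shared = q & kmers(seq, k)  # multiset intersection keeps min counts
        scores[name] = sum(shared.values())
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


germlines = {"geneA": "QVQLVQSGAEVKK", "geneB": "EVQLLESGGGLVQ"}  # toy data
hits = rank_germlines("QVQLVQSGAEVKKPGAS", germlines)
```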
2025-05-08
AntPack
- annotation/numbering
- Fast, alignment-based antibody numbering tool, significantly outperforming existing software in processing speed.
- Uses a simplified global alignment with a custom scoring matrix, facilitating rapid numbering of millions of sequences efficiently.
- Achieves accuracy comparable to established methods (ANARCI, AbNum) while numbering large-scale antibody datasets.
- Emphasizes interpretability and robustness, providing transparent sequence scoring useful for humanization tasks.
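Alignment-based numbering maps each query residue onto a numbered template through a global alignment. Numbers are then transferred along the aligned pairs. The sketch below is a plain Needleman-Wunsch alignment with traceback and a linear gap penalty.

The match/mismatch function `toy_score` is a hypothetical stand-in for AntPack's custom scoring matrix, which this sketch does not reproduce.

```python
def global_align(query, template, score, gap=-4):
    """Needleman-Wunsch global alignment.

    Returns (best_score, pairs), where pairs is a list of
    (query_index, template_index) tuples with None marking a gap.
    """
    n, m = len(query), len(template)
    # F[i][j] = best score aligning query[:i] with template[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(query[i - 1], template[j - 1]),
                          F[i - 1][j] + gap,   # query residue vs. gap
                          F[i][j - 1] + gap)   # gap vs. template position
    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(query[i - 1], template[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    return F[n][m], pairs[::-1]


def toy_score(x, y):
    """Hypothetical scoring: +5 for identical residues, -1 otherwise."""
    return 5 if x == y else -1


best, pairs = global_align("CAW", "CARW", toy_score)
```

Given a list of scheme numbers for the template positions, the numbering of the query follows directly from `pairs`. Template positions aligned to `None` are simply left unnumbered.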
2025-05-06
ANARCII
- annotation/numbering
- A new version of ANARCI based on language models.
- Uses a sequence-to-sequence (Seq2Seq) language model, eliminating the need for alignment-based numbering and generalizing well to novel sequences.
- Numbering matches existing methods for >99.99% of conserved residues and >99.94% of CDR regions.
- Faster than the original HMM-based ANARCI when a GPU is available.
- Can be fine-tuned for rare immunoglobulin domains (e.g., shark VNAR sequences, T-cell receptors), offering customizable antibody numbering workflows.
2025-05-06
AbnNumPro
- annotation/numbering
- Offline toolkit for antibody numbering, CDR delineation and antigen-binding region (ABR) prediction.
- Integrates five established antibody numbering schemes (Kabat, Chothia, IMGT, Aho, Martin).
- Uses IMGT as the source of germline sequences.
- Allows prediction of Complementarity-Determining Regions (CDRs) and Antigen-Binding Regions (ABRs) through Hidden Markov Models (HMMs).
- Addresses data security concerns by enabling offline usage, beneficial for therapeutic antibody development.
- Achieves high recall (0.92) in identifying ABRs, outperforming existing tools, which rely heavily on online services.
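Once a sequence is numbered, CDR delineation reduces to selecting residues whose position falls within scheme-specific ranges. As a minimal illustration, the sketch below uses the IMGT CDR definitions (CDR1 = 27-38, CDR2 = 56-65, CDR3 = 105-117). Other schemes such as Kabat or Chothia use different boundaries.

The input format, a list of `((position, insertion_code), residue)` tuples in sequence order with `-` for gaps, is an assumption modeled on ANARCI-style output. It is not AbNumPro's interface.

```python
IMGT_CDRS = {"CDR1": (27, 38), "CDR2": (56, 65), "CDR3": (105, 117)}


def extract_cdrs(numbered, cdr_ranges=IMGT_CDRS):
    """Return {CDR name: sequence} from an IMGT-numbered chain.

    numbered: list of ((position, insertion_code), residue) tuples in
    sequence order. Insertion codes (e.g., 111A) keep their base position,
    so they fall inside the correct CDR range.
    """
    cdrs = {name: "" for name in cdr_ranges}
    for (pos, _ins), aa in numbered:
        if aa == "-":
            continue  # gap position in the numbering
        for name, (start, end) in cdr_ranges.items():
            if start <= pos <= end:
                cdrs[name] += aa
    return cdrs


# Hypothetical, heavily truncated numbered chain.
numbered = [((26, " "), "S"), ((27, " "), "G"), ((28, " "), "F"), ((38, " "), "Y"),
            ((39, " "), "M"), ((56, " "), "I"), ((57, " "), "-"), ((65, " "), "T"),
            ((104, " "), "C"), ((105, " "), "A"), ((111, "A"), "R"), ((117, " "), "D"),
            ((118, " "), "W")]
cdrs = extract_cdrs(numbered)
```

Passing a different `cdr_ranges` dict is how the same function could be pointed at another scheme's boundaries.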