A novel method that repurposes AlphaFold-2.3 structure predictions and combines them with inverse folding–based machine learning models to assess antibody-antigen binding accuracy and specificity.
They generate antibody-antigen complex models using AlphaFold-2.3 and evaluate them using the 'AbAgIoU' metric, which measures the overlap between predicted and true epitope/paratope residues — penalizing both missing and extra contacts.
They demonstrate that the learned scores can distinguish true from incorrect antibody-antigen pairings (including swapped antibody scenarios), significantly outperforming random baselines.
The method relies only on antibody and antigen sequences as input, using AlphaFold to model structures — making it applicable in real-world settings where experimental structures are unavailable.
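An IoU-style interface score like AbAgIoU can be sketched as follows. This is an illustration under stated assumptions, not the authors' definition. Residues are identified by hypothetical (chain ID, residue number) pairs, and epitope and paratope residues are pooled into a single set before intersection over union is computed. Missing and extra contacts both enlarge the union, which is how the score penalizes both.

```python
from typing import Set, Tuple

Residue = Tuple[str, int]  # assumption: (chain ID, residue number)


def interface_iou(predicted: Set[Residue], true: Set[Residue]) -> float:
    """Intersection over union of two residue sets.

    Missing contacts (only in `true`) and extra contacts (only in `predicted`)
    both enlarge the union and therefore lower the score.
    """
    union = predicted | true
    if not union:
        return 1.0  # assumption: two empty interfaces count as a match
    return len(predicted & true) / len(union)


def abag_iou(pred_epitope: Set[Residue], pred_paratope: Set[Residue],
             true_epitope: Set[Residue], true_paratope: Set[Residue]) -> float:
    # Assumption: epitope and paratope residues are pooled into one interface
    # set; chain IDs keep antigen and antibody residues distinct.
    return interface_iou(pred_epitope | pred_paratope, true_epitope | true_paratope)


if __name__ == "__main__":
    true_epi = {("A", 10), ("A", 11)}
    true_par = {("H", 33), ("H", 52)}
    pred_epi = {("A", 10), ("A", 12)}  # misses A11, adds A12
    pred_par = {("H", 33), ("H", 52)}
    print(abag_iou(pred_epi, pred_par, true_epi, true_par))  # 3 shared / 5 total = 0.6
```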
Novel method to design antibodies based on Boltz-1.
They added a sequence head to Boltz-1 to perform simultaneous sequence/structure co-design.
They fine-tuned Boltz-1 on antibody-antigen complexes from SAbDab.
They compared against dyMEAN and DiffAb on amino acid recovery, RMSD and Rosetta InterfaceAnalyzer energy; their model outperforms both on these computational benchmarks.
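Two of the computational metrics named above, amino acid recovery and RMSD, can be sketched as follows. This is a minimal sketch and may differ from the paper's evaluation protocol. It assumes designed and native CDR sequences of equal length and coordinates that are already superimposed, so no Kabsch fit is performed. The Rosetta InterfaceAnalyzer energy is not reproduced here.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def amino_acid_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed residue matches the native one."""
    if not native or len(designed) != len(native):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(d == n for d, n in zip(designed, native)) / len(native)


def rmsd(pred: Sequence[Coord], ref: Sequence[Coord]) -> float:
    """Root-mean-square deviation over paired atoms (e.g. CA atoms).

    Assumes the structures are already superimposed; no Kabsch fit is done.
    """
    if not ref or len(pred) != len(ref):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    sq = sum((p[k] - r[k]) ** 2 for p, r in zip(pred, ref) for k in range(3))
    return math.sqrt(sq / len(ref))


if __name__ == "__main__":
    print(amino_acid_recovery("ARDYW", "ARGYW"))  # 4 of 5 positions match -> 0.8
    print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 2)]))  # sqrt(4 / 2)
```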
Introduced a novel machine learning method (NanoBinder) to predict the binding probability of nanobody-antigen structural complexes.
Positive (binding) complexes were sourced from SAbDab, which contains experimentally determined nanobody-antigen complex structures.
Negative (non-binding) complexes were generated by structurally aligning nanobodies from different binding complexes (with RMSD < 2 Å) and recombining them with unrelated antigens to create likely non-binding pairs.
Extracted Rosetta energy features from each complex and trained several machine learning models, including Random Forests, SVMs, AdaBoost, and Decision Trees, to classify binders vs. non-binders. Random Forests showed the best performance.
They selected antibodies with known antigen targets (e.g., IL-6) and grafted their CDRs onto nanobody scaffolds using Rosetta-based protocols. The resulting nanobody-antigen complexes were evaluated in silico using NanoBinder, and selected candidates were experimentally validated. The predictions showed good correlation with binding outcomes, particularly for identifying non-binders.
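The negative-set construction described above can be sketched as follows. This is a minimal sketch, not NanoBinder's code. It assumes pairwise nanobody RMSDs have already been computed; the `nb_rmsd` dictionary and `make_negative_pairs` function are hypothetical. It also treats any antigen with a different ID as "unrelated", which is a simplification.

```python
from itertools import permutations
from typing import Dict, FrozenSet, List, Tuple

Pair = Tuple[str, str]  # (nanobody ID, antigen ID)


def make_negative_pairs(positives: List[Pair],
                        nb_rmsd: Dict[FrozenSet[str], float],
                        cutoff: float = 2.0) -> List[Pair]:
    """Propose likely non-binding (nanobody, antigen) pairs.

    Nanobody i is paired with antigen j when nanobodies i and j superimpose
    with RMSD below `cutoff` (Å), so nanobody i can be placed in nanobody j's
    pose against an antigen it was never observed to bind.
    """
    known = set(positives)
    seen = set()
    negatives: List[Pair] = []
    for (nb_i, ag_i), (nb_j, ag_j) in permutations(positives, 2):
        if nb_i == nb_j or ag_i == ag_j:
            continue  # need a different nanobody and a different antigen
        if nb_rmsd.get(frozenset((nb_i, nb_j)), float("inf")) >= cutoff:
            continue  # the two nanobodies do not superimpose closely enough
        candidate = (nb_i, ag_j)
        if candidate in known or candidate in seen:
            continue  # skip observed binders and duplicates
        seen.add(candidate)
        negatives.append(candidate)
    return negatives


if __name__ == "__main__":
    positives = [("nb1", "agA"), ("nb2", "agB"), ("nb3", "agC")]
    nb_rmsd = {
        frozenset(("nb1", "nb2")): 1.2,
        frozenset(("nb1", "nb3")): 3.5,
        frozenset(("nb2", "nb3")): 1.8,
    }
    print(make_negative_pairs(positives, nb_rmsd))
    # [('nb1', 'agB'), ('nb2', 'agA'), ('nb2', 'agC'), ('nb3', 'agB')]
```

Rosetta energy features for each positive and negative complex would then feed a classifier such as scikit-learn's `RandomForestClassifier`.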
Novel protein language model (MINT) that natively models protein-protein interactions.
MINT (Multimeric INteraction Transformer) extends the ESM-2 protein language model by incorporating a cross-chain attention mechanism. This allows it to process multiple protein sequences simultaneously while preserving inter-sequence relationships and contextual information critical for modeling protein-protein interactions.
MINT was trained on a large, curated subset of the STRING database, consisting of 96 million high-quality physical protein-protein interactions and 16.4 million unique protein sequences. The training employed a masked language modeling objective adapted for multimeric inputs.
MINT was benchmarked on several general protein interaction tasks, including binary interaction classification, binding affinity prediction (PDBbind), and mutational impact prediction (e.g., SKEMPI and MutationalPPI). It consistently outperformed existing PLMs and achieved state-of-the-art performance on multiple datasets, including a 29% improvement over baselines on SKEMPI.
MINT outperformed antibody-specific models (e.g., IgBert, IgT5, and AbMap) on the FLAB benchmark and SARS-CoV-2 antibody mutant binding prediction tasks. It showed >10% performance improvement on three FLAB datasets and a 14% gain in low-data settings (0.5% training data) for SARS-CoV-2 binding predictions.
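The multimeric masked-language-modeling input described above can be illustrated as follows. This is a hedged sketch, not MINT's code. It assumes chains are concatenated with a per-token chain index and uses a BERT-style 15% random masking rate as a placeholder. The hypothetical `cross_chain_pairs` helper only marks which attention pairs are inter-chain; it does not reproduce MINT's cross-chain attention layer.

```python
import random
from typing import List, Tuple

MASK = "<mask>"


def mask_multimer(chains: List[str], mask_prob: float = 0.15, seed: int = 0
                  ) -> Tuple[List[str], List[int], List[int]]:
    """Flatten several chains into one token list and mask random residues.

    Returns (tokens, chain_ids, masked_positions). chain_ids records which
    input chain each token came from, so inter-chain context is preserved.
    """
    rng = random.Random(seed)
    tokens: List[str] = []
    chain_ids: List[int] = []
    masked: List[int] = []
    for chain_index, seq in enumerate(chains):
        for residue in seq:
            if rng.random() < mask_prob:
                masked.append(len(tokens))
                tokens.append(MASK)
            else:
                tokens.append(residue)
            chain_ids.append(chain_index)
    return tokens, chain_ids, masked


def cross_chain_pairs(chain_ids: List[int]) -> List[List[bool]]:
    """True where query and key residues lie on different chains."""
    return [[ci != cj for cj in chain_ids] for ci in chain_ids]


if __name__ == "__main__":
    tokens, chain_ids, masked = mask_multimer(["MKTAYIAK", "GSHMLE"], mask_prob=0.3)
    print(tokens)
    print(chain_ids)  # [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
    print(cross_chain_pairs([0, 0, 1]))
```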
Novel generative modeling framework (AbBFN2) using Bayesian Flow Networks (BFNs) for antibody sequence optimization.
Trains on sequences from Observed Antibody Space (OAS) combined with genetic and biophysical annotations, leveraging a denoising approach for both conditional and unconditional sequence generation. Targets include optimizing Therapeutic Antibody Profiler (TAP) annotations.
Computationally validated for germline assignment accuracy, species prediction (humanness), and TAP parameter optimization.
Combines multiple antibody design objectives into a unified, single-step optimization process, unlike existing tools, which are typically specialized for individual tasks.
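The core Bayesian Flow Network update for a single categorical variable, such as one residue position, can be sketched as follows. It follows the discrete-data formulation of the original BFN paper (Graves et al., 2023) rather than AbBFN2's code. It omits how AbBFN2 conditions on genetic and biophysical annotations, and the function name and parameters are hypothetical.

```python
import math
import random
from typing import List


def bfn_discrete_update(theta: List[float], true_index: int, alpha: float,
                        rng: random.Random) -> List[float]:
    """One Bayesian update of a categorical belief over K classes.

    Sender sample: y ~ N(alpha * (K * e_x - 1), alpha * K * I)
    Update:        theta'_k proportional to exp(y_k) * theta_k
    """
    K = len(theta)
    std = math.sqrt(alpha * K)
    # Noisy observation of the one-hot true class; larger alpha = more accurate.
    y = [rng.gauss(alpha * (K * (k == true_index) - 1), std) for k in range(K)]
    m = max(y)  # subtract the max before exponentiating for numerical stability
    unnorm = [math.exp(yk - m) * tk for yk, tk in zip(y, theta)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


if __name__ == "__main__":
    rng = random.Random(0)
    prior = [1 / 20] * 20  # uniform belief over the 20 amino acids
    weak = bfn_discrete_update(prior, true_index=3, alpha=0.05, rng=rng)
    strong = bfn_discrete_update(prior, true_index=3, alpha=50.0, rng=rng)
    print(round(max(weak), 3), round(strong[3], 3))
```

Small accuracy values barely move the belief away from the prior, while large ones concentrate it on the true residue. A BFN schedule applies many such updates with increasing accuracy.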
Novel method to bias ProteinMPNN for antibody design, without modifying model weights.
Logits from the protein-general ProteinMPNN and the antibody-specific AbLang are summed and passed through a softmax. Adding AbLang is intended to push the model toward antibody-compatible sequence space.
In in silico experiments, ProteinMPNN+AbLang outperformed ProteinMPNN alone and rivalled the antibody-specific AbMPNN.
The authors designed 96 variants of Trastuzumab CDR-H3 with each of ProteinMPNN, AbLang and ProteinMPNN+AbLang. AbLang and ProteinMPNN alone produced 1 and 3 successful variants (out of 96), respectively, whereas their combination produced 36.
None of the variants outperformed wild-type (WT) Trastuzumab.
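The logit fusion described above can be sketched for a single position as follows. This is a minimal sketch under the assumption that both models' outputs have already been mapped onto the same 20 amino acids in the same order. The real models use their own token vocabularies, including special tokens. The logit values in the example are made up.

```python
import math
from typing import Dict, List

AA = "ACDEFGHIKLMNPQRSTVWY"  # assumed shared amino acid order for both models


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combined_distribution(mpnn_logits: List[float],
                          ablang_logits: List[float]) -> Dict[str, float]:
    """Sum per-position logits from both models, then softmax."""
    if len(mpnn_logits) != len(AA) or len(ablang_logits) != len(AA):
        raise ValueError("expected one logit per amino acid in AA order")
    summed = [a + b for a, b in zip(mpnn_logits, ablang_logits)]
    return dict(zip(AA, softmax(summed)))


if __name__ == "__main__":
    mpnn = [0.0] * len(AA)
    mpnn[AA.index("G")] = 2.0   # made-up: ProteinMPNN slightly prefers Gly
    mpnn[AA.index("Y")] = 1.9
    ablang = [0.0] * len(AA)
    ablang[AA.index("Y")] = 1.0  # made-up: AbLang prefers Tyr
    ablang[AA.index("G")] = -1.0
    dist = combined_distribution(mpnn, ablang)
    print(max(dist, key=dist.get))  # Y
```

Summing logits before the softmax is equivalent to multiplying the two models' probability distributions and renormalizing (a product of experts). A residue therefore scores highly only if both models give it reasonable support.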
PSBench is a large benchmark dataset (>1M models) for training and evaluating estimation of model accuracy (EMA) methods for protein complex structures, built from CASP15 and CASP16 data.
Models were generated by AlphaFold2-Multimer and AlphaFold3 under blind prediction conditions and annotated with 10 detailed global, local, and interface quality scores.
The dataset enables development of advanced EMA methods (e.g. GATE), which showed top performance in blind CASP16 assessments.
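Two metrics commonly used to score EMA methods on a single target can be sketched as follows. These are the Pearson correlation between predicted and true quality scores and the top-1 ranking loss. PSBench's exact evaluation suite may differ. The sketch assumes one list of predicted scores and one list of true scores (e.g. DockQ or TM-score) for the models of a target, and the example values are illustrative only.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between predicted and true quality scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranking_loss(predicted: Sequence[float], true: Sequence[float]) -> float:
    """True score of the best model minus that of the model ranked first by the EMA."""
    top = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(true) - true[top]


if __name__ == "__main__":
    true_scores = [0.80, 0.65, 0.90, 0.40]  # illustrative values only
    predicted = [0.85, 0.60, 0.70, 0.45]
    print(round(pearson(predicted, true_scores), 3))
    print(round(ranking_loss(predicted, true_scores), 3))  # 0.9 - 0.8 = 0.1
```

A ranking loss of 0 means the EMA method picked the truly best model for that target.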