Computational Antibody Papers

  • 2025-06-05

    Learning the language of protein-protein interactions

    • language models
    • binding prediction
    • Novel protein language model (MINT) that natively models protein-protein interactions.
    • MINT (Multimeric INteraction Transformer) extends the ESM-2 protein language model by incorporating a cross-chain attention mechanism. This allows it to process multiple protein sequences simultaneously while preserving inter-sequence relationships and contextual information critical for modeling protein-protein interactions.
    • MINT was trained on a large, curated subset of the STRING database, consisting of 96 million high-quality physical protein-protein interactions and 16.4 million unique protein sequences. The training employed a masked language modeling objective adapted for multimeric inputs.
    • MINT was benchmarked on several general protein-interaction tasks, including binary interaction classification, binding affinity prediction (PDB-Bind), and mutational impact prediction (e.g., SKEMPI and MutationalPPI). It consistently outperformed existing PLMs, achieving state-of-the-art performance on multiple datasets, including a 29% improvement over baselines on SKEMPI.
    • MINT outperformed antibody-specific models (e.g., IgBert, IgT5, and AbMap) on the FLAB benchmark and SARS-CoV-2 antibody mutant binding prediction tasks. It showed >10% performance improvement on three FLAB datasets and a 14% gain in low-data settings (0.5% training data) for SARS-CoV-2 binding predictions.
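The cross-chain attention idea behind MINT can be illustrated with a minimal numpy sketch in which tokens of one chain attend to tokens of another. The single-head form, shapes, and weight names here are illustrative assumptions, not MINT's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_chain_attention(h_a, h_b, w_q, w_k, w_v):
    """Tokens of chain A (queries) attend to tokens of chain B (keys/values).

    h_a: (len_a, d) hidden states of chain A
    h_b: (len_b, d) hidden states of chain B
    """
    q = h_a @ w_q                                # queries from chain A
    k = h_b @ w_k                                # keys from chain B
    v = h_b @ w_v                                # values from chain B
    scores = q @ k.T / np.sqrt(q.shape[-1])      # (len_a, len_b) attention logits
    return softmax(scores, axis=-1) @ v          # (len_a, d) chain-B-aware update of A

# Toy example: a 5-residue chain attending to a 7-residue partner chain.
rng = np.random.default_rng(0)
d = 8
h_a, h_b = rng.normal(size=(5, d)), rng.normal(size=(7, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out = cross_chain_attention(h_a, h_b, w_q, w_k, w_v)
print(out.shape)  # (5, 8)
```

This is what lets the model keep each sequence's identity separate while still exchanging information between chains, as opposed to simply concatenating the chains into one sequence.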
  • 2025-06-05

    AbBFN2: A flexible antibody foundation model based on Bayesian Flow Networks

    • developability
    • generative methods
    • protein design
    • Novel generative modeling framework (AbBFN2) using Bayesian Flow Networks (BFNs) for antibody sequence optimization.
    • Trains on sequences from Observed Antibody Space (OAS) combined with genetic and biophysical annotations, leveraging a denoising approach for both conditional and unconditional sequence generation. Targets include optimizing Therapeutic Antibody Profiler (TAP) annotations.
    • Computationally validated for germline assignment accuracy, species prediction (humanness), and TAP parameter optimization.
    • Combines multiple antibody design objectives into a unified, single-step optimization process, unlike existing software methods which are typically specialized for individual tasks.
  • 2025-06-05

    Adapting ProteinMPNN for antibody design without retraining

    • protein design
    • generative methods
    • Novel method to bias ProteinMPNN for antibody design, without modifying model weights.
    • Logits from the protein-general ProteinMPNN and the antibody-specific AbLang are added and softmaxed. The AbLang term is intended to pull the model into antibody-acceptable sequence space.
    • In in-silico experiments, ProteinMPNN+AbLang outperformed ProteinMPNN alone and rivalled the antibody-specific AbMPNN.
    • The authors designed 96 variants of the Trastuzumab CDR-H3 with each of ProteinMPNN, AbLang, and ProteinMPNN+AbLang. AbLang and ProteinMPNN produced 1 and 3 successful variants respectively (each out of 96), whereas their combination produced 36 successful variants.
    • None of the variants outperformed WT Trastuzumab.
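The logit-combination step is simple enough to sketch. The arrays below are random stand-ins for real per-position model outputs; the sketch also shows that summing logits before a single softmax is mathematically equivalent to a renormalized product of the two models' distributions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical per-position logits over the 20 amino acids from each model.
rng = np.random.default_rng(1)
logits_proteinmpnn = rng.normal(size=(10, 20))  # 10 designable positions
logits_ablang = rng.normal(size=(10, 20))

# The paper's trick: add the raw logits, then softmax once.
probs = softmax(logits_proteinmpnn + logits_ablang)

# Equivalent view: elementwise product of the two softmax distributions,
# renormalized - AbLang down-weights residues it finds antibody-atypical.
alt = softmax(logits_proteinmpnn) * softmax(logits_ablang)
alt /= alt.sum(axis=-1, keepdims=True)
print(np.allclose(probs, alt))  # True
```

The product-of-experts view explains the result: a residue must be plausible to both models to survive, which is why the combination lands in antibody-acceptable space without retraining either model.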
    • PSBench is a large benchmark dataset (>1M models) for training and evaluating estimation of model accuracy (EMA) methods for protein complex structures, built from CASP15 and CASP16 data.
    • Models were generated by AlphaFold2-Multimer and AlphaFold3 under blind prediction conditions and annotated with 10 detailed global, local, and interface quality scores.
    • The dataset enables development of advanced EMA methods (e.g. GATE), which showed top performance in blind CASP16 assessments.
  • 2025-05-08

    RIOT

    • annotation/numbering
    • Fast and reliable numbering tool with a built-in, freely available germline database, unifying the functionality of tools such as IgBLAST and ANARCI.
    • It can number both amino acid and nucleotide sequences.
    • Rather than statistical methods such as HMMs, it uses an MMseqs2-like methodology for rapid alignment.
    • Its alignments are more accurate than those of existing methods, and it is faster, running on a CPU.
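The core idea MMseqs2 is known for is cheap k-mer prefiltering before any expensive alignment. As an illustrative guess at what an "MMseqs2-like" germline search looks like (RIOT's actual pipeline is more involved), the toy helper and database below are hypothetical:

```python
def kmer_prefilter(query, db, k=3, min_shared=2):
    """Toy MMseqs2-style prefilter: keep only database sequences sharing at
    least `min_shared` k-mers with the query, so that the expensive alignment
    step runs on a handful of candidates rather than the whole database."""
    qk = {query[i:i + k] for i in range(len(query) - k + 1)}
    hits = []
    for name, seq in db.items():
        sk = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        if len(qk & sk) >= min_shared:
            hits.append(name)
    return hits

# Hypothetical two-entry germline "database" with truncated sequences.
db = {"IGHV1": "EVQLVESGGGLV", "IGKV1": "DIQMTQSPSSLS"}
print(kmer_prefilter("EVQLVESGGGLV", db))  # ['IGHV1']
```

Set operations on k-mers cost far less than dynamic-programming alignment, which is where this class of methods gets its speed advantage over HMM scans.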
  • 2025-05-08

    AntPack

    • annotation/numbering
    • Fast, alignment-based antibody numbering tool, significantly outperforming existing software in processing speed.
    • Uses a simplified global alignment with a custom scoring matrix, facilitating rapid numbering of millions of sequences efficiently.
    • Ensures accuracy comparable to established methods (ANARCI, AbNum) while numbering large-scale antibody datasets.
    • Emphasizes interpretability and robustness, providing transparent sequence scoring useful for humanization tasks.
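A simplified global alignment of the kind AntPack builds on can be sketched with a bare Needleman-Wunsch score matrix. The toy match/mismatch/gap scores below are stand-ins; the real tool uses a custom, position-specific scoring matrix tuned for antibody numbering:

```python
import numpy as np

def global_align_score(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch dynamic programming, filling only the score matrix
    (no traceback). Toy scores, not AntPack's custom scoring matrix."""
    n, m = len(seq_a), len(seq_b)
    dp = np.zeros((n + 1, m + 1))
    dp[:, 0] = gap * np.arange(n + 1)   # leading gaps in seq_b
    dp[0, :] = gap * np.arange(m + 1)   # leading gaps in seq_a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            dp[i, j] = max(dp[i - 1, j - 1] + s,   # align residues
                           dp[i - 1, j] + gap,     # gap in seq_b
                           dp[i, j - 1] + gap)     # gap in seq_a
    return dp[n, m]

# Two short framework-like fragments differing at one position.
print(global_align_score("EVQLVES", "EVQLQES"))  # 11.0
```

Because the DP table is filled with simple max operations over fixed scores, the whole procedure vectorizes and scales to millions of sequences, while remaining fully interpretable: every number in the table is an explicit alignment score.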
  • 2025-05-06

    ANARCII

    • annotation/numbering
    • New version of ANARCI - using language models.
    • Employs a Seq2Seq language model, eliminating the need for alignment-based numbering and thus generalizing well to novel sequences.
    • Provides numbering that matches existing methods for >99.99% conserved residues and >99.94% CDR regions.
    • Faster than the original HMM-based ANARCI when a GPU is available.
    • Can be fine-tuned for rare immunoglobulin domains (e.g., shark VNAR sequences, T-cell receptors), offering customizable antibody numbering workflows.
  • 2025-05-06

    AbnNumPro

    • annotation/numbering
    • Offline toolkit for antibody numbering and delineation of CDRs and antigen-binding regions (ABRs).
    • Provides an offline toolkit integrating five established antibody numbering schemes (Kabat, Chothia, IMGT, Aho, Martin).
    • Uses IMGT as the source of germline sequences.
    • Allows prediction of Complementarity-Determining Regions (CDRs) and Antigen-Binding Regions (ABRs) through Hidden Markov Models (HMMs).
    • Addresses data security concerns by enabling offline usage, beneficial for therapeutic antibody development.
    • Achieves high recall (0.92) in identifying ABRs, making it superior to existing tools, which rely heavily on online services.
    • Novel generative protein language model, ProGen3.
    • The model can do autoregressive generation N-to-C, C-to-N, and also supports span infilling.
    • The architecture is a Transformer with a Sparse Mixture of Experts (MoE), activating about 27% of parameters per forward pass to improve computational efficiency.
    • They studied how sampling affects training by trying different family-level weighting schemes. Uniform sampling across families (where small and large families have equal chance) gave better diversity and generalization, while unmodified sampling (letting big families dominate) performed worst.
    • They validated the models by showing that generated proteins express well in wet lab experiments (split-GFP assays, spanning both highly novel and moderately novel sequence spaces).
    • They used a large thermostability dataset to align model predictions with stability. This alignment is not standard fine-tuning; instead, preference optimization was applied, teaching the model to prefer sequences predicted to be more stable. Upon experimental validation, aligned models indeed produced proteins with higher expression and stability.
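Preference optimization can be sketched with a DPO-style loss, one common instantiation of the idea: given a pair of sequences where one is predicted to be more stable, push the policy model (relative to a frozen reference) to assign the stable sequence a higher likelihood. Whether ProGen3 uses exactly this objective is an assumption here; the log-likelihood values are hypothetical:

```python
import numpy as np

def dpo_loss(logp_preferred, logp_rejected, ref_preferred, ref_rejected, beta=0.1):
    """DPO-style preference loss: -log sigmoid of the scaled margin by which
    the policy prefers the stable sequence more than the reference model does.
    Illustrative stand-in, not necessarily the paper's exact objective."""
    margin = beta * ((logp_preferred - ref_preferred)
                     - (logp_rejected - ref_rejected))
    return -np.log(1.0 / (1.0 + np.exp(-margin)))  # -log sigmoid(margin)

# Hypothetical sequence log-likelihoods under policy and reference models.
loss_good = dpo_loss(-10.0, -14.0, -12.0, -12.0)  # policy already prefers stable seq
loss_bad = dpo_loss(-14.0, -10.0, -12.0, -12.0)   # policy prefers unstable seq
print(loss_good < loss_bad)  # True
```

Unlike supervised fine-tuning on stable sequences alone, the loss only depends on *relative* likelihoods of paired sequences, which keeps the model anchored to the reference distribution while nudging it toward stability.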
  • 2025-04-28

    Atom level enzyme active site scaffolding using RFdiffusion2

    • protein design
    • non-antibody stuff
    • Improvement upon the earlier RFdiffusion, enhancing stability and accuracy in designing enzyme active sites.
    • Catalytic sites can now be specified at the atomic level instead of the residue backbone level used previously. This eliminates the need to explicitly enumerate side-chain rotamers.
    • Training uses flow matching, a technique that simplifies and stabilizes the diffusion training process.
    • Benchmarked on a set of 41 diverse enzyme active sites; RFdiffusion2 succeeded in all 41 cases, significantly outperforming the earlier RFdiffusion, which succeeded in only 16.
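The flow-matching training objective can be sketched in its simplest conditional form: sample a point on a straight-line path between noise and data, and regress the model's predicted velocity against the path's constant true velocity. This is a toy of the general technique; RFdiffusion2's actual parameterization and data representation differ:

```python
import numpy as np

def flow_matching_loss(model_v, x0, x1, t):
    """Conditional flow matching on a linear path: the target velocity of the
    path x_t = (1-t)*x0 + t*x1 is the constant (x1 - x0), so training reduces
    to a plain regression - no score functions or noise schedules needed."""
    x_t = (1 - t) * x0 + t * x1   # point on the straight-line path at time t
    target_v = x1 - x0            # time derivative of the path
    pred_v = model_v(x_t, t)      # model's predicted velocity field
    return np.mean((pred_v - target_v) ** 2)

rng = np.random.default_rng(2)
x0 = rng.normal(size=(16, 3))             # noise coordinates
x1 = rng.normal(size=(16, 3))             # data coordinates (e.g. atom positions)
perfect = lambda x_t, t: x1 - x0          # oracle velocity field for this pair
print(flow_matching_loss(perfect, x0, x1, 0.3))  # 0.0
```

The regression form of this loss, compared with denoising-score-matching diffusion objectives, is what the entry means by "simplifies and stabilizes" training.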