Computational Antibody Papers

Title | Key points
  • 2025-06-05

    Adapting ProteinMPNN for antibody design without retraining

    • protein design
    • generative methods
    • Novel method to bias ProteinMPNN for antibody design, without modifying model weights.
    • Logits from the general-protein ProteinMPNN and the antibody-specific AbLang are summed and passed through a softmax. Adding the AbLang logits is intended to push the model toward antibody-acceptable sequence space.
    • In in-silico experiments, ProteinMPNN+AbLang outperformed ProteinMPNN alone and rivalled the antibody-specific AbMPNN.
    • The authors designed 96 variants of Trastuzumab CDR-H3 with each of ProteinMPNN, AbLang and ProteinMPNN+AbLang. AbLang and ProteinMPNN produced 1 and 3 successful variants respectively (each out of 96), whereas their combination produced 36 successful variants.
    • None of the variants were better than wild-type (WT) Trastuzumab.
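The logit combination described in the key points above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the `combine_logits` function, the `weight` parameter and the toy logit values are assumptions made for the example, and real logits would come from ProteinMPNN and AbLang for each designed position.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combine_logits(structure_logits, language_logits, weight=1.0):
    """Sum per-residue logits from two models, then normalise with a softmax.

    `weight` is a hypothetical knob for scaling the language-model term;
    weight=1.0 corresponds to plain addition.
    """
    if len(structure_logits) != len(language_logits):
        raise ValueError("both models must score the same alphabet")
    summed = [s + weight * l for s, l in zip(structure_logits, language_logits)]
    return softmax(summed)


# Toy values for a single position (not real model outputs).
structure_logits = [0.0] * len(AMINO_ACIDS)
language_logits = [0.0] * len(AMINO_ACIDS)
structure_logits[AMINO_ACIDS.index("W")] = 2.0  # structure model prefers W
structure_logits[AMINO_ACIDS.index("Y")] = 1.5
language_logits[AMINO_ACIDS.index("Y")] = 2.0   # antibody model prefers Y

probs = combine_logits(structure_logits, language_logits)
best = AMINO_ACIDS[probs.index(max(probs))]
print(best)
```

In this toy case the antibody-model term shifts the preferred residue away from the structure model's top choice, which is the kind of biasing the summary describes.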
  • 2025-06-05

    AbBFN2: A flexible antibody foundation model based on Bayesian Flow Networks

    • developability
    • generative methods
    • protein design
    • Novel generative modeling framework (AbBFN2) using Bayesian Flow Networks (BFNs) for antibody sequence optimization.
    • Trains on sequences from Observed Antibody Space (OAS) combined with genetic and biophysical annotations, leveraging a denoising approach for both conditional and unconditional sequence generation. Targets include optimizing Therapeutic Antibody Profiler (TAP) annotations.
    • Computationally validated for germline assignment accuracy, species prediction (humanness), and TAP parameter optimization.
    • Combines multiple antibody design objectives into a unified, single-step optimization process, unlike existing software methods which are typically specialized for individual tasks.
  • 2025-06-05

    Learning the language of protein-protein interactions

    • language models
    • binding prediction
    • Novel language model (MINT) that natively encodes protein-protein interactions.
    • MINT (Multimeric INteraction Transformer) extends the ESM-2 protein language model by incorporating a cross-chain attention mechanism. This allows it to process multiple protein sequences simultaneously while preserving inter-sequence relationships and contextual information critical for modeling protein-protein interactions.
    • MINT was trained on a large, curated subset of the STRING database, consisting of 96 million high-quality physical protein-protein interactions and 16.4 million unique protein sequences. The training employed a masked language modeling objective adapted for multimeric inputs.
    • MINT was benchmarked on several general protein interaction tasks including binary interaction classification, binding affinity prediction (PDB-Bind), and mutational impact prediction (e.g., SKEMPI and MutationalPPI). It consistently outperformed existing PLMs, achieving state-of-the-art performance on multiple datasets such as a 29% improvement over baselines in SKEMPI.
    • MINT outperformed antibody-specific models (e.g., IgBert, IgT5, and AbMap) on the FLAB benchmark and SARS-CoV-2 antibody mutant binding prediction tasks. It showed >10% performance improvement on three FLAB datasets and a 14% gain in low-data settings (0.5% training data) for SARS-CoV-2 binding predictions.
    • PSBench is a large benchmark dataset (>1M models) for training and evaluating estimation of model accuracy (EMA) methods for protein complex structures, using data from CASP15 and CASP16.
    • Models were generated by AlphaFold2-Multimer and AlphaFold3 under blind prediction conditions and annotated with 10 detailed global, local, and interface quality scores.
    • The dataset enables development of advanced EMA methods (e.g. GATE), which showed top performance in blind CASP16 assessments.
  • 2025-05-08

    RIOT

    • annotation/numbering
    • Fast and reliable numbering tool with an inbuilt free germline database, unifying functionalities of tools such as IgBlast, ANARCI etc.
    • It can number both amino acid and nucleotide sequences.
    • Rather than statistical methods such as HMMs, it uses an MMseqs-like methodology for rapid alignment.
    • Alignments are reported to be more accurate than those of existing methods and faster to compute, while running on a CPU.
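As a rough illustration of what an MMseqs-like approach involves, the sketch below ranks candidate germlines by shared k-mers before any full alignment is attempted. It is a generic k-mer prefilter under stated assumptions, not RIOT's implementation: the function names, the value of `k`, and the short toy sequences are all made up for the example.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(germlines, k=3):
    """Map each k-mer to the names of the germlines that contain it."""
    index = defaultdict(set)
    for name, seq in germlines.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def prefilter(query, index, k=3, top_n=2):
    """Rank germlines by the number of k-mers shared with the query."""
    hits = defaultdict(int)
    for kmer in kmers(query, k):
        for name in index.get(kmer, ()):
            hits[name] += 1
    ranked = sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Toy sequences for illustration only (not real germline database entries).
germlines = {
    "toy_germline_A": "EVQLVESGGGLVQPGG",
    "toy_germline_B": "QVQLQESGPGLVKPSE",
    "toy_germline_C": "DIQMTQSPSSLSASVG",
}
index = build_index(germlines)
candidates = prefilter("EVQLVESGGGLVKPGG", index)
print(candidates)
```

Only the top-ranked candidates would then be passed to a slower, exact alignment step, which is where most of the speed gain of such schemes comes from.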
  • 2025-05-08

    AntPack

    • annotation/numbering
    • Fast, alignment-based antibody numbering tool, significantly outperforming existing software in processing speed.
    • Uses a simplified global alignment with a custom scoring matrix, enabling rapid numbering of millions of sequences.
    • Ensures accuracy comparable to established methods (ANARCI, AbNum) while numbering large-scale antibody datasets.
    • Emphasizes interpretability and robustness, providing transparent sequence scoring useful for humanization tasks.
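For readers unfamiliar with global alignment, here is a minimal Needleman-Wunsch sketch with a pluggable scoring function. It is a textbook version for illustration, not AntPack's simplified algorithm or its scoring matrix: `simple_score`, the gap penalty and the toy sequences are assumptions.

```python
def simple_score(a, b):
    """Hypothetical scoring: +2 for a match, -1 for a mismatch."""
    return 2 if a == b else -1


def global_align(query, template, score=simple_score, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(query), len(template)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(query[i - 1], template[j - 1]),
                dp[i - 1][j] + gap,  # gap in the template
                dp[i][j - 1] + gap,  # gap in the query
            )
    # Trace back from the bottom-right corner to recover the alignment.
    aligned_q, aligned_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                dp[i][j] == dp[i - 1][j - 1] + score(query[i - 1], template[j - 1])):
            aligned_q.append(query[i - 1])
            aligned_t.append(template[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            aligned_q.append(query[i - 1])
            aligned_t.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_t.append(template[j - 1])
            j -= 1
    return dp[n][m], "".join(reversed(aligned_q)), "".join(reversed(aligned_t))


total, aligned_query, aligned_template = global_align("EVQLVESGG", "EVQLESGG")
print(total)
print(aligned_query)
print(aligned_template)
```

In a numbering tool the template positions would carry scheme labels (for example IMGT positions), so each aligned query residue inherits the label of the template column it lands in.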
  • 2025-05-06

    ANARCII

    • annotation/numbering
    • New version of ANARCI that uses language models.
    • Employs a Seq2Seq language model, eliminating the need for alignment-based numbering and thus generalizing well to novel sequences.
    • Provides numbering that matches existing methods for >99.99% of conserved residues and >99.94% of CDR regions.
    • Faster than the original HMM-based ANARCI when a GPU is available.
    • Can be fine-tuned for rare immunoglobulin domains (e.g., shark VNAR sequences, T-cell receptors), offering customizable antibody numbering workflows.
  • 2025-05-06

    AbNumPro

    • annotation/numbering
    • Offline toolkit for antibody numbering and for delineating CDRs and antigen-binding regions (ABRs), integrating five established numbering schemes (Kabat, Chothia, IMGT, Aho, Martin).
    • Uses IMGT as the source of germlines.
    • Allows prediction of Complementarity-Determining Regions (CDRs) and Antigen-Binding Regions (ABRs) through Hidden Markov Models (HMMs).
    • Addresses data security concerns by enabling offline usage, beneficial for therapeutic antibody development.
    • Achieves high recall (0.92) in identifying ABRs, making it superior to existing tools which rely heavily on online services.
    • BoltzDesign1 is a protein design method based on Boltz-1.
    • Boltz-1 is an open-source reproduction of AlphaFold3, which uses a diffusion module to co-fold molecular structures (proteins, ligands, etc.).
    • For design purposes, BoltzDesign1 sidesteps the full structure generation step and instead uses only the Pairformer, which outputs a distogram (a probabilistic representation of all pairwise residue distances). This allows broader exploration of sequence space, as it optimizes over the distribution of possible structures rather than committing to a single conformation.
    • Given a target (such as a small molecule or protein), they weakly initialize a binder sequence using random logits. This sequence is then iteratively refined by backpropagating loss through the Pairformer (and optionally through the Confidence module) to increase the predicted quality of the binder–target interaction.
    • A full 3D structure can be generated at the end using the Boltz-1 structure module, but this is not part of the optimization loop.
    • They benchmarked their method in silico on small molecule targets and a set of protein–protein interactions from the BindCraft benchmark, comparing performance to RFdiffusion All-Atom.
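To make the distogram idea concrete, the sketch below turns a per-pair distance distribution into a contact probability and a simple loss. It is a hedged illustration, not the BoltzDesign1 loss: the bin edges, the 8 Å cutoff, the function names and the example probabilities are all assumptions.

```python
import math


def contact_probability(bin_probs, bin_edges, cutoff=8.0):
    """Sum the probability mass of bins whose upper edge is within the cutoff."""
    if len(bin_edges) != len(bin_probs) + 1:
        raise ValueError("need one more edge than bins")
    return sum(p for p, upper in zip(bin_probs, bin_edges[1:]) if upper <= cutoff)


def contact_loss(bin_probs, bin_edges, cutoff=8.0, eps=1e-8):
    """Negative log contact probability: low when the pair is likely in contact."""
    return -math.log(contact_probability(bin_probs, bin_edges, cutoff) + eps)


# Hypothetical distance bins in angstroms: [2-4), [4-6), [6-8), [8-10), [10-12).
bin_edges = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
close_pair = [0.1, 0.4, 0.3, 0.1, 0.1]   # most mass below 8 angstroms
far_pair = [0.0, 0.05, 0.05, 0.3, 0.6]   # most mass above 8 angstroms

print(contact_probability(close_pair, bin_edges))
print(contact_loss(close_pair, bin_edges) < contact_loss(far_pair, bin_edges))
```

Because the loss is computed on the whole distribution rather than on one sampled structure, a sequence is rewarded for making contacts likely across the range of predicted geometries.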
  • 2025-04-28

    BindCraft: one-shot design of functional protein binders

    • protein design
    • non-antibody stuff
    • BindCraft is an easy-to-use pipeline for computational protein binder design.
    • It employs AlphaFold2-Multimer to hallucinate binders via backpropagation.
    • Given a target structure and binder parameters (e.g., sequence length), the binder sequence is initialized with random logits and iteratively optimized via gradient descent through the AF2-Multimer network.
    • After binder hallucination, the sequence and surface residues are further optimized using MPNNsol, and AF2-Monomer is used to repredict and filter high-confidence designs.
    • Binder designs were validated experimentally through in vitro assays, X-ray crystallography, and cryo-EM.
    • Reported success rates ranged from 25% to 100%, with most binders in the nanomolar affinity range, a few in the micromolar range, and backbone RMSDs of ~1.7 Å to 3.1 Å between design models and solved structures.
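The "random logits, then gradient descent" loop described above can be shown with a toy stand-in for the network. In BindCraft the loss is backpropagated through AF2-Multimer; here a fixed per-position score matrix replaces it so that the example stays self-contained. The target string, learning rate, step count and all function names are assumptions.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def softmax(row):
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def toy_score_matrix(target):
    """Stand-in for a structure network: score 1.0 for one residue per position."""
    return [[1.0 if aa == t else 0.0 for aa in ALPHABET] for t in target]


def optimise_logits(scores, steps=200, lr=1.0, seed=0):
    """Gradient descent on sequence logits to maximise the expected score."""
    rng = random.Random(seed)
    # Weak random initialisation of the logits, one row per binder position.
    logits = [[rng.gauss(0.0, 0.01) for _ in ALPHABET] for _ in scores]
    for _ in range(steps):
        for pos, row in enumerate(logits):
            probs = softmax(row)
            expected = sum(p * s for p, s in zip(probs, scores[pos]))
            # loss = -expected; d(loss)/d(logit_b) = -p_b * (s_b - expected)
            for b, (p, s) in enumerate(zip(probs, scores[pos])):
                row[b] -= lr * (-p * (s - expected))
    return logits


def decode(logits):
    """Pick the highest-logit residue at each position."""
    return "".join(ALPHABET[row.index(max(row))] for row in logits)


designed = decode(optimise_logits(toy_score_matrix("WYGS")))
print(designed)
```

The continuous logits are what make the sequence differentiable; a discrete sequence is only read out at the end, which mirrors the final decoding step in hallucination-style pipelines.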