Computational Antibody Papers

    • Combined in vitro/in silico method for optimization of binders.
    • Start from a wild-type scFv (heavy chain), build a random-mutant library, FACS-sort on multiple antigens, deep-sequence bins + input, and use per-sequence enrichment (bin/library) as the supervised target for (antibody, antigen) training pairs.
    • Train uncertainty-aware regressors (xGPR or ByteNet-SNGP) on those enrichment targets; run in-silico directed evolution (ISDE) from the WT, proposing single mutations and auto-rejecting moves with high predictive uncertainty while optimizing the worst-case score across antigens (see the sketch below).
    • Binding is protected by the multi-antigen objective + uncertainty gating during ISDE; risky proposals are discarded before they enter the candidate set.
    • Filter candidates for humanness with SAM/AntPack and for solubility with CamSol v2.2 (framework is extensible to add other gates); final wet-lab set kept 29 designs after applying these filters and uncertainty checks.
    • Beyond large in-silico tests, yeast-display across 10 SARS-CoV-2 RBDs shows most designs outperform WT; a representative clone (Delta-63) improves KD on 8/10 variants and competes with ACE2.
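
A minimal sketch of the uncertainty-gated ISDE loop above, assuming a generic `predict(sequence, antigen) -> (mean_enrichment, predictive_std)` regressor interface (e.g. wrapping an xGPR or SNGP-style model); the proposal budget, gating threshold, and greedy acceptance rule are illustrative, not the paper's exact settings:

```python
import random

AAS = "ACDEFGHIKLMNPQRSTVWY"

def propose_single_mutants(seq, n=50):
    """Propose random single-point mutants of the current sequence."""
    mutants = []
    for _ in range(n):
        pos = random.randrange(len(seq))
        new_aa = random.choice(AAS.replace(seq[pos], ""))
        mutants.append(seq[:pos] + new_aa + seq[pos + 1:])
    return mutants

def isde(wt_seq, predict, antigens, n_rounds=30, max_std=0.5):
    """Greedy, uncertainty-gated ISDE over a worst-case multi-antigen objective."""
    current = wt_seq
    best = min(predict(current, ag)[0] for ag in antigens)
    for _ in range(n_rounds):
        improved = False
        for mut in propose_single_mutants(current):
            preds = [predict(mut, ag) for ag in antigens]
            if any(std > max_std for _, std in preds):
                continue  # uncertainty gate: discard risky proposals outright
            score = min(mean for mean, _ in preds)  # worst case across antigens
            if score > best:
                current, best, improved = mut, score, True
        if not improved:
            break  # no gated proposal improved the worst-case score
    return current, best
```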
  • 2025-09-12

    Tokenizing Loops of Antibodies

    • structure prediction
    • generative methods
    • Novel model for loop retrieval using embedded structural representation.
    • It is a multimodal tokenizer at the antibody loop (CDR) level that fuses sequence with backbone dihedral-angle features and learns a latent space via a dihedral-distance contrastive loss, unlike residue-level tokenizers and canonical clusters. It produces both continuous and quantized loop tokens that can plug into PLMs (IGLOOLM / IGLOOALM).
    • Trained self-supervised on ~807k loops from experimental (SAbDab/STCRDab) and Ibex-predicted structures, with four objectives: masked dihedral reconstruction, masked amino-acid prediction, contrastive learning over dihedral distance (with DTW alignment; sketched below), and codebook learning, followed by a two-phase training schedule run on H100 GPUs.
    • It was benchmarked on a set of computational goals: (1) for H3 loops, IGLOO beats the best prior tokenizer by +5.9% on a dihedral-distance criterion; (2) cluster recovery: high purity vs. canonical clusters across CDRs; (3) downstream PLM tasks: IGLOOLM improves binding-affinity prediction on 8/10 AbBiBench targets, rivaling larger models; (4) controllable sampling: IGLOOALM generates diverse sequences with more structural consistency than inverse-folding baselines.
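
A minimal sketch of a dihedral-distance contrastive term of the kind described above, assuming a batch of loop embeddings and precomputed (DTW-aligned) pairwise dihedral distances; the soft-target formulation and temperature are illustrative and may differ from IGLOO's exact loss:

```python
import torch
import torch.nn.functional as F

def dihedral_contrastive_loss(embeddings, dihedral_dist, tau=0.1):
    """Soft contrastive loss: embedding similarity is trained to track
    pairwise dihedral distance between loops.

    embeddings:    (B, D) loop embeddings
    dihedral_dist: (B, B) precomputed pairwise dihedral distances
    """
    B = embeddings.shape[0]
    z = F.normalize(embeddings, dim=-1)
    sim = z @ z.t() / tau                      # cosine-similarity logits
    eye = torch.eye(B, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(eye, float("-inf"))  # exclude self-pairs
    # Soft targets: structurally closer loops receive more probability mass.
    targets = F.softmax(-dihedral_dist.masked_fill(eye, float("inf"))
                        / dihedral_dist.mean(), dim=-1)
    logp = F.log_softmax(sim, dim=-1)
    return -(targets * logp).masked_fill(eye, 0.0).sum(dim=-1).mean()
```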
    • Novel method to assess antibody immunogenicity.
    • Created two reference libraries: a positive set from human proteins and antibodies (OAS + proteome) and a negative set from murine antibody sequences (OAS).
    • Antibody sequences are fragmented into 8–12-mer peptides.
    • Peptide fragments are scored: +1.0 if matching the positive reference, −0.2 if matching the negative reference (see the scoring sketch below).
    • Validated on 217 therapeutic antibodies with known clinical ADA incidence, showing a strong negative correlation between hit rate and ADA incidence.
    • On 25 humanized antibody pairs, ImmunoSeq correctly predicted reduced immunogenicity after humanization, consistent with experimental results.
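
A minimal sketch of the fragment-and-score scheme, assuming the positive and negative reference libraries are available as in-memory peptide sets; normalizing the total into a per-fragment hit rate is an assumption, not necessarily the paper's exact definition:

```python
def fragments(seq, kmin=8, kmax=12):
    """All unique 8- to 12-mer peptides of an antibody sequence."""
    return {seq[i:i + k]
            for k in range(kmin, kmax + 1)
            for i in range(len(seq) - k + 1)}

def immunoseq_score(seq, positive_ref, negative_ref):
    """Score fragments: +1.0 per positive-library hit, -0.2 per negative-library hit."""
    frags = fragments(seq)
    total = sum(1.0 for f in frags if f in positive_ref)
    total -= sum(0.2 for f in frags if f in negative_ref)
    return total / len(frags)  # per-fragment normalization (assumed)
```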
    • Benchmarking of docking/complex prediction methods for antibody-antigen (Ab-Ag) complexes.
    • Authors used 200 antibody-antigen and nanobody-antigen complexes curated from prior studies, specifically chosen to exclude any complexes present in the training data of the evaluated models.
    • Evaluated methods: AF2 (v2.3.2), Protenix, ESMFold, Chai-1, Boltz-1, Boltz-1x, and Boltz-2. (Note: Boltz-2 was only tested on 18 complexes; Protenix failed on 26 large complexes.)
    • DockQ and CAPRI criteria were used as primary metrics to assess structural prediction quality.
    • AF2 performed best overall, especially for antibody-antigen complexes. Chai-1 outperformed AF2 on nanobody-antigen complexes.
    • A composite confidence metric, AntiConf, was introduced, combining pTM and pDockQ2 scores to better assess the quality of Ab-Ag models: AntiConf = 0.3 × pDockQ2 + 0.7 × pTM (implemented in the snippet below).
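
The composite metric is a fixed linear combination of the two confidence scores:

```python
def anticonf(pdockq2: float, ptm: float) -> float:
    """AntiConf composite confidence for an Ab-Ag model."""
    return 0.3 * pdockq2 + 0.7 * ptm
```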
  • 2025-09-05

    MD-LLM-1: A Large Language Model for Molecular Dynamics

    • non-antibody stuff
    • language models
    • Demonstration showing how large language models (LLMs) can be adapted to reduce the computational cost of molecular dynamics (MD).
    • They use the FoldToken encoding to discretize protein 3D conformations into tokens compatible with Mistral, and fine-tune the LLM on short MD trajectories of a single state. The model can then generate new sequences of conformations by predicting the next frame from previous frames (sketched below).
    • After fine-tuning, the model can extend trajectories beyond the training data. Starting from a native state, it can discover alternative conformations, with the potential to bypass kinetic barriers that normally require long MD runs.
    • The approach is system-specific (requires an MD trajectory for each protein), does not yet encode thermodynamics/kinetics explicitly, and relies on the choice of structural tokenization.
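
A minimal sketch of autoregressive trajectory extension over discrete structure tokens (e.g. FoldToken codes, one token per residue per frame); `model.sample_next` is a hypothetical sampler interface, not the paper's API:

```python
def extend_trajectory(model, frames, n_new_frames, frame_len):
    """Extend an MD trajectory by predicting the next frame from previous frames.

    frames: list of frames, each a list of frame_len structure tokens.
    """
    context = [tok for frame in frames for tok in frame]  # flattened history
    for _ in range(n_new_frames):
        new_frame = []
        for _ in range(frame_len):
            tok = model.sample_next(context)  # hypothetical next-token sampler
            context.append(tok)
            new_frame.append(tok)
        frames.append(new_frame)
    return frames
```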
    • Introduces AbSet, a curated dataset of >800,000 antibody structures, combining experimental PDB entries with in silico–generated antibody–antigen complexes.
    • Adds value beyond SAbDab by standardizing structures, including decoy poses, and providing residue-level molecular descriptors for machine learning.
    • Presents dataset profiling and validation, with analyses of structural resolution, antigen diversity, docking quality classification, and descriptor calculation efficiency.
    • Introduces a novel diffusion-based inverse folding method (RL-DIF) that improves the foldable diversity of generated sequences—i.e., it can generate more diverse sequences that still fold into the desired structure.
    • The model uses categorical denoising diffusion for sequence generation, followed by reinforcement learning (DDPO) to improve structural consistency with the target fold.
    • During reinforcement learning, ESMFold is used to predict the 3D structure of generated sequences, which is then compared (via TM-score) to the structure predicted from the native sequence to ensure they fold similarly (see the reward sketch below).
    • Compared to baselines like PiFold and ProteinMPNN, RL-DIF achieves similar sequence recovery and structural consistency but significantly better foldable diversity—a critical advantage in protein design.
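
A minimal sketch of the structural-consistency reward used during the RL phase, with `fold` and `tm_score` as assumed wrappers (e.g. around ESMFold and a TM-score implementation):

```python
def ddpo_rewards(generated_seqs, native_seq, fold, tm_score):
    """Reward each generated sequence by how well its predicted fold matches
    the fold predicted from the native sequence (TM-score in [0, 1])."""
    native_struct = fold(native_seq)
    rewards = [tm_score(fold(s), native_struct) for s in generated_seqs]
    baseline = sum(rewards) / len(rewards)
    # Mean-baselined advantages, as is common in policy-gradient updates.
    return [r - baseline for r in rewards]
```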
    • Novel protein language model with applications to epitope prediction and ranking hits in campaigns.
    • NextGenPLM introduces a modular, multimodal transformer that fuses frozen pretrained protein language models with structural information via spectral contact-map embeddings (sketched below), enabling efficient modeling of multi-chain antibody–antigen complexes without requiring full 3D folding of antibodies.
    • The model was benchmarked on 112 diverse antibody–antigen complexes against state-of-the-art structure predictors (Chai-1 and Boltz-1x), matching their contact-map and epitope prediction accuracy while achieving ~100× higher throughput (4 complexes/sec vs. ~1 min/complex).
    • The model was experimentally validated through an internal affinity-maturation campaign. Using its predictions to rank antibody variants led to designs that achieved up to 17× binding affinity improvements over the wild-type, as confirmed by surface plasmon resonance (SPR) assays.
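
A minimal sketch of one common spectral contact-map embedding (Laplacian eigenvectors of the residue contact graph); NextGenPLM's exact construction may differ:

```python
import numpy as np

def spectral_contact_embedding(contact_map, k=16):
    """Per-residue spectral embedding: the first k nontrivial eigenvectors
    of the graph Laplacian of an (N, N) symmetric residue contact map."""
    A = np.asarray(contact_map, dtype=float)
    L = np.diag(A.sum(axis=1)) - A         # unnormalized graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    return eigvecs[:, 1:k + 1]             # drop the constant eigenvector
```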
    • AntiDIF, a diffusion-based inverse folding method specialized for antibodies, built on the RL-DIF framework.
    • It is trained using antibody-specific data (from SAbDab and OAS) to generate diverse and accurate antibody sequences for a given backbone structure.
    • Unlike prior methods like AntiFold, which trade off diversity for recovery, AntiDIF achieves a better trade-off: it produces substantially higher sequence diversity across CDRs while maintaining comparable or higher sequence recovery.
    • Forward folding (via ABodyBuilder2) confirms that AntiDIF's sequences fold into structures that match the native antibody backbones with low RMSD, demonstrating structural plausibility.
    • Mutational analysis of Trastuzumab framework (FW) regions to modulate antibody stability and function, moving beyond the traditional focus on CDRs.
    • Authors evaluated antibody-specific language models (AbLang2, AntiBERTy, etc.), a general protein language model (ESM-2), and a structure-based Rosetta approach. While the language models showed limited utility in suggesting beneficial FW mutations (a generic masked-marginal scoring scheme is sketched below), Rosetta provided more reliable predictions based on structural stability.
    • Language model-derived mutation suggestions were generally less informative than Rosetta’s, which successfully identified stabilizing FW mutations not biased toward germline residues.
    • Authors experimentally characterized selected mutants in vitro, assessing thermostability, antigen (HER2) binding, and functional effects such as ADCC and tumor cell viability. Some mutations preserved function, while others decoupled binding from downstream activity.
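
For context, a generic masked-marginal mutation score of the kind such language models provide, assuming a hypothetical `masked_logprobs` wrapper around a model like ESM-2 or AbLang2; this illustrates the scoring idea, not the paper's exact protocol:

```python
def masked_marginal_score(masked_logprobs, seq, pos, mut_aa):
    """log P(mutant) - log P(wild-type) at a masked position; positive values
    mean the language model prefers the mutation over the wild-type residue.

    masked_logprobs(seq, pos) -> {amino_acid: log-probability} at position
    `pos` after masking it (hypothetical wrapper, e.g. around ESM-2).
    """
    logp = masked_logprobs(seq, pos)
    return logp[mut_aa] - logp[seq[pos]]
```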