Computational Antibody Papers

    • Novel model to predict heavy/light chain compatibility.
    • Data: H/L pairs sharing the same single-cell barcode as positives; negatives created by swapping L chains between pairs, but only when CDRL3 lengths match (see the sketch after this list); balanced set of 233,880 pairs with a 90/10 train–test split.
    • Training: full VH+VL sequence fed into AntiBERTa2 with a classification head; fine-tuned for 3 epochs, lr 2×10⁻⁵, weight decay 0.01; κ/λ-specific variants trained identically. Final AUC-ROC: 0.75 (held-out test set) and 0.66 (external set); κ/λ models: 0.885/0.831.
    • Baselines: (i) V/J gene-usage → logistic regression & XGBoost, ≈0.50–0.52 accuracy; (ii) CDRH3+CDRL3 CNNs → moderate performance; (iii) ESM-2 improves with fine-tuning, but fine-tuned AntiBERTa2 remains best.
    • The model appears to do more than simply 'match to the database': weak gene-usage baselines, explicit control of CDRL3 length in negatives, external generalisation, and sensitivity to interface residues (CDRH1/2 & framework) in therapeutic-antibody tests all argue that it learns sequence-level pairing rules, not just V/L gene distributions.
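A minimal sketch of the CDRL3-length-matched negative construction described above; the `pairs` layout and all names are assumptions for illustration, not the authors' code:

```python
import random
from collections import defaultdict

def make_negatives(pairs, seed=0):
    """Swap light chains between pairs, restricted to matching CDRL3 lengths.

    pairs: list of (vh_seq, vl_seq, cdrl3_seq) tuples (hypothetical layout).
    Returns (vh, swapped_vl, 0) tuples, where label 0 = non-native pairing.
    """
    rng = random.Random(seed)
    by_len = defaultdict(list)
    for vh, vl, cdrl3 in pairs:
        by_len[len(cdrl3)].append((vh, vl))
    negatives = []
    for bucket in by_len.values():
        if len(bucket) < 2:
            continue  # no swap partner with this CDRL3 length
        rng.shuffle(bucket)
        rotated = bucket[1:] + bucket[:1]  # cyclic shift: each VH gets another pair's VL
        negatives.extend((vh, vl_other, 0)
                         for (vh, _), (_, vl_other) in zip(bucket, rotated))
    return negatives
```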
    • Introduces TNP, a nanobody-specific developability profiler inspired by TAP.
    • Uses six metrics: total CDR length, CDR3 length, CDR3 compactness, and patch scores for hydrophobicity, positive charge, and negative charge.
    • Thresholds are calibrated to 36 clinical-stage nanobodies (a flagging sketch follows this list).
    • In vitro assays on 108 nanobodies (36 clinical-stage + 72 proprietary) show partial agreement with TNP flags, indicating complementary—but not perfectly correlated—assessments.
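A minimal sketch of threshold-based flagging in this spirit; the metric names mirror the list above, but every numeric bound below is a placeholder, not a calibrated TNP value:

```python
# Placeholder bounds for illustration only; the real thresholds are
# calibrated to 36 clinical-stage nanobodies and are not reproduced here.
TNP_BOUNDS = {
    "total_cdr_length": (25.0, 45.0),
    "cdr3_length": (4.0, 22.0),
    "cdr3_compactness": (0.5, 1.5),
    "hydrophobic_patch": (0.0, 150.0),
    "positive_patch": (0.0, 50.0),
    "negative_patch": (0.0, 50.0),
}

def tnp_flags(metrics):
    """Return the names of metrics falling outside their allowed range."""
    return [name for name, (lo, hi) in TNP_BOUNDS.items()
            if not lo <= metrics[name] <= hi]
```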
    • Combined in vitro/in silico method for optimization of binders.
    • Start from a wild-type scFv (heavy chain), build a random-mutant library, FACS-sort on multiple antigens, deep-sequence bins + input, and use per-sequence enrichment (bin/library) as the supervised target for (antibody, antigen) training pairs.
    • Train uncertainty-aware regressors (xGPR or ByteNet-SNGP) on those enrichment targets; run in-silico directed evolution (ISDE) from the WT, proposing single mutations and auto-rejecting moves with high predictive uncertainty while optimizing the worst-case score across antigens (see the ISDE sketch after this list).
    • Binding is protected by the multi-antigen objective + uncertainty gating during ISDE; risky proposals are discarded before they enter the candidate set.
    • Filter candidates for humanness with SAM/AntPack and for solubility with CamSol v2.2 (framework is extensible to add other gates); final wet-lab set kept 29 designs after applying these filters and uncertainty checks.
    • Beyond large in-silico tests, yeast-display across 10 SARS-CoV-2 RBDs shows most designs outperform WT; a representative clone (Delta-63) improves KD on 8/10 variants and competes with ACE2.
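A minimal sketch of the ISDE loop under stated assumptions: `model.predict(seq, antigen)` returns a (mean, stdev) enrichment estimate, as an uncertainty-aware regressor like xGPR or ByteNet-SNGP would; the step budget, uncertainty cutoff, and greedy acceptance rule are illustrative:

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def isde(wt_seq, antigens, model, n_steps=500, max_std=0.5, seed=0):
    """Greedy single-mutation walk maximizing worst-case predicted enrichment."""
    rng = random.Random(seed)
    current = wt_seq
    best = min(model.predict(current, ag)[0] for ag in antigens)
    for _ in range(n_steps):
        pos = rng.randrange(len(current))
        candidate = current[:pos] + rng.choice(AMINO_ACIDS) + current[pos + 1:]
        preds = [model.predict(candidate, ag) for ag in antigens]
        if any(std > max_std for _, std in preds):
            continue  # uncertainty gate: auto-reject risky proposals
        worst = min(mean for mean, _ in preds)  # multi-antigen, worst-case objective
        if worst > best:
            current, best = candidate, worst
    return current, best
```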
  • 2025-09-12

    Tokenizing Loops of Antibodies

    • structure prediction
    • generative methods
    • Novel model for loop retrieval using an embedded structural representation.
    • It is a multimodal tokenizer at the antibody loop (CDR) level that fuses sequence with backbone dihedral-angle features and learns a latent space with a dihedral-distance contrastive loss, in contrast to residue-level tokenizers and canonical-cluster assignments. It produces both continuous and quantized loop tokens that can plug into PLMs (IGLOOLM / IGLOOALM).
    • Trained self-supervised on ~807k loops from experimental (SAbDab/STCRDab) and Ibex-predicted structures, with four objectives: masked dihedral reconstruction, masked AA prediction, contrastive learning over dihedral distance (with DTW alignment; sketched after this list), and codebook learning; training proceeds in two phases on H100 GPUs.
    • Benchmarked on a set of computational goals: (1) H3 loop retrieval: IGLOO beats the best prior tokenizer by +5.9% (dihedral-distance criterion). (2) Cluster recovery: high purity vs. canonical clusters across CDRs. (3) Downstream PLM task: IGLOOLM improves binding-affinity prediction on 8/10 AbBiBench targets, rivaling larger models. (4) Controllable sampling: IGLOOALM generates diverse sequences with more structural consistency than inverse-folding baselines.
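A minimal PyTorch sketch of a contrastive objective in which latent distances track precomputed (e.g., DTW-aligned) dihedral distances; the paper's actual loss, `margin`, and `threshold` may differ:

```python
import torch
import torch.nn.functional as F

def dihedral_contrastive_loss(z, dihedral_dist, margin=1.0, threshold=0.5):
    """z: (B, D) loop embeddings; dihedral_dist: (B, B) pairwise dihedral distances."""
    latent_dist = torch.cdist(z, z)            # (B, B) Euclidean distances in latent space
    pos = (dihedral_dist < threshold).float()  # structurally similar loop pairs
    # pull similar loops together, push dissimilar ones beyond the margin
    loss = pos * latent_dist.pow(2) + (1.0 - pos) * F.relu(margin - latent_dist).pow(2)
    off_diag = 1.0 - torch.eye(len(z), device=z.device)  # ignore self-pairs
    return (loss * off_diag).sum() / off_diag.sum()
```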
    • Novel method to assess antibody immunogenicity.
    • Created two reference libraries: a positive set from human proteins and antibodies (OAS + proteome) and a negative set from murine antibody sequences (OAS).
    • Antibody sequences are fragmented into 8–12-mer peptides.
    • Peptide fragments are scored: +1.0 if matching the positive reference, −0.2 if matching the negative reference (see the scoring sketch after this list).
    • Validated on 217 therapeutic antibodies with known clinical ADA incidence, showing strong negative correlation between hit rate and ADA.
    • On 25 humanized antibody pairs, ImmunoSeq correctly predicted reduced immunogenicity after humanization, consistent with experimental results.
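A minimal sketch of the fragment-and-score scheme; `positive_ref` and `negative_ref` are assumed to be sets of human- and murine-derived peptides, and the normalization into a per-antibody rate is an assumption:

```python
def immunogenicity_score(seq, positive_ref, negative_ref, kmin=8, kmax=12):
    """Fragment seq into 8-12-mers and score each against the reference sets:
    +1.0 for a positive (human) hit, -0.2 for a negative (murine) hit."""
    peptides = [seq[i:i + k]
                for k in range(kmin, kmax + 1)
                for i in range(len(seq) - k + 1)]
    total = sum(1.0 if p in positive_ref else (-0.2 if p in negative_ref else 0.0)
                for p in peptides)
    return total / len(peptides)  # normalize so antibodies of any length are comparable
```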
    • Benchmarking of docking/complex prediction methods for antibody-antigen (Ab-Ag) complexes.
    • Authors used 200 antibody-antigen and nanobody-antigen complexes curated from prior studies, specifically chosen to exclude any complexes present in the training data of the evaluated models.
    • Evaluated methods: AF2 (v2.3.2), Protenix, ESMFold, Chai-1, Boltz-1, Boltz-1x, and Boltz-2. (Note: Boltz-2 was only tested on 18 complexes; Protenix failed on 26 large complexes.)
    • DockQ and CAPRI criteria were used as primary metrics to assess structural prediction quality.
    • AF2 performed best overall, especially for antibody-antigen complexes. Chai-1 outperformed AF2 on nanobody-antigen complexes.
    • A composite confidence metric, AntiConf, was introduced, combining pTM and pDockQ2 scores to better assess the quality of Ab-Ag models: AntiConf = 0.3 × pDockQ2 + 0.7 × pTM.
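In code, the composite score and one way to use it might look like this (the dictionary field names are assumptions):

```python
def anticonf(ptm, pdockq2):
    """Composite Ab-Ag model confidence with the paper's stated weighting."""
    return 0.3 * pdockq2 + 0.7 * ptm

# e.g., rank candidate complex models by composite confidence:
# models.sort(key=lambda m: anticonf(m["ptm"], m["pdockq2"]), reverse=True)
```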
  • 2025-09-05

    MD-LLM-1: A Large Language Model for Molecular Dynamics

    • non-antibody stuff
    • language models
    • Demonstration showing how large language models (LLMs) can be adapted to reduce the computational cost of molecular dynamics (MD).
    • They use the FoldToken encoding to discretize protein 3D conformations into tokens compatible with Mistral, and fine-tune the LLM on short MD trajectories of a single state. The model can then generate new conformations by predicting the next frame from the previous frames (sketched after this list).
    • After fine-tuning, the model can extend trajectories beyond the training data. Starting from a native state, it can discover alternative conformations, potentially bypassing kinetic barriers that would normally require long MD runs.
    • The approach is system-specific (requires an MD trajectory for each protein), does not yet encode thermodynamics/kinetics explicitly, and relies on the choice of structural tokenization.
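A minimal sketch of frame-by-frame trajectory extension under stated assumptions: each MD frame has already been discretized into FoldToken-style token ids, and `lm.sample_next_token` is a hypothetical stand-in for sampling from the fine-tuned LLM:

```python
def extend_trajectory(lm, frames, n_new_frames, tokens_per_frame):
    """frames: list of per-conformation token-id lists, in trajectory order."""
    context = [tok for frame in frames for tok in frame]  # flatten the trajectory
    for _ in range(n_new_frames):
        new_frame = []
        for _ in range(tokens_per_frame):
            tok = lm.sample_next_token(context)  # hypothetical sampling call
            new_frame.append(tok)
            context.append(tok)
        frames.append(new_frame)  # decoded back to 3D coordinates downstream
    return frames
```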
    • Introduces AbSet, a curated dataset of >800,000 antibody structures, combining experimental PDB entries with in silico–generated antibody–antigen complexes.
    • Adds value beyond SAbDab by standardizing structures, including decoy poses, and providing residue-level molecular descriptors for machine learning.
    • Presents dataset profiling and validation, with analyses of structural resolution, antigen diversity, docking quality classification, and descriptor calculation efficiency.
    • Introduces a novel diffusion-based inverse folding method (RL-DIF) that improves the foldable diversity of generated sequences—i.e., it can generate more diverse sequences that still fold into the desired structure.
    • The model uses categorical denoising diffusion for sequence generation, followed by reinforcement learning (DDPO) to improve structural consistency with the target fold.
    • During reinforcement learning, ESMFold is used to predict the 3D structure of each generated sequence, which is then compared (via TM-score) to the structure predicted from the native sequence to ensure they fold similarly (see the reward sketch after this list).
    • Compared to baselines like PiFold and ProteinMPNN, RL-DIF achieves similar sequence recovery and structural consistency but significantly better foldable diversity—a critical advantage in protein design.
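A minimal sketch of that structural-consistency reward; `fold` and `tm_score` stand in for a structure predictor (e.g., ESMFold) and a TM-score routine, both assumed callables rather than a specific API:

```python
def consistency_reward(generated_seq, native_seq, fold, tm_score):
    """Reward for DDPO: TM-score between the predicted structures of the
    generated and native sequences (1.0 = same fold)."""
    gen_structure = fold(generated_seq)
    ref_structure = fold(native_seq)
    return tm_score(gen_structure, ref_structure)
```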
    • Novel protein language model with applications to epitope prediction and ranking hits in campaigns.
    • NextGenPLM introduces a modular, multimodal transformer that fuses frozen pretrained protein language models with structural information via spectral contact-map embeddings (a sketch follows this list), enabling efficient modeling of multi-chain antibody–antigen complexes without requiring full 3D folding of antibodies.
    • The model was benchmarked on 112 diverse antibody–antigen complexes against state-of-the-art structure predictors (Chai-1 and Boltz-1x), matching their contact-map and epitope prediction accuracy while achieving ~100× higher throughput (4 complexes/sec vs. ~1 min/complex).
    • The model was experimentally validated through an internal affinity-maturation campaign. Using its predictions to rank antibody variants led to designs that achieved up to 17× binding affinity improvements over the wild-type, as confirmed by surface plasmon resonance (SPR) assays.
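A minimal NumPy sketch of one way to build spectral contact-map embeddings: take the leading eigenvectors of the symmetrized contact matrix as per-residue features. The paper's exact construction (adjacency vs. Laplacian, number of modes) is not specified here, so this is illustrative:

```python
import numpy as np

def spectral_embedding(contact_map, k=16):
    """contact_map: (L, L) binary or weighted contacts -> (L, k) residue features."""
    a = (contact_map + contact_map.T) / 2.0        # enforce symmetry
    vals, vecs = np.linalg.eigh(a)                 # eigenvalues in ascending order
    top = np.argsort(np.abs(vals))[::-1][:k]       # k modes with largest magnitude
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))  # scale modes by sqrt(|eigenvalue|)
```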