Computational Antibody Papers

    • Review of currently available large-scale software for antibody analysis.
    • Today’s biologics R&D is slowed by fragmented tools and manual data wrangling; the paper proposes a unified, open-architecture platform that spans registration, tracking, analysis, and decisions from discovery through developability.
    • Key components are end-to-end registration of molecules/materials/assays; a harmonized data schema with normalized outputs; automated analytics with consistent QC; complete metadata capture and “data integrity by design.”
    • The platform should natively interface with AI, enable multimodal foundation models and continuous “lab-in-the-loop” learning, and support federated approaches to counter data scarcity while preserving privacy.
    • Dotmatics, Genedata, and Schrödinger each cover pieces (e.g., LiveDesign lacks end-to-end registration), and the authors stress regulatory-ready features.
  • 2025-09-30

    A Generative Foundation Model for Antibody Design

    • generative methods
    • protein design
    • Novel de novo antibody design method.
    • Trained on SAbDab with a time split—6,448 heavy+light complexes + 1,907 single-chain (nanobodies), clustered at 95% ID into 2,436 clusters; val/test are 101 and 60 complexes, plus 27 nanobodies.
    • A two-stage diffusion (structure→seq+structure) followed by consistency distillation, epitope-aware conditioning, frozen ESM-PPI features, and mixed task sampling (CDR-H3 / heavy CDRs / all CDRs / no seq), sketched after this list.
    • Inputs are the antigen structure (can warm-start from AlphaFold3) plus VH/VL framework sequences; you pick which CDRs (and lengths) to design; the model outputs CDR sequences and the full complex.
    • Runs without an epitope but docking drops (DockQ ~0.246 → 0.069, SR 0.433 → 0.050); AF3 initialization lifts success to 0.627 (≈+0.19 vs baseline).
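A minimal sketch of the mixed task sampling mentioned above, assuming each training example is assigned one of the four design tasks uniformly at random; the task names, probabilities, and data structures are assumptions for illustration, not the authors' code:

```python
import random

# Hypothetical illustration of mixed task sampling: pick one of four design
# tasks and derive the set of residue positions to be designed.
TASKS = {
    "cdr_h3": ["H3"],
    "heavy_cdrs": ["H1", "H2", "H3"],
    "all_cdrs": ["H1", "H2", "H3", "L1", "L2", "L3"],
    "no_seq": [],  # condition on the full sequence, design structure only
}

def sample_design_positions(cdr_positions):
    """cdr_positions: dict mapping CDR name -> list of residue indices."""
    task = random.choice(list(TASKS))
    designed = set()
    for cdr in TASKS[task]:
        designed.update(cdr_positions.get(cdr, []))
    return task, designed
```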
  • 2025-09-30

    Efficient generation of epitope-targeted de novo antibodies with Germinal

    • generative methods
    • nanobodies
    • protein design
    • Novel open nanobody design method with experimental validation.
    • On the surface it might appear like a lot of methods stitched together. The magic sauce appears to be in the joint, gradient-based co-optimization (sketched after this list): AF-Multimer and IgLM gradients are merged through a 3-phase schedule (logits → softmax → semi-greedy), with CDR-masking/framework bias and custom losses that force CDR-mediated, loop-like interfaces; then AbMPNN edits only non-contact CDR residues, and designs are filtered independently with AF3 + PyRosetta.
    • All this is not a newly trained model but rather a design-and-filtering pipeline assembled from existing methods, weights, and gradients; no new training was performed, only experimental validation.
    • The experimental benchmark was run on four targets: PD-L1, IL-3, IL-20, and BHRF1.
    • The authors also measured whether their designs were more than ‘regurgitations’ of known antibodies: CDR identities were computed against SAbDab and OAS (via MMseqs), and many designs show <50% CDR identity to any public sequence.
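A rough sketch of the joint gradient-based co-optimization described above, under heavy assumptions: `structure_loss` and `iglm_loss` are placeholder callables standing in for the AF-Multimer interface losses and the IgLM likelihood term (not Germinal's API), the temperature schedule is an assumed reading of the logits→softmax phases, and the semi-greedy phase, AbMPNN editing, and AF3/PyRosetta filtering are omitted:

```python
import torch

def co_optimize(logits, structure_loss, iglm_loss, cdr_mask,
                n_logits=50, n_softmax=50, lr=0.1, w_lm=0.5):
    """Toy two-phase relaxed sequence optimization over CDR positions.

    logits: (L, 20) tensor of per-position amino-acid logits.
    cdr_mask: (L,) boolean tensor, True where design is allowed (CDRs).
    """
    logits = logits.clone().requires_grad_(True)
    opt = torch.optim.Adam([logits], lr=lr)
    for step in range(n_logits + n_softmax):
        # Phase 1 uses a soft distribution; phase 2 sharpens it toward one-hot.
        temp = 1.0 if step < n_logits else 0.1
        probs = torch.softmax(logits / temp, dim=-1)
        loss = structure_loss(probs) + w_lm * iglm_loss(probs)
        opt.zero_grad()
        loss.backward()
        with torch.no_grad():
            logits.grad[~cdr_mask] = 0.0  # framework positions stay fixed
        opt.step()
    return torch.argmax(logits.detach(), dim=-1)  # semi-greedy phase omitted
```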
  • 2025-09-30

    mBER: Controllable de novo antibody design with million-scale experimental screening

    • binding prediction
    • generative methods
    • protein design
    • experimental techniques
    • Novel de novo antibody design method with massive experimental testing.
    • The computational method involves integration, not retraining, of existing tools. It combines AlphaFold-Multimer, protein language models (ESM2/AbLang2), and NanoBodyBuilder2 with templating/sequence priors to design/filter antibody-format binders.
    • They perform massive testing. >1.1 million VHH binders designed across 436 targets (145 tested); ~330k experimentally screened.
    • Hit rates look low per binder (~0.5–1%), but that is ~50× above random libraries and still yields thousands of validated binders (back-of-envelope check after this list).
    • Target-level success is 45%, i.e. the fraction of targets for which at least one binder was obtained; some epitopes reached 30–38% hit rates after filtering.
    • The big caveat is epitope specificity: it makes a real difference, with some epitopes yielding no binders at all.
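A quick back-of-envelope check of the numbers above (illustrative arithmetic only):

```python
screened = 330_000                  # experimentally screened designs (~330k)
for hit_rate in (0.005, 0.01):      # ~0.5–1% per-binder hit rate
    print(f"{hit_rate:.1%}: ~{int(screened * hit_rate):,} validated binders")
# 0.5% of 330k is ~1,650 and 1% is ~3,300, consistent with "thousands of
# validated binders"; a random library ~50x lower would yield only dozens.
```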

    • Novel model to predict heavy/light chain compatibility.
    • Data: H/L pairs sharing the same single-cell barcode; negatives are made by swapping L chains between pairs, but only if CDRL3 length matches (sketched after this list); balanced set of 233,880 pairs with a 90/10 train–test split.
    • Training: full VH+VL into AntiBERTa2 with a classification head; fine-tuned for 3 epochs, lr 2×10⁻⁵, weight decay 0.01; κ/λ-specific variants trained identically. Final AUC-ROC 0.75 (held-out) and 0.66 (external); κ/λ models: 0.885/0.831.
    • Baselines: (i) V/J gene-usage features with logistic regression & XGBoost reach only ≈0.50–0.52 accuracy (near chance on the balanced set); (ii) CNNs on CDRH3+CDRL3 perform moderately; (iii) ESM-2 improves with fine-tuning, but fine-tuned AntiBERTa2 is best.
    • It seems to do better than just ‘matching to the database’. Weak gene-usage baselines, explicit control of CDRL3 length in negatives, external generalisation, and sensitivity to interface residues (CDRH1/2 & framework) in therapeutic-antibody tests argue the model learns sequence-level pairing rules, not just V/L distributions.
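A minimal sketch of the negative-pair construction described above (the length-matched light-chain swap); the exact sampling procedure and data layout are assumptions:

```python
import random

def make_negatives(pairs, seed=0):
    """pairs: list of (vh_seq, vl_seq, cdrl3_seq) tuples from single-cell data.

    Returns mispaired (vh, vl, label) negatives where the swapped-in light
    chain's CDRL3 has the same length as the original partner's CDRL3.
    """
    rng = random.Random(seed)
    negatives = []
    for i, (vh, _, cdrl3) in enumerate(pairs):
        candidates = [j for j, p in enumerate(pairs)
                      if j != i and len(p[2]) == len(cdrl3)]
        if candidates:
            j = rng.choice(candidates)
            negatives.append((vh, pairs[j][1], 0))  # label 0 = non-native pair
    return negatives
```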

    • Introduces TNP, a nanobody-specific developability profiler inspired by TAP.
    • Uses six metrics: total CDR length, CDR3 length, CDR3 compactness, and patch scores for hydrophobicity, positive charge, and negative charge (a flagging sketch follows this list).
    • Thresholds are calibrated to 36 clinical-stage nanobodies.
    • In vitro assays on 108 nanobodies (36 clinical-stage + 72 proprietary) show partial agreement with TNP flags, indicating complementary—but not perfectly correlated—assessments.
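A minimal sketch of a TAP/TNP-style flagging scheme over the six metrics above; the threshold values shown are placeholders for illustration, not the published thresholds calibrated on the 36 clinical-stage nanobodies:

```python
# Placeholder bounds (lower, upper); None means unbounded. Real TNP thresholds
# are calibrated on clinical-stage nanobodies and differ from these values.
THRESHOLDS = {
    "total_cdr_length":  (None, 60),
    "cdr3_length":       (None, 22),
    "cdr3_compactness":  (None, 1.5),
    "hydrophobic_patch": (None, 200.0),
    "positive_patch":    (None, 100.0),
    "negative_patch":    (None, 100.0),
}

def developability_flags(metrics):
    """metrics: dict of metric name -> value; returns names outside their range."""
    flags = []
    for name, (lo, hi) in THRESHOLDS.items():
        v = metrics[name]
        if (lo is not None and v < lo) or (hi is not None and v > hi):
            flags.append(name)
    return flags
```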

    • Combined in vitro/in silico method for optimization of binders.
    • Start from a wild-type scFv (heavy chain), build a random-mutant library, FACS-sort on multiple antigens, deep-sequence bins + input, and use per-sequence enrichment (bin/library) as the supervised target for (antibody, antigen) training pairs.
    • Train uncertainty-aware regressors (xGPR or ByteNet-SNGP) on those enrichment targets; run in-silico directed evolution (ISDE) from the WT, proposing single mutations and auto-rejecting moves with high predictive uncertainty while optimizing the worst-case score across antigens (sketched after this list).
    • Binding is protected by the multi-antigen objective + uncertainty gating during ISDE; risky proposals are discarded before they enter the candidate set.
    • Filter candidates for humanness with SAM/AntPack and for solubility with CamSol v2.2 (framework is extensible to add other gates); final wet-lab set kept 29 designs after applying these filters and uncertainty checks.
    • Beyond large in-silico tests, yeast-display across 10 SARS-CoV-2 RBDs shows most designs outperform WT; a representative clone (Delta-63) improves KD on 8/10 variants and competes with ACE2.
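A rough sketch of the ISDE loop described above, assuming a `predict(seq, antigen)` function returning a (mean, standard deviation) pair from the uncertainty-aware regressor; the acceptance rule, uncertainty threshold, and step count are illustrative assumptions:

```python
import random

AAS = "ACDEFGHIKLMNPQRSTVWY"

def isde(wt_seq, predict, antigens, n_steps=500, max_std=0.5, seed=0):
    """Greedy single-mutation walk optimizing the worst-case predicted
    enrichment across antigens, with an uncertainty gate on proposals."""
    rng = random.Random(seed)
    current = wt_seq
    best_worst = min(predict(current, ag)[0] for ag in antigens)
    for _ in range(n_steps):
        pos = rng.randrange(len(current))
        mutant = current[:pos] + rng.choice(AAS) + current[pos + 1:]
        preds = [predict(mutant, ag) for ag in antigens]
        if max(std for _, std in preds) > max_std:
            continue  # uncertainty gate: discard risky proposals
        worst = min(mean for mean, _ in preds)
        if worst > best_worst:
            current, best_worst = mutant, worst
    return current, best_worst
```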
  • 2025-09-12

    Tokenizing Loops of Antibodies

    • structure prediction
    • generative methods
    • Novel model for loop retrieval using embedded structural representation.
    • It is a multimodal tokenizer at the antibody loop (CDR) level that fuses sequence with backbone dihedral-angle features and learns a latent space with a dihedral-distance contrastive loss—unlike residue-tokenizers and canonical clusters. It produces both continuous and quantized loop tokens that can plug into PLMs (IGLOOLM / IGLOOALM).
    • Trained in a self-supervised manner on ~807k loops from experimental (SAbDab/STCRDab) and Ibex-predicted structures, with four objectives: masked dihedral reconstruction, masked AA prediction, contrastive learning over dihedral distance (with DTW alignment), and codebook learning; training proceeds in two phases on H100 GPUs.
    • It was benchmarked on a set of computational goals: (1) loop retrieval (sketched after this list): for H3 loops IGLOO beats the best prior tokenizer by +5.9% (dihedral-distance criterion); (2) cluster recovery: high purity vs. canonical clusters across CDRs; (3) downstream PLM task: IGLOOLM improves binding-affinity prediction on 8/10 AbBiBench targets, rivaling larger models; (4) controllable sampling: IGLOOALM generates diverse sequences with more structure consistency than inverse-folding baselines.
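A minimal sketch of embedding-based loop retrieval as in the benchmark above, assuming loop embeddings (continuous IGLOO-style tokens) are already computed; cosine similarity is an assumed retrieval metric, not necessarily the paper's criterion:

```python
import numpy as np

def retrieve_nearest(query_emb, library_embs):
    """Return the index of the library loop whose embedding is closest
    to the query loop embedding by cosine similarity."""
    q = query_emb / np.linalg.norm(query_emb)
    lib = library_embs / np.linalg.norm(library_embs, axis=1, keepdims=True)
    return int(np.argmax(lib @ q))
```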

    • Novel method to assess antibody immunogenicity.
    • Created two reference libraries: a positive set from human proteins and antibodies (OAS + proteome) and a negative set from murine antibody sequences (OAS).
    • Antibody sequences are fragmented into 8–12-mer peptides.
    • Peptide fragments are scored: +1.0 if matching the positive reference, −0.2 if matching the negative reference (scoring sketch after this list).
    • Validated on 217 therapeutic antibodies with known clinical ADA incidence, showing strong negative correlation between hit rate and ADA.
    • On 25 humanized antibody pairs, ImmunoSeq correctly predicted reduced immunogenicity after humanization, consistent with experimental results.
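A minimal sketch of the fragment-and-score scheme described above, assuming exact peptide matching against the two reference sets:

```python
def immunogenicity_score(seq, positive, negative):
    """Score a sequence by its 8-12-mer fragments: +1.0 per fragment found in
    the positive (human) set, -0.2 per fragment found in the negative (murine) set."""
    score = 0.0
    for k in range(8, 13):
        for i in range(len(seq) - k + 1):
            pep = seq[i:i + k]
            if pep in positive:
                score += 1.0
            if pep in negative:
                score -= 0.2
    return score
```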

    • Benchmarking of docking/complex prediction methods for antibody-antigen (Ab-Ag) complexes.
    • Authors used 200 antibody-antigen and nanobody-antigen complexes curated from prior studies, specifically chosen to exclude any complexes present in the training data of the evaluated models.
    • Evaluated methods: AF2 (v2.3.2), Protenix, ESMFold, Chai-1, Boltz-1, Boltz-1x, and Boltz-2. (Note: Boltz-2 was only tested on 18 complexes; Protenix failed on 26 large complexes.)
    • DockQ and CAPRI criteria were used as primary metrics to assess structural prediction quality.
    • AF2 performed best overall, especially for antibody-antigen complexes. Chai-1 outperformed AF2 on nanobody-antigen complexes.
    • A composite confidence metric, AntiConf, was introduced, combining pTM and pDockQ2 scores to better assess the quality of Ab-Ag models: AntiConf = 0.3 × pDockQ2 + 0.7 × pTM.
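The composite metric above, written as code:

```python
def anticonf(pdockq2, ptm):
    """AntiConf composite confidence for antibody-antigen models."""
    return 0.3 * pdockq2 + 0.7 * ptm
```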