Novel open nanobody design method with experimental validation.
On the surface it might look like a lot of methods stitched together. The magic sauce appears to be the joint, gradient-based co-optimization: AF-Multimer and IgLM gradients are merged through a 3-phase schedule (logits → softmax → semi-greedy), with CDR-masking/framework bias and custom losses that force CDR-mediated, loop-like interfaces; AbMPNN then edits only non-contact CDR residues, and designs are filtered independently with AF3 + PyRosetta.
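A minimal, runnable caricature of that 3-phase schedule. A toy stand-in loss replaces the merged AF-Multimer/IgLM gradients, and finite differences replace autodiff; all names and numbers here are illustrative, not the paper's.

```python
import math
import random

AAS = "ACDEFGHIKLMNPQRSTVWY"

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def toy_loss(probs_per_pos, target="CDR"):
    # Stand-in for the merged AF-Multimer + IgLM loss: reward probability
    # mass on a fixed target residue at each position.
    return -sum(p[AAS.index(t)] for p, t in zip(probs_per_pos, target))

def design(seq_len=3, steps=200, lr=0.5):
    random.seed(0)
    logits = [[random.gauss(0, 0.1) for _ in AAS] for _ in range(seq_len)]
    # Phases 1-2: gradient steps on the logits through the softmax
    # relaxation (finite differences here as a placeholder for autodiff).
    eps = 1e-4
    for _ in range(steps):
        for i in range(seq_len):
            for a in range(len(AAS)):
                logits[i][a] += eps
                up = toy_loss([softmax(l) for l in logits])
                logits[i][a] -= 2 * eps
                down = toy_loss([softmax(l) for l in logits])
                logits[i][a] += eps
                logits[i][a] -= lr * (up - down) / (2 * eps)
    # Phase 3: semi-greedy discretization - argmax residue per position.
    return "".join(AAS[l.index(max(l))] for l in logits)
```

The real pipeline optimizes structure- and language-model losses jointly; this sketch only shows the logits → softmax → discretize control flow.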
Notably, this is not a newly trained model but a design-and-filtering pipeline assembled from existing methods, gradients, and weights; no retraining was done, only experimental validation.
The experimental benchmark was run on four targets: PD-L1, IL-3, IL-20, and BHRF1.
The authors checked that their designs were not just 'regurgitations' of known antibodies: CDR identities were computed against SAbDab and OAS (via MMseqs), and many designs show <50% CDR identity to any public sequence.
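That novelty check can be sketched as a best-identity cutoff; a plain ungapped identity stands in here for the actual MMseqs search.

```python
def percent_identity(a, b):
    # Ungapped identity over the shorter aligned length - a simplification
    # of the MMseqs-based search used in the paper.
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return 100.0 * sum(x == y for x, y in zip(a, b)) / n

def max_identity(cdr, reference_db):
    # Best match against any public CDR sequence.
    return max(percent_identity(cdr, ref) for ref in reference_db)

def is_novel(cdr, reference_db, cutoff=50.0):
    # Flag designs whose best match falls below the <50% identity cutoff.
    return max_identity(cdr, reference_db) < cutoff
```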
Novel de novo antibody design method with massive experimental testing.
The computational method involves integration, not retraining, of existing tools. It combines AlphaFold-Multimer, protein language models (ESM2/AbLang2), and NanoBodyBuilder2 with templating/sequence priors to design/filter antibody-format binders.
They perform massive testing. >1.1 million VHH binders designed across 436 targets (145 tested); ~330k experimentally screened.
Hit rates look low per binder (~0.5–1%) but that’s ~50× above random libraries, and still yields thousands of validated binders.
Target-level success (the fraction of targets that yielded at least one binder) is 45%; some epitopes reached 30–38% hit rates after filtering.
The big caveat is epitope dependence: the choice of epitope really makes a difference, with some epitopes yielding no binders at all.
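Rough scale arithmetic behind the numbers above (the midpoint of the quoted hit-rate range is an assumption):

```python
screened = 330_000           # designs experimentally screened
hit_rate = 0.0075            # midpoint of the ~0.5-1% per-binder hit rate
random_rate = hit_rate / 50  # ~50x above random libraries per the summary
validated = int(screened * hit_rate)  # still thousands of validated binders
```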
Novel model to predict heavy/light-chain compatibility.
Data: positives = H/L pairs sharing the same single-cell barcode; negatives = L chains swapped between pairs, but only when CDRL3 length matches; balanced set of 233,880 pairs with a 90/10 train–test split.
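The negative-sampling rule can be sketched as follows; the (heavy, light, cdrl3) tuple format is hypothetical.

```python
import random

def make_negatives(pairs, seed=0):
    """Swap light chains between true H/L pairs, keeping only swaps where
    the CDRL3 lengths match (the length control described above).
    `pairs` is a list of (heavy, light, cdrl3) tuples - a made-up format."""
    rng = random.Random(seed)
    negatives = []
    for i, (h, l, c3) in enumerate(pairs):
        # Candidate partners: different pair, same CDRL3 length, different L.
        candidates = [j for j, p in enumerate(pairs)
                      if j != i and len(p[2]) == len(c3) and p[1] != l]
        if candidates:
            j = rng.choice(candidates)
            negatives.append((h, pairs[j][1], pairs[j][2]))
    return negatives
```

Controlling CDRL3 length in the negatives prevents the classifier from solving the task with a trivial length feature.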
Training: Full VH+VL into AntiBERTa2 with a classification head; fine-tuned 3 epochs, lr 2×10⁻⁵, weight decay 0.01; κ/λ-specific variants trained identically. Final AUC-ROC 0.75 (withheld) and 0.66 (external); κ/λ models: 0.885/0.831.
Baselines: (i) V/J gene-usage → logistic reg. & XGBoost ≈ 0.50–0.52 acc.; (ii) CDRH3+CDRL3 CNNs → moderate; (iii) ESM-2 improves with fine-tuning but AntiBERTa2 FT is best.
It seems to do better than just ‘matching to the database’. Weak gene-usage baselines, explicit control of CDRL3 length in negatives, external generalisation, and sensitivity to interface residues (CDRH1/2 & framework) in therapeutic-antibody tests argue the model learns sequence-level pairing rules, not just V/L distributions.
Introduces TNP, a nanobody-specific developability profiler inspired by TAP.
Uses six metrics: total CDR length, CDR3 length, CDR3 compactness, and patch scores for hydrophobicity, positive charge, and negative charge.
Thresholds are calibrated to 36 clinical-stage nanobodies.
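A TNP-style flagging scheme might look like this; the metric names follow the summary, but the ranges below are illustrative placeholders, not the thresholds calibrated to the 36 clinical-stage nanobodies.

```python
# Illustrative (lo, hi) acceptable ranges per metric - NOT the calibrated values.
THRESHOLDS = {
    "total_cdr_length": (20, 45),
    "cdr3_length": (3, 22),
    "cdr3_compactness": (0.5, 1.5),
    "hydrophobic_patch": (0.0, 200.0),
    "positive_patch": (0.0, 50.0),
    "negative_patch": (0.0, 50.0),
}

def tnp_flags(metrics):
    """Return the names of metrics falling outside their calibrated range."""
    return [name for name, (lo, hi) in THRESHOLDS.items()
            if not lo <= metrics[name] <= hi]
```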
In vitro assays on 108 nanobodies (36 clinical-stage + 72 proprietary) show partial agreement with TNP flags, indicating complementary—but not perfectly correlated—assessments.
Combined in vitro/in silico method for optimization of binders.
Start from a wild-type scFv (heavy chain), build a random-mutant library, FACS-sort on multiple antigens, deep-sequence bins + input, and use per-sequence enrichment (bin/library) as the supervised target for (antibody, antigen) training pairs.
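The per-sequence enrichment target might be computed along these lines; the pseudocount normalization is an assumption, not necessarily the paper's exact formula.

```python
import math

def enrichment(bin_counts, library_counts, pseudocount=1.0):
    """Per-sequence log2-enrichment: frequency in a FACS-sorted bin over
    frequency in the input library, with pseudocounts so sequences unseen
    in the bin still get a (negative) score."""
    bin_total = sum(bin_counts.values()) + pseudocount * len(library_counts)
    lib_total = sum(library_counts.values())
    scores = {}
    for seq, lib_n in library_counts.items():
        bin_f = (bin_counts.get(seq, 0) + pseudocount) / bin_total
        lib_f = lib_n / lib_total
        scores[seq] = math.log2(bin_f / lib_f)
    return scores
```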
Train uncertainty-aware regressors (xGPR or ByteNet-SNGP) on those enrichment targets; run in-silico directed evolution (ISDE) from the WT, proposing single mutations and auto-rejecting moves with high predictive uncertainty while optimizing the worst-case score across antigens.
Binding is protected by the multi-antigen objective + uncertainty gating during ISDE; risky proposals are discarded before they enter the candidate set.
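The ISDE loop with uncertainty gating and a worst-case objective, as a hedged sketch: the real xGPR / ByteNet-SNGP regressors are swapped for a user-supplied `score_fn`, and the greedy single-mutation proposal scheme is a simplification.

```python
import random

def isde(wt, score_fn, n_rounds=500, max_uncertainty=0.5, seed=0):
    """In-silico directed evolution sketch: propose single point mutations,
    auto-reject high-uncertainty predictions, and optimize the worst-case
    score across antigens. score_fn(seq) -> ({antigen: score}, uncertainty)
    stands in for the trained uncertainty-aware regressors."""
    rng = random.Random(seed)
    aas = "ACDEFGHIKLMNPQRSTVWY"
    scores, _ = score_fn(wt)
    best, best_worst = wt, min(scores.values())
    for _ in range(n_rounds):
        pos = rng.randrange(len(best))
        mut = best[:pos] + rng.choice(aas) + best[pos + 1:]
        scores, unc = score_fn(mut)
        if unc > max_uncertainty:   # uncertainty gating: discard risky moves
            continue
        if min(scores.values()) > best_worst:   # worst-case over antigens
            best, best_worst = mut, min(scores.values())
    return best, best_worst
```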
Filter candidates for humanness with SAM/AntPack and for solubility with CamSol v2.2 (framework is extensible to add other gates); final wet-lab set kept 29 designs after applying these filters and uncertainty checks.
Beyond large in-silico tests, yeast-display across 10 SARS-CoV-2 RBDs shows most designs outperform WT; a representative clone (Delta-63) improves KD on 8/10 variants and competes with ACE2.
Novel model for loop retrieval using an embedded structural representation.
It is a multimodal tokenizer at the antibody loop (CDR) level that fuses sequence with backbone dihedral-angle features and learns a latent space with a dihedral-distance contrastive loss—unlike residue-tokenizers and canonical clusters. It produces both continuous and quantized loop tokens that can plug into PLMs (IGLOOLM / IGLOOALM).
Trained self-supervised on ~807k loops from experimental (SAbDab/STCRDab) and Ibex-predicted structures, with four objectives: masked dihedral reconstruction, masked AA prediction, contrastive learning over dihedral distance (with DTW alignment), and codebook learning; followed by two-phase training and specific H100 settings.
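The core geometric signal, dihedral distance between loops, can be sketched as a mean angular difference over backbone (phi, psi) pairs; the DTW alignment the paper uses for unequal-length loops is omitted here.

```python
import math

def dihedral_distance(loop_a, loop_b):
    """Mean angular distance (radians) between two equal-length lists of
    (phi, psi) backbone dihedrals - the quantity the contrastive loss
    is trained to preserve in the latent space."""
    def ang(x, y):
        # Shortest angular difference on the circle.
        d = abs(x - y) % (2 * math.pi)
        return min(d, 2 * math.pi - d)
    total = sum(ang(pa, pb) + ang(sa, sb)
                for (pa, sa), (pb, sb) in zip(loop_a, loop_b))
    return total / (2 * len(loop_a))
```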
It was benchmarked on a set of computational goals. (1) Loop retrieval: for H3 loops IGLOO beats the best prior tokenizer by +5.9% (dihedral-distance criterion). (2) Cluster recovery: high purity vs. canonical clusters across CDRs. (3) Downstream PLM task: IGLOOLM improves binding-affinity prediction on 8/10 AbBiBench targets, rivaling larger models. (4) Controllable sampling: IGLOOALM generates diverse sequences with more structural consistency than inverse-folding baselines.
Introduces ImmunoSeq, a peptide-matching immunogenicity screen. Two reference libraries were created: a positive set from human proteins and antibodies (OAS + proteome) and a negative set from murine antibody sequences (OAS).
Antibody sequences are fragmented into 8–12-mer peptides.
Peptide fragments are scored: +1.0 if matching the positive reference, −0.2 if matching the negative reference.
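The scoring rule is simple enough to sketch directly; exact set-membership stands in for whatever matching tolerance the authors actually use.

```python
def fragment(seq, kmin=8, kmax=12):
    """All 8-12-mer peptide fragments of an antibody sequence."""
    return {seq[i:i + k]
            for k in range(kmin, kmax + 1)
            for i in range(len(seq) - k + 1)}

def immuno_score(seq, positive_ref, negative_ref):
    """+1.0 per fragment found in the human (positive) reference,
    -0.2 per fragment found in the murine (negative) reference."""
    frags = fragment(seq)
    return (sum(1.0 for f in frags if f in positive_ref)
            - sum(0.2 for f in frags if f in negative_ref))
```

A higher score means more human-like (and fewer murine-like) peptide fragments, which is what correlates negatively with clinical ADA incidence in the validation below.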
Validated on 217 therapeutic antibodies with known clinical ADA incidence, showing strong negative correlation between hit rate and ADA.
On 25 humanized antibody pairs, ImmunoSeq correctly predicted reduced immunogenicity after humanization, consistent with experimental results.
Benchmarking of docking/complex prediction methods for antibody-antigen (Ab-Ag) complexes.
Authors used 200 antibody-antigen and nanobody-antigen complexes curated from prior studies, specifically chosen to exclude any complexes present in the training data of the evaluated models.
Evaluated methods: AF2 (v2.3.2), Protenix, ESMFold, Chai-1, Boltz-1, Boltz-1x, and Boltz-2. (Note: Boltz-2 was only tested on 18 complexes; Protenix failed on 26 large complexes.)
DockQ and CAPRI criteria were used as primary metrics to assess structural prediction quality.
AF2 performed best overall, especially for antibody-antigen complexes. Chai-1 outperformed AF2 on nanobody-antigen complexes.
A composite confidence metric, AntiConf, was introduced, combining pTM and pDockQ2 scores to better assess the quality of Ab-Ag models. AntiConf = 0.3 × pDockQ2 + 0.7 × pTM
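The composite score, as a one-line sketch:

```python
def anticonf(pdockq2, ptm):
    """AntiConf as stated in the summary: 0.3 * pDockQ2 + 0.7 * pTM."""
    return 0.3 * pdockq2 + 0.7 * ptm
```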
Demonstration showing how large language models (LLMs) can be adapted to reduce the computational cost of molecular dynamics (MD).
They use the FoldToken encoding to discretize protein 3D conformations into tokens compatible with Mistral, and fine-tune the LLM on short MD trajectories of a single state. The model is then able to generate new sequences of conformations by predicting the next frame from previous frames.
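The rollout idea in miniature, with a trivial stand-in for the FoldToken-encoded, fine-tuned Mistral model (a three-state cycle replaces the real next-frame predictor so the loop runs):

```python
def rollout(model, frames, n_new):
    """Autoregressively extend a tokenized trajectory: each new frame is
    predicted from the frames generated so far."""
    traj = list(frames)
    for _ in range(n_new):
        traj.append(model(traj))  # next-frame prediction from the history
    return traj

# Toy stand-in "model": cycles through three conformational states.
toy_model = lambda traj: {"A": "B", "B": "C", "C": "A"}[traj[-1]]
```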
After fine-tuning, the model can extend trajectories beyond the training data. Starting from a native state, it can discover alternative conformations, potentially bypassing kinetic barriers that would normally require long MD runs.
The approach is system-specific (requires an MD trajectory for each protein), does not yet encode thermodynamics/kinetics explicitly, and relies on the choice of structural tokenization.
Introduces AbSet, a curated dataset of >800,000 antibody structures, combining experimental PDB entries with in silico–generated antibody–antigen complexes.
Adds value beyond SAbDab by standardizing structures, including decoy poses, and providing residue-level molecular descriptors for machine learning.
Presents dataset profiling and validation, with analyses of structural resolution, antigen diversity, docking quality classification, and descriptor calculation efficiency.