Novel model to predict heavy/light-chain pairing compatibility.
Data: H/L with the same single-cell barcode; negatives = swap L chains between pairs but only if CDRL3 length matches; balanced set of 233,880 pairs with a 90/10 train–test split.
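The negative-sampling scheme can be sketched as follows; `pairs` as a list of (heavy, light, cdrl3_len) tuples is our assumption about the data layout, not the paper's code:

```python
import random

def make_negatives(pairs, seed=0):
    """Sketch of the described negative sampling: re-pair each heavy chain
    with a light chain drawn from a *different* true pair, accepted only if
    the two light chains share CDRL3 length, so length alone cannot
    separate positives from negatives."""
    rng = random.Random(seed)
    negatives = []
    for h, l, n in pairs:
        # candidate light chains from other pairs with matching CDRL3 length
        candidates = [l2 for h2, l2, n2 in pairs if l2 != l and n2 == n]
        if candidates:
            negatives.append((h, rng.choice(candidates)))
    return negatives
```

Pairs whose CDRL3 length is unique in the set yield no negative, which is consistent with the length-matching constraint.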
Training: full VH+VL sequences fed into AntiBERTa2 with a classification head; fine-tuned for 3 epochs, lr 2×10⁻⁵, weight decay 0.01; κ/λ-specific variants trained identically. Final AUC-ROC: 0.75 (held-out test set) and 0.66 (external set); κ/λ models: 0.885/0.831.
Baselines: (i) V/J gene-usage → logistic reg. & XGBoost ≈ 0.50–0.52 acc.; (ii) CDRH3+CDRL3 CNNs → moderate; (iii) ESM-2 improves with fine-tuning but AntiBERTa2 FT is best.
The model appears to learn more than database matching: weak gene-usage baselines, explicit control of CDRL3 length in negatives, external generalisation, and sensitivity to interface residues (CDRH1/2 and framework) in therapeutic-antibody tests together argue that it learns sequence-level pairing rules, not just V/L gene-usage distributions.
Introduces TNP, a nanobody-specific developability profiler inspired by TAP.
Uses six metrics: total CDR length, CDR3 length, CDR3 compactness, and patch scores for hydrophobicity, positive charge, and negative charge.
Thresholds are calibrated to 36 clinical-stage nanobodies.
In vitro assays on 108 nanobodies (36 clinical-stage + 72 proprietary) show partial agreement with TNP flags, indicating complementary—but not perfectly correlated—assessments.
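A TAP/TNP-style flagging step reduces to range checks per metric; the metric names and threshold values below are placeholders, not the published calibration:

```python
def tnp_flags(metrics, thresholds):
    """Toy illustration of TAP/TNP-style flagging: each metric is compared
    against a range calibrated on clinical-stage molecules; a value outside
    its range raises a flag."""
    flags = {}
    for name, value in metrics.items():
        lo, hi = thresholds[name]
        flags[name] = not (lo <= value <= hi)
    return flags

# Illustrative candidate and ranges only (six TNP metrics, made-up numbers)
candidate = {"total_cdr_len": 40, "cdr3_len": 22, "cdr3_compactness": 0.7,
             "hyd_patch": 180.0, "pos_patch": 30.0, "neg_patch": 10.0}
ranges = {"total_cdr_len": (25, 45), "cdr3_len": (4, 20),
          "cdr3_compactness": (0.3, 1.0), "hyd_patch": (0, 200),
          "pos_patch": (0, 50), "neg_patch": (0, 50)}
```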
Combined in vitro/in silico method for binder optimization.
Start from a wild-type scFv (heavy chain), build a random-mutant library, FACS-sort on multiple antigens, deep-sequence bins + input, and use per-sequence enrichment (bin/library) as the supervised target for (antibody, antigen) training pairs.
Train uncertainty-aware regressors (xGPR or ByteNet-SNGP) on those enrichment targets; run in-silico directed evolution (ISDE) from the WT, proposing single mutations and auto-rejecting moves with high predictive uncertainty while optimizing the worst-case score across antigens.
Binding is protected by the multi-antigen objective + uncertainty gating during ISDE; risky proposals are discarded before they enter the candidate set.
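One ISDE move, as described above, can be sketched as follows; the `score_fn(seq) -> (scores_per_antigen, uncertainty)` interface is our assumption standing in for the uncertainty-aware regressor:

```python
import random

def isde_step(seq, score_fn, alphabet="ACDEFGHIKLMNPQRSTVWY",
              max_unc=0.5, rng=random.Random(0)):
    """One in-silico directed-evolution move: propose a random single
    mutation, score it against every antigen, auto-reject if predictive
    uncertainty is too high, and accept only if the worst-case score
    across antigens improves."""
    pos = rng.randrange(len(seq))
    aa = rng.choice(alphabet.replace(seq[pos], ""))
    mutant = seq[:pos] + aa + seq[pos + 1:]
    cur_scores, _ = score_fn(seq)
    new_scores, unc = score_fn(mutant)
    if unc > max_unc:                          # uncertainty gate
        return seq
    if min(new_scores) > min(cur_scores):      # worst-case objective
        return mutant
    return seq
```

Iterating this step from the WT yields a trajectory whose worst-case predicted score is monotonically nondecreasing, with risky proposals never entering the candidate set.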
Filter candidates for humanness with SAM/AntPack and for solubility with CamSol v2.2 (framework is extensible to add other gates); final wet-lab set kept 29 designs after applying these filters and uncertainty checks.
Beyond large in-silico tests, yeast-display across 10 SARS-CoV-2 RBDs shows most designs outperform WT; a representative clone (Delta-63) improves KD on 8/10 variants and competes with ACE2.
Novel model for loop retrieval using embedded structural representation.
It is a multimodal tokenizer at the antibody loop (CDR) level that fuses sequence with backbone dihedral-angle features and learns a latent space with a dihedral-distance contrastive loss—unlike residue-tokenizers and canonical clusters. It produces both continuous and quantized loop tokens that can plug into PLMs (IGLOOLM / IGLOOALM).
Trained self-supervised on ~807k loops from experimental (SAbDab/STCRDab) and Ibex-predicted structures, with four objectives: masked dihedral reconstruction, masked AA prediction, contrastive learning over dihedral distance (with DTW alignment), and codebook learning; followed by two-phase training and specific H100 settings.
Benchmarked on four computational tasks: (1) H3 loop retrieval: IGLOO beats the best prior tokenizer by +5.9% (dihedral-distance criterion). (2) Cluster recovery: high purity vs. canonical clusters across CDRs. (3) Downstream PLM task: IGLOOLM improves binding-affinity prediction on 8/10 AbBiBench targets, rivalling larger models. (4) Controllable sampling: IGLOOALM generates diverse sequences with greater structural consistency than inverse-folding baselines.
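The dihedral-distance idea behind the contrastive objective and retrieval criterion can be illustrated minimally; this assumes a per-residue (φ, ψ) RMS with wrap-around over equal-length loops, whereas the paper additionally uses DTW alignment:

```python
import math

def dihedral_distance(loop_a, loop_b):
    """Periodic dihedral distance between two equal-length loops, each a
    list of (phi, psi) angles in degrees. Illustrates only the wrap-around
    angle handling, not the paper's full metric."""
    def ang(d):
        return min(abs(d) % 360.0, 360.0 - abs(d) % 360.0)  # wrap to [0, 180]
    total = 0.0
    for (phi_a, psi_a), (phi_b, psi_b) in zip(loop_a, loop_b):
        total += ang(phi_a - phi_b) ** 2 + ang(psi_a - psi_b) ** 2
    return math.sqrt(total / (2 * len(loop_a)))
```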
Created two reference libraries: a positive set from human proteins and antibodies (OAS + proteome) and a negative set from murine antibody sequences (OAS).
Antibody sequences are fragmented into 8–12-mer peptides.
Peptide fragments are scored: +1.0 if matching the positive reference, −0.2 if matching the negative reference.
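The fragment-and-score scheme of the two lines above reduces to a sliding-window lookup; plain Python sets stand in for the OAS/proteome-derived references:

```python
def immuno_score(antibody_seq, positive_ref, negative_ref, kmin=8, kmax=12):
    """Sketch of the described scoring: slide 8-12-mer windows over the
    antibody sequence, add +1.0 per fragment found in the human-derived
    positive reference and -0.2 per fragment found in the murine negative
    reference."""
    score = 0.0
    for k in range(kmin, kmax + 1):
        for i in range(len(antibody_seq) - k + 1):
            frag = antibody_seq[i:i + k]
            if frag in positive_ref:
                score += 1.0
            if frag in negative_ref:
                score -= 0.2
    return score
```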
Validated on 217 therapeutic antibodies with known clinical ADA incidence, showing strong negative correlation between hit rate and ADA.
On 25 humanized antibody pairs, ImmunoSeq correctly predicted reduced immunogenicity after humanization, consistent with experimental results.
Benchmarking of docking/complex prediction methods for antibody-antigen (Ab-Ag) complexes.
Authors used 200 antibody-antigen and nanobody-antigen complexes curated from prior studies, specifically chosen to exclude any complexes present in the training data of the evaluated models.
Evaluated methods: AF2 (v2.3.2), Protenix, ESMFold, Chai-1, Boltz-1, Boltz-1x, and Boltz-2. (Note: Boltz-2 was only tested on 18 complexes; Protenix failed on 26 large complexes.)
DockQ and CAPRI criteria were used as primary metrics to assess structural prediction quality.
AF2 performed best overall, especially for antibody-antigen complexes. Chai-1 outperformed AF2 on nanobody-antigen complexes.
A composite confidence metric, AntiConf, was introduced, combining pTM and pDockQ2 scores to better assess the quality of Ab-Ag models. AntiConf = 0.3 × pDockQ2 + 0.7 × pTM
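The composite score is a fixed linear blend and is trivial to compute:

```python
def anticonf(pdockq2, ptm):
    """AntiConf as given above: a fixed-weight blend of interface
    (pDockQ2) and global (pTM) confidence."""
    return 0.3 * pdockq2 + 0.7 * ptm
```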
Demonstration showing how large language models (LLMs) can be adapted to reduce the computational cost of molecular dynamics (MD).
They use the FoldToken encoding to discretize protein 3D conformations into tokens compatible with Mistral, and fine-tune the LLM on short MD trajectories of a single state. The model is then able to generate new sequences of conformations by predicting the next frame from previous frames.
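The frame-by-frame generation described above is plain autoregressive rollout; `model` here is any callable mapping a frame history to the next frame, a stand-in for the fine-tuned Mistral over FoldToken vocabularies:

```python
def rollout(model, frames, n_new):
    """Extend a trajectory of structure-token frames by repeatedly
    predicting the next frame from all previous ones."""
    traj = list(frames)
    for _ in range(n_new):
        traj.append(model(traj))
    return traj
```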
After fine-tuning, the model can extend trajectories beyond the training data. Starting from a native state, it can discover alternative conformations, potentially bypassing kinetic barriers that would normally require long MD runs to cross.
The approach is system-specific (requires an MD trajectory for each protein), does not yet encode thermodynamics/kinetics explicitly, and relies on the choice of structural tokenization.
Introduces AbSet, a curated dataset of >800,000 antibody structures, combining experimental PDB entries with in silico–generated antibody–antigen complexes.
Adds value beyond SAbDab by standardizing structures, including decoy poses, and providing residue-level molecular descriptors for machine learning.
Presents dataset profiling and validation, with analyses of structural resolution, antigen diversity, docking quality classification, and descriptor calculation efficiency.
Introduces a novel diffusion-based inverse folding method (RL-DIF) that improves the foldable diversity of generated sequences—i.e., it can generate more diverse sequences that still fold into the desired structure.
The model uses categorical denoising diffusion for sequence generation, followed by reinforcement learning (DDPO) to improve structural consistency with the target fold.
During reinforcement learning, ESMFold is used to predict the 3D structure of generated sequences, which is then compared (via TM-score) to the structure predicted from the native sequence to ensure they fold similarly.
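The self-consistency reward used during DDPO fine-tuning can be sketched as below; `fold_fn` stands in for ESMFold and `tm_score` for a TM-score computation, both assumptions rather than real APIs:

```python
def self_consistency_reward(gen_seq, native_seq, fold_fn, tm_score):
    """Reward per the description above: fold both the generated and the
    native sequence with a structure predictor and score their structural
    agreement with TM-score (higher = same fold)."""
    gen_struct = fold_fn(gen_seq)
    ref_struct = fold_fn(native_seq)
    return tm_score(gen_struct, ref_struct)
```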
Compared to baselines like PiFold and ProteinMPNN, RL-DIF achieves similar sequence recovery and structural consistency but significantly better foldable diversity—a critical advantage in protein design.
Novel protein language model with applications to epitope prediction and ranking hits in campaigns.
NextGenPLM introduces a modular, multimodal transformer that fuses frozen pretrained protein language models with structural information via spectral contact-map embeddings, enabling efficient modeling of multi-chain antibody–antigen complexes without requiring full 3D folding of antibodies.
The model was benchmarked on 112 diverse antibody–antigen complexes against state-of-the-art structure predictors (Chai-1 and Boltz-1x), matching their contact-map and epitope prediction accuracy while achieving ~100× higher throughput (4 complexes/sec vs. ~1 min/complex).
The model was experimentally validated through an internal affinity-maturation campaign. Using its predictions to rank antibody variants led to designs that achieved up to 17× binding affinity improvements over the wild-type, as confirmed by surface plasmon resonance (SPR) assays.