Measuring the effects of protein energetics versus actual protein-protein binding.
The authors used AlphaSeq to measure 7,185 single and double mutations across four VHH-antigen complexes to capture changes in observed affinity.
By using "control VHHs" that bind to non-overlapping epitopes, they successfully separated "protein-quality" (folding/stability) effects from true "protein-interaction" (interface) changes.
The study found that 83.6% to 98.9% of antigen mutations negatively impact affinity, primarily by degrading the protein's overall quality rather than by disrupting specific interface energetics.
Benchmarking showed that state-of-the-art models like ESM-IF1 and ThermoMPNN are effective at predicting protein-quality changes but struggle to accurately predict specific protein-interaction effects.
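The control-VHH logic can be sketched as a simple subtraction. This is a minimal illustration, assuming affinity changes are expressed on a log scale and that quality effects shift binding to the target and control VHHs equally. The function name and the numbers are hypothetical and not taken from the paper.

```python
def decompose_effect(delta_target: float, delta_control: float) -> dict[str, float]:
    """Split a mutation's observed affinity change into two components.

    delta_target:  change in log-affinity for the VHH whose epitope is being probed
    delta_control: change in log-affinity for a control VHH on a non-overlapping epitope

    Assumption (not from the paper): quality effects are shared by both VHHs,
    so the control change estimates the quality component.
    """
    quality = delta_control
    interaction = delta_target - delta_control
    return {"quality": quality, "interaction": interaction}


# Hypothetical values: the mutation weakens both binders, the target slightly more.
effects = decompose_effect(delta_target=-1.2, delta_control=-1.0)
print(effects)
```

Under these assumptions, most of the observed loss in the example would be attributed to protein quality, with only a small residual assigned to the interface.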
Authors benchmark co-folding methods on their ability to identify true positives given deep sampling.
AlphaFold3 consistently outperforms AlphaFold2, Chai-1, and Boltz-1 in predicting antibody-antigen complexes, though its accuracy declines if the target lacks structural similarity to its training data.
For all methods, increased sampling improves the probability of generating a correct model in a roughly log-linear manner; however, the improvement is limited by a significant gap between the "best" model generated and the "top-ranked" model.
Internal confidence metrics (like ipTM) struggle to identify the most accurate structures for a given target, primarily because the models cannot yet accurately predict their own aligned errors.
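The difference between the "best" and the "top-ranked" model can be made concrete with a small sketch. It assumes a hypothetical pool of sampled models with known correctness labels. The first function gives the chance that a random subsample contains at least one correct model, and the second checks whether the highest-confidence model is one of them. None of the numbers come from the benchmark.

```python
from math import comb


def prob_any_correct(pool: int, correct: int, n: int) -> float:
    """Probability that at least one of n models, drawn without replacement
    from a pool containing `correct` correct models, is correct."""
    if n > pool:
        raise ValueError("n cannot exceed the pool size")
    return 1.0 - comb(pool - correct, n) / comb(pool, n)


def top_ranked_is_correct(models: list[tuple[float, bool]]) -> bool:
    """models holds (confidence, is_correct) pairs; check the highest-confidence one."""
    return max(models, key=lambda m: m[0])[1]


# Hypothetical pool: 1,000 sampled models, 5 of which are correct.
for n in (1, 10, 100, 1000):
    print(n, round(prob_any_correct(1000, 5, n), 3))

# Hypothetical ranking failure: a correct model exists but is not ranked first.
sample = [(0.82, False), (0.74, True), (0.55, False)]
print(any(ok for _, ok in sample), top_ranked_is_correct(sample))
```

The first quantity grows with sampling depth, while the second depends entirely on how well the confidence score orders the models, which is where the gap described above arises.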
Case study of training an affinity prediction algorithm on anti-SARS-CoV-2 antibodies.
Authors fine-tuned a BERT-based model, Ab-Affinity, specifically to predict the binding affinity of antibodies against the SARS-CoV-2 spike protein.
They utilized a dataset of 71,834 unique antibodies (preprocessed from 104,972 variants) derived from three parental "seed" binders with experimentally measured affinities.
The model employs a BERT-based encoder (specifically ESM-2) with an added fully connected regression layer to predict continuous binding scores.
Ab-Affinity achieved higher Pearson and Spearman correlation coefficients on the test set than existing LLM-based methods like DG-Affinity, ESM-2, and AbLang, although the baselines were not fine-tuned on the same data.
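For reference, the two evaluation metrics can be computed as follows. This is a dependency-free sketch with made-up measured and predicted values. It only illustrates why the two coefficients can disagree and does not reproduce the paper's evaluation.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation: linear association between x and y."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: predictions preserve the order but not the scale.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.5, 9.0]
print(pearson(measured, predicted), spearman(measured, predicted))
```

A model that ranks antibodies correctly but distorts the scale scores perfectly on Spearman and lower on Pearson, which is why reporting both is informative.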
Evedesign: an open-source, method-agnostic framework that standardizes biosequence design by enabling different machine learning models (sequence, structure, and evolutionary) to work together in a single workflow.
It works by framing design as a conditional modeling problem using three composable operations: Generate (creating new sequences), Score (predicting fitness or likelihood), and Transform (mapping between representations like sequence-to-structure).
The authors did not perform new wet-lab experiments; instead, they tested the framework by computationally reproducing previous studies, showing ESM-2 and ProteinMPNN could successfully rank and prioritize known beneficial mutations from existing antibody datasets.
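The three composable operations can be illustrated with a toy pipeline. The sketch below is not the Evedesign API: every function name is hypothetical, the generator is random point mutation, and the score is a stand-in (hydrophobic fraction) for a real model likelihood such as one from ESM-2 or ProteinMPNN.

```python
import random
from typing import Callable

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWY")


def generate(seed: str, n: int, rng: random.Random) -> list[str]:
    """Toy Generate: propose n random single-point mutants of the seed."""
    mutants = []
    for _ in range(n):
        pos = rng.randrange(len(seed))
        new_aa = rng.choice(AMINO_ACIDS.replace(seed[pos], ""))
        mutants.append(seed[:pos] + new_aa + seed[pos + 1:])
    return mutants


def transform(seq: str) -> dict[str, int]:
    """Toy Transform: map a sequence to an amino-acid count representation."""
    counts: dict[str, int] = {}
    for aa in seq:
        counts[aa] = counts.get(aa, 0) + 1
    return counts


def score(seq: str) -> float:
    """Toy Score: fraction of hydrophobic residues, computed on the transformed form."""
    counts = transform(seq)
    return sum(c for aa, c in counts.items() if aa in HYDROPHOBIC) / len(seq)


def design(seed: str, n_candidates: int, top_k: int,
           scorer: Callable[[str], float], rng_seed: int = 0) -> list[str]:
    """Compose the operations: generate candidates, score them, keep the best."""
    rng = random.Random(rng_seed)
    candidates = generate(seed, n_candidates, rng)
    return sorted(candidates, key=scorer, reverse=True)[:top_k]


top = design("QVQLVESGGG", n_candidates=50, top_k=3, scorer=score)
print(top)
```

Because `design` only depends on the call signatures, any generator or scorer with the same shape could be swapped in, which is the sense in which such a framework is method-agnostic.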
Authors advocate for a "prompt-to-drug" autonomous pipeline, using a central AI orchestrator to connect disparate pre-clinical and clinical steps agentically.
While modular proofs-of-concept exist, they remain domain-specific, brittle, and far from full-cycle implementation in actual drug discovery programs.
A primary recommendation is to eliminate "data silos" by making research open, peer-reviewed, and accessible via APIs to ensure outputs are easily "machine-readable" for AI training.
The system faces significant hurdles from LLM hallucinations and "cascading errors," where a single early-stage miscalculation (like an incorrect binding pocket) propagates through the entire chain.
Despite the push for autonomy, authors argue "human-in-the-loop" checkpoints remain legally and ethically mandatory for high-stakes regulatory and clinical transitions.
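The cascading-error concern can be quantified with a back-of-the-envelope calculation. The sketch assumes that steps fail independently and that any single failure invalidates the final output. Both are simplifying assumptions, and the reliabilities are hypothetical.

```python
def chain_success(step_reliabilities: list[float]) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    p = 1.0
    for r in step_reliabilities:
        p *= r
    return p


# Hypothetical pipeline: ten chained steps, each 95% reliable on its own.
overall = chain_success([0.95] * 10)
print(round(overall, 3))
```

Even with individually reliable steps, the end-to-end success probability in this example falls to roughly 60%, which is one argument for placing checkpoints between stages.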
Analysis of developability data from 33 internal Biogen programs, covering 18,540 antibodies.
Focused on three dimensions: hydrophobicity (HIC), polyspecificity (PSR), and self-association (AC-SINS).
Labeled subsets included 4,594 (PSR), 1,792 (HIC), and 7,727 (AC-SINS) sequences.
Benchmarked three PLMs: ESM-2 (general-purpose), plus IgBert and IgT5 (antibody-specific).
Domain-adaptive fine-tuning consistently boosted antibody-specific PLMs, but often degraded ESM-2 performance.
Antibody-specific PLMs generally provided better embeddings for PSR and AC-SINS, while ESM-2 remained highly competitive for HIC.
Perplexity was only weakly correlated with developability failure in aggregate, but showed a significant association with PSR/AC-SINS failure when controlling for a fixed light chain.
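As a reminder of what is being correlated, perplexity is the exponential of the mean negative log-probability a model assigns to each observed residue. The sketch below takes per-residue probabilities as input. How the paper obtained them is not stated here. For masked language models this is commonly done by masking one position at a time (pseudo-perplexity), which is an assumption in this context.

```python
from math import exp, log


def perplexity(residue_probs: list[float]) -> float:
    """exp(mean negative log-probability) over the observed residues."""
    nll = -sum(log(p) for p in residue_probs) / len(residue_probs)
    return exp(nll)


# Hypothetical inputs: a model guessing uniformly over 20 amino acids,
# and a model that is confident about every residue.
uniform = perplexity([0.05] * 10)
confident = perplexity([0.9] * 10)
print(uniform, confident)
```

A uniform guess over 20 amino acids gives a perplexity of 20, so lower values indicate sequences the model finds more natural.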
NanoMAP: a novel experimental and computational pipeline designed to characterize nanobody immune repertoires following immunization and phage display selection.
It introduces a flexible clustering method that identifies clonal families by grouping sequences with similar V/J segments and CDR lengths, then applying a unique merging step that allows for minor CDR variations.
When benchmarked against MMseqs2 and Immcantation (SCOPer), NanoMAP scored higher on computational metrics (Silhouette, phenotypic quality, and stability) and showed better alignment with expert-curated "ground truth" labels.
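The two-stage clustering idea can be sketched as follows. This is not the NanoMAP implementation: it simplifies to CDR3 only, uses exact V/J matching, and merges within a bucket by single linkage on Hamming distance. The class, function, threshold, and gene names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Nanobody:
    v_gene: str
    j_gene: str
    cdr3: str


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def clonal_families(seqs: list[Nanobody], max_mismatches: int = 1) -> list[list[Nanobody]]:
    # Step 1: bucket by V gene, J gene, and CDR3 length.
    buckets: dict[tuple[str, str, int], list[Nanobody]] = defaultdict(list)
    for s in seqs:
        buckets[(s.v_gene, s.j_gene, len(s.cdr3))].append(s)

    # Step 2: within each bucket, merge groups linked by a near-identical CDR3.
    families: list[list[Nanobody]] = []
    for members in buckets.values():
        groups: list[list[Nanobody]] = []
        for s in members:
            linked, rest = [], []
            for g in groups:
                near = any(hamming(s.cdr3, m.cdr3) <= max_mismatches for m in g)
                (linked if near else rest).append(g)
            merged = [s] + [m for g in linked for m in g]
            groups = rest + [merged]
        families.extend(groups)
    return families


repertoire = [
    Nanobody("V1", "J1", "ARDYW"),
    Nanobody("V1", "J1", "ARDFW"),  # one mismatch from the first: same family
    Nanobody("V1", "J1", "GGGGG"),  # same bucket, unrelated CDR3
    Nanobody("V2", "J1", "ARDYW"),  # different V gene: separate bucket
]
families = clonal_families(repertoire)
print([len(f) for f in families])
```

The merging threshold controls how much CDR variation is tolerated within a family, so it trades off splitting true clonal relatives against merging unrelated sequences.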
Novel generative framework to design protein binders from NVIDIA.
Antibodies/nanobodies are not singled out for analysis.
First framework to unify generative modeling with hallucination-based optimization, allowing for a strong generative prior to be steered by inference-time compute.
The authors introduced Teddymer, a dataset of ~510,000 synthetic dimers created from AlphaFold predicted domain-domain interactions to overcome the scarcity of experimental multimer data.
The model uses advanced search algorithms, including Beam Search, Feynman-Kac Steering, and MCTS, to navigate the generative space and find high-quality binders.
It achieved state-of-the-art results on protein targets, small molecules, and enzyme design tasks, consistently outperforming baselines like RFDiffusion and BindCraft.
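Of the search strategies listed, beam search is the simplest to illustrate. The sketch below is a generic beam search over a toy string-building problem, not the authors' implementation: `expand` and `score` stand in for whatever intermediate states and reward the model actually uses, and the target motif is hypothetical.

```python
from typing import Callable


def beam_search(start: str,
                expand: Callable[[str], list[str]],
                score: Callable[[str], float],
                beam_width: int,
                steps: int) -> str:
    """Keep the beam_width best partial candidates at every step."""
    beam = [start]
    for _ in range(steps):
        candidates = [child for state in beam for child in expand(state)]
        if not candidates:
            break
        beam = sorted(candidates, key=score, reverse=True)[:beam_width]
    return max(beam, key=score)


TARGET = "WHLW"  # hypothetical motif standing in for a reward signal


def expand(state: str) -> list[str]:
    """Toy proposal step: extend the partial sequence by one residue."""
    return [state + aa for aa in "HLW"]


def score(state: str) -> float:
    """Toy reward: number of positions matching the target motif."""
    return sum(a == b for a, b in zip(state, TARGET))


best = beam_search("", expand, score, beam_width=2, steps=len(TARGET))
print(best)
```

Widening the beam or adding steps spends more inference-time compute in exchange for a more thorough search, which is the trade-off such steering methods exploit.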
AnewOmni, a foundation model that unifies the design of small molecules, peptides, and antibodies into a single framework.
The team evaluated approximately 3,000 candidates for the "undruggable" KRAS G12D target by alternating between AnewOmni for CDR design and AlphaFold3 for structural validation.
Of the 7 synthesized nanobodies, the subset selected with a conservative structural consistency filter achieved a 75% success rate (3 out of 4).
The most successful nanobody design bound with a Kd of 587 nM (sub-micromolar affinity).
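To put the reported Kd in context, it can be converted to a standard binding free energy with delta G = RT ln(Kd / 1 M). The sketch assumes a temperature of 298.15 K, which is not stated in the summary. The 1 nM comparison value is purely illustrative.

```python
from math import log

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar: float, temp_k: float = 298.15) -> float:
    """Standard binding free energy in kcal/mol, relative to a 1 M reference state."""
    return R_KCAL * temp_k * log(kd_molar)


dg_reported = binding_free_energy(587e-9)   # Kd from the summary
dg_nanomolar = binding_free_energy(1e-9)    # illustrative 1 nM binder
print(f"{dg_reported:.2f} kcal/mol vs {dg_nanomolar:.2f} kcal/mol")
```

Under this assumption, 587 nM corresponds to about -8.5 kcal/mol, several kcal/mol weaker than a low-nanomolar binder.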