Training of a baseline developability predictor on the Ginkgo dataset.
Utilized the GDPa1 benchmark from Ginkgo Bioworks, consisting of 242 therapeutic IgGs across five assays: HIC, AC-SINS, PR_CHO, Titer, and Tm2.
Employed frozen ESM-Cambrian encoders (up to 6B parameters) to generate embeddings, which were processed by property-specific attention decoders (Self, Self+Cross, or Bidirectional Cross) and a prediction head.
Achieved significant improvements over baselines in 3/5 properties: expression titer (+20%), thermal stability (+18%), and polyreactivity (+12%).
Optimal attention schemes differ by property; self-attention alone suffices for aggregation-related traits (HIC, PR_CHO), while bidirectional cross-attention is required for properties involving inter-chain compatibility (Titer, Tm2).
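A minimal sketch of how the three decoder schemes could differ, assuming per-residue heavy- and light-chain embeddings are already available from a frozen encoder. The function names, the single attention head, and the absence of learned projections are simplifying assumptions for illustration, not the authors' architecture; in particular, reading "Self+Cross" as self-attention followed by cross-attention is a guess.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys_values):
    """Single-head scaled dot-product attention with no learned projections."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, keys_values))
                    for j in range(d)])
    return out

def mean_pool(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def decode(heavy, light, scheme):
    """Return a pooled feature vector (heavy half + light half) for a prediction head."""
    if scheme == "self":
        # Each chain only looks at itself.
        h, l = attend(heavy, heavy), attend(light, light)
    elif scheme == "self_cross":
        # Assumed reading: self-attention first, then each chain attends to the other.
        hs, ls = attend(heavy, heavy), attend(light, light)
        h, l = attend(hs, ls), attend(ls, hs)
    elif scheme == "bidirectional_cross":
        # Heavy queries light and light queries heavy.
        h, l = attend(heavy, light), attend(light, heavy)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return mean_pool(h) + mean_pool(l)
```

Under the "self" scheme the heavy-chain half of the output cannot change when the light chain changes, which is one way to see why a cross-attention scheme might be needed for properties that depend on chain pairing.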
Evedesign, an open-source, method-agnostic framework that standardizes biosequence design by enabling different machine learning models (sequence, structure, and evolutionary) to work together in a single workflow.
It works by framing design as a conditional modeling problem using three composable operations: Generate (creating new sequences), Score (predicting fitness or likelihood), and Transform (mapping between representations like sequence-to-structure).
The authors did not perform new wet-lab experiments; instead, they tested the framework by computationally reproducing previous studies, showing ESM-2 and ProteinMPNN could successfully rank and prioritize known beneficial mutations from existing antibody datasets.
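The Generate / Score / Transform composition can be illustrated with a toy pipeline. Everything here is hypothetical: the function names are not the framework's API, and the hydrophobicity mask and score are trivial stand-ins for real models such as ESM-2 or ProteinMPNN.

```python
import random
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AILMFVW")

def generate_point_mutants(parent: str, n: int, rng: random.Random) -> List[str]:
    """Generate: propose n single-substitution variants of the parent."""
    variants = []
    while len(variants) < n:
        pos = rng.randrange(len(parent))
        aa = rng.choice(AMINO_ACIDS)
        if aa != parent[pos]:
            variants.append(parent[:pos] + aa + parent[pos + 1:])
    return variants

def transform_to_hydrophobic_mask(seq: str) -> List[int]:
    """Transform: map a sequence to another representation (toy 0/1 mask)."""
    return [1 if aa in HYDROPHOBIC else 0 for aa in seq]

def score_low_hydrophobicity(mask: List[int]) -> float:
    """Score: toy objective, higher is better when fewer residues are hydrophobic."""
    return -sum(mask) / len(mask)

def design(parent: str, n: int, seed: int = 0) -> List[str]:
    """Compose the three operations: generate, transform, score, then rank."""
    rng = random.Random(seed)
    candidates = generate_point_mutants(parent, n, rng)
    return sorted(
        candidates,
        key=lambda s: score_low_hydrophobicity(transform_to_hydrophobic_mask(s)),
        reverse=True,
    )
```

Because each step only depends on the previous step's output type, any one of the three toy functions could be swapped for a real model without touching the others, which is the point of the method-agnostic framing.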
Case study of training an affinity prediction algorithm on anti-SARS-CoV-2 antibodies.
Authors fine-tuned a BERT-based model, Ab-Affinity, specifically to predict the binding affinity of antibodies against the SARS-CoV-2 spike protein.
They utilized a dataset of 71,834 unique antibodies (preprocessed from 104,972 variants) derived from three parental "seed" binders with experimentally measured affinities.
The model employs a BERT-based encoder (specifically ESM-2) with an added fully connected regression layer to predict continuous binding scores.
Ab-Affinity achieved higher Pearson and Spearman correlation coefficients on the test set than existing LLM-based methods like DG-Affinity, ESM-2, and AbLang. However, the baselines were not fine-tuned on this dataset, so the comparison favors Ab-Affinity.
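For reference, the two evaluation metrics can be computed as below. This is a plain-Python sketch of the standard definitions (Spearman as Pearson on average ranks), not the authors' evaluation code.

```python
import math

def pearson(x, y):
    """Pearson correlation: linear agreement between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))
```

Spearman only rewards getting the ordering right, so a model can reach a Spearman of 1 on a monotonic but nonlinear relationship while its Pearson stays below 1.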
Measuring the effects of protein energetics versus actual protein-protein binding.
The authors used AlphaSeq to measure 7,185 single and double mutations across four VHH-antigen complexes to capture changes in observed affinity.
By using "control VHHs" that bind to non-overlapping epitopes, they successfully separated "protein-quality" (folding/stability) effects from true "protein-interaction" (interface) changes.
The study found that 83.6% to 98.9% of antigen mutations negatively impact affinity, primarily by degrading the protein's overall quality rather than disrupting specific interface energetics.
Benchmarking showed that state-of-the-art models like ESM-IF1 and ThermoMPNN are effective at predicting protein-quality changes but struggle to accurately predict specific protein-interaction effects.
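One simple way to read the control-VHH idea is as an additive decomposition: a control VHH binding a non-overlapping epitope should only feel the protein-quality part of an antigen mutation, so subtracting its affinity change from the test VHH's change leaves the interface-specific part. The sketch below assumes that additive model; the function name, the log-scale units, and the example values are hypothetical and not taken from the study.

```python
def decompose(delta_test, delta_control):
    """Split an observed affinity change into quality and interaction parts.

    delta_test:    affinity change measured with the VHH whose epitope
                   contains the mutated position (log scale, assumed).
    delta_control: affinity change for the same antigen mutation measured
                   with a control VHH at a non-overlapping epitope.
    """
    quality = delta_control                 # shared folding/stability effect
    interaction = delta_test - delta_control  # what remains is interface-specific
    dominant = "quality" if abs(quality) >= abs(interaction) else "interaction"
    return {"quality": quality, "interaction": interaction, "dominant": dominant}
```

For example, `decompose(-1.2, -1.0)` attributes most of the loss to protein quality, whereas `decompose(-1.2, -0.1)` attributes it to the interface.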
Authors benchmark co-folding methods on their ability to identify true positives given deep sampling.
AlphaFold3 consistently outperforms AlphaFold2, Chai-1, and Boltz-1 in predicting antibody-antigen complexes, though its accuracy declines if the target lacks structural similarity to its training data.
For all methods, increased sampling improves the probability of generating a correct model in a roughly log-linear manner; however, the improvement is limited by a significant gap between the "best" model generated and the "top-ranked" model.
Internal confidence metrics (like ipTM) struggle to identify the most accurate structures for a given target, primarily because the models cannot yet accurately predict their own aligned errors.
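The gap between the "best" and the "top-ranked" model can be made concrete with a small helper. This is an illustrative sketch: each sample is a hypothetical (quality, confidence) pair, where quality stands for a reference-based score such as DockQ and confidence for a model-internal score such as ipTM. The 0.23 default is the usual DockQ cutoff for an "acceptable" model and is an assumption here, not a value from these notes.

```python
def best_and_top_ranked(samples, n, threshold=0.23):
    """Compare oracle selection with confidence-based selection.

    samples: list of (quality, confidence) tuples in generation order.
    n:       number of samples drawn so far.
    Returns (best_is_correct, top_ranked_is_correct).
    """
    subset = samples[:n]
    # Oracle: the highest-quality model among the first n samples.
    best_quality = max(q for q, _ in subset)
    # Practical choice: the model the method itself is most confident in.
    top_ranked_quality = max(subset, key=lambda s: s[1])[0]
    return best_quality >= threshold, top_ranked_quality >= threshold

# Hypothetical samples: the correct model (quality 0.55) is not the most confident one.
example = [(0.10, 0.80), (0.05, 0.60), (0.55, 0.70), (0.30, 0.40)]
```

With these made-up numbers, drawing more samples eventually produces a correct model, but confidence-based ranking still picks an incorrect one, which mirrors the limitation described above.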
Analysis of developability data from 33 internal Biogen programs, covering 18,540 antibodies.
Focused on three dimensions: hydrophobicity (HIC), polyspecificity (PSR), and self-association (AC-SINS).
Labeled subsets included 4,594 (PSR), 1,792 (HIC), and 7,727 (AC-SINS) sequences.
Benchmarked three PLMs: ESM2 (general-purpose), plus IgBert and IgT5 (antibody-specific).
Domain-adaptive fine-tuning consistently boosted antibody-specific PLMs, but often degraded ESM2 performance.
Antibody-specific PLMs generally provided better embeddings for PSR and AC-SINS, while ESM2 remained highly competitive for HIC.
Perplexity was only weakly correlated with developability in aggregate, but showed a significant association with PSR/AC-SINS failure when the light chain was held fixed.
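As a reminder of what the perplexity signal is, the sketch below computes it from per-residue log-probabilities using the standard definition, exp of the negative mean log-likelihood. How the log-probabilities are obtained from a given PLM (for example, by masking one position at a time) is left out and would be model-specific; the input list here is hypothetical.

```python
import math

def perplexity(token_log_probs):
    """Perplexity from natural-log probabilities of the observed residues."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)
```

A model that is uniformly unsure over the 20 amino acids at every position gives a perplexity of about 20, and lower values mean the sequence looks more typical to the model.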
Authors advocate for a "prompt-to-drug" autonomous pipeline, using a central AI orchestrator to connect disparate pre-clinical and clinical steps agentically.
While modular proofs-of-concept exist, they remain domain-specific, brittle, and far from full-cycle implementation in actual drug discovery programs.
A primary recommendation is to eliminate "data silos" by making research open, peer-reviewed, and accessible via APIs to ensure outputs are easily "machine-readable" for AI training.
The system faces significant hurdles from LLM hallucinations and "cascading errors," where a single early-stage miscalculation (like an incorrect binding pocket) propagates through the entire chain.
Despite the push for autonomy, authors argue "human-in-the-loop" checkpoints remain legally and ethically mandatory for high-stakes regulatory and clinical transitions.
AnewOmni, a foundation model that unifies the design of small molecules, peptides, and antibodies into a single framework.
The team evaluated approximately 3,000 candidates for the "undruggable" KRAS G12D target by alternating between AnewOmni for CDR design and AlphaFold3 for structural validation.
Out of 7 synthesized nanobodies, the model achieved a 75% success rate (3 out of 4) when using a conservative structural consistency filter.
The most successful nanobody design bound with a Kd of 587 nM.
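To put the reported Kd on an energy scale, the standard relation ΔG = RT ln(Kd / 1 M) can be applied. The temperature of 298.15 K is an assumption, since the measurement conditions are not stated in these notes.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)

def kd_to_delta_g(kd_molar, temperature_k=298.15):
    """Binding free energy in kcal/mol from a dissociation constant in mol/L."""
    return R_KCAL_PER_MOL_K * temperature_k * math.log(kd_molar)

# 587 nM expressed in mol/L
delta_g = kd_to_delta_g(587e-9)
```

Under that assumption, 587 nM corresponds to roughly -8.5 kcal/mol; each tenfold improvement in Kd would add about -1.4 kcal/mol.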
Novel generative framework to design protein binders from NVIDIA.
Antibodies/nanobodies are not singled out for analysis.
First framework to unify generative modeling with hallucination-based optimization, allowing for a strong generative prior to be steered by inference-time compute.
The authors introduced Teddymer, a dataset of ~510,000 synthetic dimers created from AlphaFold predicted domain-domain interactions to overcome the scarcity of experimental multimer data.
The model uses advanced search algorithms, including Beam Search, Feynman-Kac Steering, and MCTS, to navigate the generative space and find high-quality binders.
It achieved state-of-the-art results on protein targets, small molecules, and enzyme design tasks, consistently outperforming baselines like RFdiffusion and BindCraft.
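To illustrate the general idea of steering generation with inference-time search, here is a generic beam search over discrete strings. It is a toy: the paper applies search to its own generative process, whereas this sketch extends a sequence one symbol at a time and uses a caller-supplied, hypothetical scoring function.

```python
def beam_search(alphabet, length, score, beam_width):
    """Keep the beam_width best partial sequences at every extension step.

    alphabet:   iterable of single-character symbols.
    length:     number of symbols in the final sequences.
    score:      callable mapping a (partial) sequence to a number, higher is better.
    beam_width: number of partial sequences retained per step.
    Returns the final beam, best first.
    """
    beams = [""]
    for _ in range(length):
        # Expand every retained prefix by every symbol.
        candidates = [prefix + symbol for prefix in beams for symbol in alphabet]
        # Rank all expansions and prune to the beam width.
        candidates.sort(key=score, reverse=True)
        beams = candidates[:beam_width]
    return beams
```

For instance, `beam_search("AB", 3, lambda s: s.count("A"), 2)` keeps two candidates per step and ends with "AAA" as its best sequence. A wider beam spends more compute per step in exchange for a lower chance of pruning a prefix that would have scored well later.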