Introduces IBEX, a pan‑immunoglobulin structure predictor for antibodies, nanobodies, and TCRs that explicitly models both bound (holo) and unbound (apo) conformations via a conformation token.
Training data comprise ~14,000 high-quality antibody (SAbDab) and TCR (STCRDab) structures, including 760 matched apo/holo pairs, augmented by distillation from ~60,000 predicted immunoglobulin-like structures (OAS sequences modeled with ESMFold and Boltz-1) to improve generalization.
Architecture builds on AlphaFold2’s invariant‑point‑attention and the ABodyBuilder2 framework, adding a residual connection from the initial embedding into every structure module and feeding an apo/holo token at each block.
On a private benchmark of 286 novel antibodies, IBEX achieves a mean CDR‑H3 RMSD of 2.28 Å, outperforming Chai‑1 (2.55 Å), Boltz‑1 (2.30 Å), and Boltz‑2 (2.42 Å). Most of its advantage comes from greater robustness on sequences whose CDR‑H3 loops have larger edit distances to any structure in the training set.
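The "edit distance to the training set" used above to explain IBEX's robustness can be illustrated with a minimal sketch. It assumes plain Levenshtein distance between CDR-H3 sequences and takes the minimum over the training loops; the function names and the example sequences are hypothetical, and the paper's exact distance definition may differ.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-residue
    insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def nearest_training_distance(query: str, training: list[str]) -> int:
    """Smallest edit distance from a query CDR-H3 to any training CDR-H3
    (assumed definition, for illustration only)."""
    return min(edit_distance(query, t) for t in training)


# Hypothetical CDR-H3 sequences, not taken from any benchmark.
print(nearest_training_distance("ARDYW", ["ARGGW", "ARDYYGMDV"]))
```

Binning benchmark targets by this distance is one way to check whether accuracy degrades as loops move away from the training data.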
Introduced GAMA, an attribution approach for autoregressive LSTM generative models that pinpoints which sequence positions drive binding in a one-antigen–many-antibody setting.
Benchmarked on 270 synthetic motif-implant datasets and simulated binder sequences from the Absolut! framework across multiple antigens, then applied to an experimental set of 8,955 Trastuzumab CDRH3 variants binding HER2.
On the Trastuzumab-HER2 dataset, GAMA flags CDRH3 positions 103, 104, 105, and 107 as most critical, overlapping three of the four crystallographically determined paratope residues.
Novel inverse folding algorithm based on a discrete diffusion framework.
Unlike earlier methods that focused on masked language modeling (MLM) (e.g., LM-Design) or autoregressive sequence generation (e.g., ProteinMPNN), this work introduces a discrete denoising diffusion model (MapDiff) to iteratively refine protein sequences toward the native sequence. The method incorporates an IPA-based refinement step that selectively re-predicts low-confidence residues.
Structural input is limited to the protein backbone only, represented as residue-level graphs. All-atom information is not used for either masked or unmasked residues.
On the CATH 4.2 full test set, the method achieves the best sequence recovery rate (61.03%), outperforming ProteinMPNN (48.63%), PiFold (51.40%), LM-Design (53.19%), and GRADE-IF (52.63%).
MapDiff also achieves the lowest perplexity (3.46) across models.
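For readers unfamiliar with the two metrics reported for MapDiff, the sketch below shows how sequence recovery and perplexity are commonly computed for a single protein. It assumes recovery is the fraction of positions where the designed residue matches the native one, and perplexity is the exponential of the mean negative log-probability the model assigns to the native residue; the function names are hypothetical, and how scores are aggregated across the test set is not specified here.

```python
import math


def sequence_recovery(predicted: str, native: str) -> float:
    """Fraction of positions where the predicted residue equals the native one."""
    if len(predicted) != len(native) or not native:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == n for p, n in zip(predicted, native))
    return matches / len(native)


def perplexity(native_probs: list[float]) -> float:
    """exp(mean negative log-probability of the native residue per position).

    native_probs[i] is the model's probability for the native amino acid
    at position i (an assumed input format for this sketch).
    """
    mean_nll = -sum(math.log(p) for p in native_probs) / len(native_probs)
    return math.exp(mean_nll)


print(sequence_recovery("ACDE", "ACDF"))
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

A perplexity of 1 would mean the model is certain of the native residue at every position, while 20 corresponds to a uniform guess over the standard amino acids.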
A novel antibody-specific language model, trained on paired human antibody data, and explicitly designed for practical antibody engineering applications.
The model was trained on a carefully curated dataset of productive, paired sequences, prioritizing biological fidelity over sheer volume or data heterogeneity.
It uses a masked language modelling (MLM) objective. The initial version was based on RoBERTa, while later versions introduced custom architectural modifications tailored to antibody sequences.
The model was benchmarked on recapitulating clinical humanization decisions and outperformed prior models such as Sapiens and AntiBERTa.
It was applied to redesign an existing therapeutic antibody, generating variants with retained or improved affinity, reduced predicted liabilities, and confirmed in vitro performance, including CHO expression and binding assays.
Novel library design technique for VHHs that produces developable and humanized antibodies without the need for further optimization.
The authors built a humanized VHH phage display library using four therapeutic VHH scaffolds, incorporating CDR1 and CDR2 sequences from human VH3 germline genes (filtered for sequence liabilities) and highly diverse CDR3s from CD19⁺ IgM⁺ human B cells.
CDR1 and CDR2 libraries were filtered via yeast display for proper folding and protein A binding, while CDR3s were refined to remove poly-tyrosine stretches to reduce polyreactivity.
An improved library version incorporated CDR1/2 variants selected for heat tolerance and further depleted CDR3s with poly-tyrosine motifs, increasing stability and developability.
VHHs were tested for expression, thermal stability, aggregation, hydrophobicity, and polyreactivity, showing that the V2 library yielded a higher proportion of drug-like antibodies with favorable biophysical properties.
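The poly-tyrosine depletion of CDR3s described above can be pictured as a simple sequence filter. This is only an illustrative sketch: the summary does not say how the depletion was carried out or what run length counts as a poly-tyrosine stretch, so the `min_run=3` threshold and the function names are assumptions.

```python
import re


def has_poly_tyrosine(cdr3: str, min_run: int = 3) -> bool:
    """True if the CDR3 contains at least min_run consecutive tyrosines (Y).

    The default threshold of 3 is an assumption for illustration.
    """
    return re.search(r"Y{%d,}" % min_run, cdr3) is not None


def deplete_poly_tyrosine(cdr3s: list[str], min_run: int = 3) -> list[str]:
    """Keep only CDR3 sequences without a poly-tyrosine stretch."""
    return [s for s in cdr3s if not has_poly_tyrosine(s, min_run)]


# Hypothetical CDR3 sequences.
library = ["ARDYYYGMDV", "ARDYGYDV", "AKGSYYFDY"]
print(deplete_poly_tyrosine(library))
```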
Introduced a novel model, Chai-2, that shows over 100× improvement in de novo antibody design success rates compared to prior methods.
The model is prompted with the structure of the target, epitope residues, and desired antibody format (e.g., scFv or VHH).
Benchmarking was performed on 52 antigens that had no known antibodies in the PDB, ensuring evaluation on novel, unbiased targets.
Generated antibodies were dissimilar in both structure and sequence to any known antibodies, indicating that Chai-2 designs novel binders, not memorized ones.
For VHH (nanobody) formats, the model achieved an experimental hit rate of 20%, validated in a single experimental round.
Novel method to design antibodies based on Boltz-1.
They added a sequence head to Boltz-1 to perform simultaneous sequence/structure co-design.
They employed data from SAbDab to fine-tune Boltz-1 on antibody-antigen complexes.
They compared against dyMEAN and DiffAb on amino acid recovery, RMSD, and Rosetta InterfaceAnalyzer energy; their model does better on these computational benchmarks.
A novel method that repurposes AlphaFold-2.3 structure predictions and combines them with inverse folding–based machine learning models to assess antibody-antigen binding accuracy and specificity.
They generate antibody-antigen complex models using AlphaFold-2.3 and evaluate them using the 'AbAgIoU' metric, which measures the overlap between predicted and true epitope/paratope residues — penalizing both missing and extra contacts.
They demonstrate that the learned scores can distinguish true from incorrect antibody-antigen pairings (including swapped antibody scenarios), significantly outperforming random baselines.
The method relies only on antibody and antigen sequences as input, using AlphaFold to model structures — making it applicable in real-world settings where experimental structures are unavailable.
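The intersection-over-union idea behind the 'AbAgIoU' metric can be sketched as follows. The exact definition is not given in this summary, so this version assumes the predicted and true interfaces are each represented as a single set of (chain, residue number) pairs covering both epitope and paratope residues; `interface_iou` is a hypothetical name.

```python
Residue = tuple[str, int]  # (chain id, residue number), an assumed representation


def interface_iou(predicted: set[Residue], true: set[Residue]) -> float:
    """Intersection over union of predicted and true interface residues.

    Missing contacts shrink the intersection and extra contacts grow the
    union, so both kinds of error lower the score.
    """
    union = predicted | true
    if not union:
        return 0.0  # convention chosen for this sketch: no residues, no overlap
    return len(predicted & true) / len(union)


# Hypothetical interface residues: H = antibody heavy chain, A = antigen.
pred = {("H", 100), ("H", 101), ("A", 50)}
true = {("H", 100), ("A", 50), ("A", 51)}
print(interface_iou(pred, true))  # 2 shared residues out of 4 in the union
```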
Introduced a novel machine learning method (NanoBinder) to predict the binding probability of nanobody-antigen structural complexes.
Positive (binding) complexes were sourced from the SAbDab database, which contains experimentally validated nanobody-antigen interactions.
Negative (non-binding) complexes were generated by structurally aligning nanobodies from different binding complexes (with RMSD < 2 Å) and recombining them with unrelated antigens to create likely non-binding pairs.
Extracted Rosetta energy features from each complex and trained several machine learning models, including Random Forests, SVMs, AdaBoost, and Decision Trees, to classify binders vs. non-binders. Random Forests showed the best performance.
They selected antibodies with known antigen targets (e.g., IL-6) and grafted their CDRs onto nanobody scaffolds using Rosetta-based protocols. The resulting nanobody-antigen complexes were evaluated in silico using NanoBinder, and selected candidates were experimentally validated. The predictions showed good correlation with binding outcomes, particularly for identifying non-binders.
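The negative-set construction described for NanoBinder (swapping structurally similar nanobodies onto unrelated antigens) can be sketched at the level of identifiers. This is a simplification under stated assumptions: pairwise nanobody RMSDs are taken as precomputed inputs, "unrelated" is reduced to "different antigen identifier", and all names and values are hypothetical rather than taken from the paper.

```python
from itertools import permutations


def make_negative_pairs(complexes, nanobody_rmsd, max_rmsd=2.0):
    """Propose likely non-binding (nanobody, antigen) pairs.

    complexes:     list of (nanobody_id, antigen_id) known binding pairs.
    nanobody_rmsd: dict mapping frozenset({nb_a, nb_b}) to the RMSD in Å
                   after structural alignment (assumed precomputed).
    A nanobody is recombined with another complex's antigen only when the
    two nanobodies align below max_rmsd and the antigens differ.
    """
    positives = set(complexes)
    negatives = set()
    for (nb_a, ag_a), (nb_b, ag_b) in permutations(complexes, 2):
        if nb_a == nb_b or ag_a == ag_b:
            continue
        rmsd = nanobody_rmsd.get(frozenset((nb_a, nb_b)))
        if rmsd is not None and rmsd < max_rmsd and (nb_a, ag_b) not in positives:
            negatives.add((nb_a, ag_b))
    return sorted(negatives)


# Hypothetical complexes and RMSD values.
complexes = [("nb1", "agA"), ("nb2", "agB"), ("nb3", "agC")]
rmsd = {
    frozenset(("nb1", "nb2")): 1.2,
    frozenset(("nb1", "nb3")): 3.5,
    frozenset(("nb2", "nb3")): 1.9,
}
print(make_negative_pairs(complexes, rmsd))
```

The resulting pairs would then be modeled as complexes and scored with Rosetta energy features before classifier training, as described above.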