Computational Antibody Papers

    • ‘Evolution’ of AlphaFold from Isomorphic Labs.
    • IsoDDE achieves 39% accuracy in the high-fidelity regime (DockQ > 0.8), which corresponds to near-experimental precision with an interface RMSD (iRMSD) typically below 1.0 Å. That’s a 2.3x improvement over AF3.
    • Using a single model seed, IsoDDE correctly predicts 63% of interfaces at DockQ > 0.23 (corresponding to an iRMSD of roughly 4.0 Å or less), a 1.4x improvement over AF3's single-seed performance.
    • IsoDDE accurately models the backbone of the highly variable CDR-H3 loop (< 2 Å) for 70% of antibodies in the test set, outperforming AF3’s 58% success rate by 1.2x.
    • When scaled to 1,000 seeds, IsoDDE reaches an 82% success rate for correct interfaces and 59% for high-accuracy predictions. So getting the best results is not exactly something you can do on a laptop.
    • It is a technical report; the architecture is not discussed.
    • New open-source reproduction of AlphaFold3 that matches or surpasses it.
    • IntelliFold-2-Pro achieves a 58.2% success rate (DockQ > 0.23, roughly 4 Å iRMSD) on antibody-antigen interactions, outperforming AlphaFold 3's 47.9%.
    • For small-molecule co-folding, IntelliFold-2-Pro reaches 67.7%, surpassing AlphaFold 3’s 64.9%.
    • On protein monomers, IntelliFold-2 shows only marginal gains in accuracy (lDDT of 0.89 vs. AF3's 0.88).
    • A strategy for layer-wise selective fine-tuning of general protein language models.
    • Instead of full fine-tuning, they found that adapting only the first 50-75% of layers via LoRA provides optimal performance while saving computational costs.
    • For example, they perform sequence-specific "test-time" training, optimizing the model with a Masked Language Modeling (MLM) objective on the target sequence itself before predicting its properties. This approach led to an 18.4% accuracy boost in predicting the notoriously difficult CDR-H3 antibody loop (see the sketch below).
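
A minimal sketch of this recipe, assuming ESM-2 via HuggingFace transformers plus peft: LoRA adapters are attached only to the lower layers, then a handful of MLM steps are run on the target sequence itself before inference. The base checkpoint, LoRA rank, masking rate, and step count are illustrative assumptions, not the paper's settings.

```python
import torch
from transformers import AutoTokenizer, EsmForMaskedLM
from peft import LoraConfig, get_peft_model

# Base model is a stand-in; the paper's checkpoint and hyperparameters are not given here.
name = "facebook/esm2_t33_650M_UR50D"
tokenizer = AutoTokenizer.from_pretrained(name)
model = EsmForMaskedLM.from_pretrained(name)

# LoRA adapters on only the first ~2/3 of the 33 encoder layers (the 50-75% band).
lora = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["query", "value"],
    layers_to_transform=list(range(22)),
)
model = get_peft_model(model, lora)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Hypothetical target heavy-chain fragment whose properties we want to predict.
seq = "EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMSWVRQAPGKGLEWVS"

model.train()
for _ in range(30):  # a few test-time MLM steps on the target sequence itself
    batch = tokenizer(seq, return_tensors="pt")
    labels = batch["input_ids"].clone()
    # Mask ~15% of residue positions (token ids > 3 skip ESM-2 special tokens).
    mask = (torch.rand(labels.shape) < 0.15) & (labels > 3)
    batch["input_ids"][mask] = tokenizer.mask_token_id
    labels[~mask] = -100  # compute loss only on masked positions
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
# The adapted model then embeds/predicts properties for `seq` downstream.
```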
  • 2026-02-05

    Multiple protein structure alignment at scale with FoldMason

    • non-antibody stuff
    • structure prediction
    • Protocol for ultra-fast protein structure alignment.
    • FoldMason represents protein structures as 1D sequences using a structural alphabet (3Di+AA), which allows it to perform multiple alignments using fast string-comparison algorithms and a parallelized progressive alignment following a minimum spanning tree (see the sketch after this list).
    • It operates two to three orders of magnitude faster than traditional structure-based methods, achieving a 722x speedup over tools like MUSTANG and scaling to align 10,000 structures in a fraction of the time required by competitors for just 100.
    • It matches the accuracy of gold-standard structure aligners and exceeds sequence-based tools, particularly in aligning distantly related proteins or flexible structures that global superposition-based methods struggle to handle.
    • It is used for large-scale structural analysis of massive databases like AlphaFoldDB, building structure-based phylogenies for proteins that have diverged past the "twilight zone" of sequence similarity, and providing interactive web-based visualizations of complex multiple structural alignments (MSTAs).
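
A conceptual sketch of the core trick, assuming each structure has already been encoded as a 1D structural-alphabet string: pairwise distances come from fast string comparison, and a minimum spanning tree dictates the progressive merge order. The toy strings, the difflib similarity, and the merge loop are illustrative stand-ins, not FoldMason's actual 3Di+AA substitution scoring or profile alignment.

```python
import numpy as np
from difflib import SequenceMatcher
from scipy.sparse.csgraph import minimum_spanning_tree

# Toy 3Di-style strings standing in for real structures: FoldMason encodes
# each residue's local backbone geometry as one of 20 structural states,
# so a whole structure becomes an ordinary string.
seqs = {
    "protA": "DDPVVLVLDD",
    "protB": "DDPVALVLDD",
    "protC": "DDQVALVADD",
    "protD": "PPQVALVAPP",
}
names = list(seqs)
n = len(names)

# Pairwise distances from fast string comparison instead of 3D superposition.
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        sim = SequenceMatcher(None, seqs[names[i]], seqs[names[j]]).ratio()
        dist[i, j] = 1.0 - sim

# The minimum spanning tree fixes the order of progressive pairwise merges.
mst = minimum_spanning_tree(dist).tocoo()
for d, i, j in sorted(zip(mst.data, mst.row, mst.col)):
    print(f"align {names[i]} with {names[j]} (distance {d:.2f})")
    # A real implementation would merge the two alignments into a profile here.
```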
    • Method for predicting binding strength when training data are scarce and noisy.
    • The researchers address the issue that the field's standard benchmark, SKEMPI2, has significant hidden data leakage where different protein complexes share over 99% sequence identity, leading to inflated performance estimates in models that simply memorize these patterns. Problem raised by many, addressed by hardly any.
    • ProtBFF injects five interpretable physical priors (interface, burial, dihedral, SASA, and lDDT) directly into residue embeddings using cross-embedding attention to prioritize the most structurally relevant parts of a protein (see the sketch after this list).
    • By evaluating models on stricter, homology-based sequence clusters (60% similarity), the authors show that ProtBFF allows general-purpose models like ESM to match or outperform specialized state-of-the-art predictors, even in data-limited "few-shot" scenarios.
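
A minimal sketch of what such cross-embedding attention could look like: residue embeddings act as queries, and each residue's five scalar priors are lifted into the embedding space as key/value tokens. The module layout, dimensions, and token scheme are assumptions, not ProtBFF's published architecture.

```python
import torch
import torch.nn as nn

class PriorCrossAttention(nn.Module):
    """Residue embeddings (queries) attend over per-residue physical priors."""

    def __init__(self, d_model: int = 1280, n_heads: int = 8):
        super().__init__()
        self.prior_proj = nn.Linear(1, d_model)  # lift each scalar prior into model space
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, residue_emb, priors):
        # residue_emb: (batch, seq_len, d_model), e.g. frozen ESM embeddings
        # priors:      (batch, seq_len, 5) scalars: interface, burial, dihedral, SASA, lDDT
        b, L, k = priors.shape
        # Each prior channel of each residue becomes one key/value token.
        prior_tokens = self.prior_proj(priors.reshape(b, L * k, 1))
        updated, _ = self.attn(residue_emb, prior_tokens, prior_tokens)
        return self.norm(residue_emb + updated)

# Usage with random stand-ins for real features:
layer = PriorCrossAttention()
emb = torch.randn(2, 120, 1280)   # residue embeddings
priors = torch.randn(2, 120, 5)   # five physical priors per residue
out = layer(emb, priors)          # (2, 120, 1280), priors fused in
```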
    • Protocol for designing mini-binders against a multi-domain, not especially well characterized target using Latent-X1 and, to a lesser extent, Chai-1.
    • The protocol used Latent-X1 to generate de novo sequences and initial poses, which were then refolded using Chai-1 to ensure the designs were structurally consistent and plausible.
    • The final rank was determined by the equation score = 2.0 * binder pTM - 0.1 * min-iPAE - 0.1 * complex RMSD (see the sketch after this list). This formula prioritized high global confidence (pTM) while penalizing designs where the Latent-X1 pose and the Chai-1 refolded structure disagreed (iPAE and RMSD).
    • To handle the complex, multidomain IgE interface, they first designed binders against a smaller, stable seed on the epsilon3 domain before iteratively expanding the interface toward the full receptor-binding site.
    • Out of hundreds of generated designs, fewer than 80 candidates across two rounds were selected for wet-lab testing, resulting in a 6% hit rate and the identification of three specific IgE-binding miniproteins.
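
The ranking equation translates directly into a small scoring function. The formula is the paper's; the field names, example values, and units are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Design:
    name: str
    binder_ptm: float    # global confidence (pTM) of the refolded binder
    min_ipae: float      # minimum interface PAE over the interface
    complex_rmsd: float  # RMSD between Latent-X1 pose and Chai-1 refold

def rank_score(d: Design) -> float:
    # The paper's ranking equation: reward confidence, penalize disagreement
    # between the generated pose and the independent refold.
    return 2.0 * d.binder_ptm - 0.1 * d.min_ipae - 0.1 * d.complex_rmsd

designs = [
    Design("bind_001", binder_ptm=0.88, min_ipae=3.2, complex_rmsd=1.1),
    Design("bind_002", binder_ptm=0.91, min_ipae=7.5, complex_rmsd=4.8),
]
for d in sorted(designs, key=rank_score, reverse=True):
    print(f"{d.name}: score = {rank_score(d):.3f}")
```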
  • 2026-02-05

    Protenix V1

    • structure prediction
    • First fully open-source reproduction of the diffusion-based AlphaFold3 architecture that matches or exceeds its performance while strictly adhering to the same training data cutoff and model scale (especially on antibodies!).
    • Unlike previous open-source models, it exhibits a consistent improvement in accuracy as more computational budget is allocated (you sample more).
    • Protenix-v1 leads in antibody-antigen interface prediction, outperforming AlphaFold3 with a 52.31% vs. 48.75% success rate (DockQ > 0.23). That more than doubles the accuracy of open-source models like Chai-1 (23.12%).
    • Prompt-based, in-context prediction of antibody developability properties using large language models, rather than training separate predictors per property.
    • As a baseline, they evaluate TxGemma, a therapeutics-specific multimodal LLM that supports task switching via prompts and is fine-tuned using LoRA.
    • The study relies on a very large antibody dataset (~876k heavy chains) with in-silico–computed biophysical developability properties, combining sequence-based and structure-based predictors.
    • Models are trained and evaluated using prompts that include antibody sequences together with partially observed property/value pairs, asking the model to infer a missing property for a query sequence.
    • To prevent shortcut learning, where the model ignores the context and relies only on the sequence, the authors introduce AB-context-aware training, which applies a random latent transformation jointly to context properties and targets during training, forcing explicit use of the contextual information (see the sketch after this list).
    • By simulating batch effects, they show that standard fine-tuned TxGemma degrades sharply as batch bias increases (from ~0.99 Spearman ρ with no bias to ~0.95 with moderate bias and ~0.58 with strong bias), whereas context-aware training remains robust even under strong batch effects.
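
A sketch of both ideas at once, with an invented prompt template, property name, and transformation family (a shared affine map); the paper does not specify these details here.

```python
import random

def build_prompt(query_seq, context, prop):
    """Assemble an in-context prompt: observed (sequence, value) pairs plus a query."""
    lines = [f"Sequence: {s} | {prop}: {v:.3f}" for s, v in context]
    lines.append(f"Sequence: {query_seq} | {prop}: ?")
    return "\n".join(lines)

# Toy sequences and values; the prompt template and property name are invented.
context_seqs = ["EVQLVESGGGLVQPGG", "QVQLQESGPGLVKPSE"]
context_vals = [-1.23, 0.48]
target_val = 0.05  # held-out label for the query sequence

# AB-context-aware training: the SAME random transform is applied to the
# context values and the target, so sequence alone can no longer predict the
# label and the model is forced to read the context.
a, b = random.uniform(0.5, 2.0), random.uniform(-1.0, 1.0)
transformed = [a * v + b for v in context_vals + [target_val]]

prompt = build_prompt("DVQLVESGGGLVQPGG", list(zip(context_seqs, transformed[:2])), "hydrophobicity")
label = transformed[2]  # the training target under the shared transform
print(prompt)
```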
    • De novo platform for epitope-specific antibody design against “zero-prior” targets, i.e. antigen sites with no known antibody–antigen or protein–protein complex structures and limited homology to previously solved interfaces.
    • The method combines three tightly integrated components: AbsciDiff, an all-atom diffusion model fine-tuned from Boltz-1 to generate epitope-conditioned antibody–antigen complex structures; IgDesign2, a structure-conditioned paired heavy–light CDR sequence design model; and AbsciBind, a modified AF-Unmasked / AlphaFold-Multimer–based scoring protocol using ipTM-derived interface confidence to rank and filter designs.
    • The platform was evaluated on 10 zero-prior protein targets, with fewer than 100 antibody designs per target advanced to experimental testing; specific binders were successfully identified for 4 targets (COL6A3, AZGP1, CHI3L2, IL36RA).
    • Experimental validation demonstrated both structural and functional accuracy, including cryo-EM confirmation at near-atomic resolution (DockQ 0.73–0.83) for two targets and AI-guided affinity maturation yielding a functional IL36RA antagonist with ~100 nM potency.
    • Novel framework that identifies high-affinity leads using data from only a single round of FACS, significantly reducing the labor and reagents required for traditional multi-round affinity maturation campaigns.
    • Models were trained using log enrichment ratios (continuous) or binary labels (enriched vs. depleted), calculated by normalizing post-sorting FACS abundance against pre-sorting MACS abundance to account for expression biases (see the sketch after this list).
    • They benchmarked linear/logistic regression and CNNs against a semi-supervised ESM2-MLP approach; notably, the linear models often outperformed deeper architectures in ranking validated substitutions and offered superior interpretability for identifying confounding signals like polyreactivity.
    • By generalizing information across all sequences, ML models effectively separated "affinity-driving" mutations from "passenger" substitutions, identifying sub-nanomolar binders that were not prioritized by traditional, more laborious raw sequencing count analysis.
    • The best-performing models were leveraged within a Gibbs sampling protocol to design novel sequences unseen in the original experiment, ultimately yielding multiple improved binders with up to a ~2500-fold affinity increase over the wild-type.
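
A minimal sketch of how the continuous and binary labels could be derived from the two sequencing rounds; the pseudocount and log base are assumptions, not the paper's exact choices.

```python
import numpy as np

def log_enrichment(pre_counts, post_counts, pseudocount=0.5):
    """Per-variant log enrichment: post-sort (FACS) vs. pre-sort (MACS) frequency.

    Normalizing by library totals accounts for expression/abundance biases.
    """
    pre = np.asarray(pre_counts, dtype=float) + pseudocount
    post = np.asarray(post_counts, dtype=float) + pseudocount
    return np.log2((post / post.sum()) / (pre / pre.sum()))

pre = [120, 45, 3, 800]   # MACS (pre-sort) read counts per variant
post = [300, 10, 9, 400]  # FACS (post-sort) read counts per variant
ler = log_enrichment(pre, post)
binary = ler > 0          # enriched vs. depleted labels for classification
print(np.round(ler, 2), binary)
```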