Computational Antibody Papers

    • Results of the Ginkgo developability competition.
    • Benchmarked 113 teams on predicting five key developability traits: hydrophobicity, thermostability, self-association, expression titer, and polyreactivity.
    • Models were trained on the GDPa1 dataset (246 antibodies) and blindly tested on GDPa3 (80 diverse antibodies from OAS).
    • While cross-validation (CV) results were promising, performance dropped sharply on the held-out test set; e.g., self-association fell from a CV Spearman's rho of 0.653 to 0.356.
    • Hydrophobicity was the most predictable (rho = 0.708), while expression titer was the most challenging (rho = 0.310).
    • Winning models varied by assay; for example, team AbDevelop won for self-interaction, while microcrisprtm2 led in thermostability.
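The competition numbers above are Spearman correlations between predicted and measured assay values. A minimal stdlib sketch of that metric follows, assuming one prediction and one measurement per antibody and giving tied values their average rank; the helper names and toy numbers are illustrative only, not competition data or the organizers' scoring code.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Walk j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # i..j are 0-based sorted positions
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # correlation is undefined for constant input
    return cov / (vx * vy) ** 0.5


def spearman(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Spearman's rho is the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(measured))


# Toy numbers (not competition data).
rho = spearman([0.1, 0.4, 0.35, 0.8], [1.0, 2.0, 3.0, 4.0])
print(f"rho = {rho:.3f}")
```

Because only ranks matter, a model can be badly calibrated in absolute units and still score well, which is why the metric suits assays with different scales.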
    • A strategy for layer-wise selective fine-tuning of general protein language models.
    • Instead of full fine-tuning, they found that adapting only the first 50-75% of layers via LoRA gives the best performance while reducing computational cost.
    • In one application, they perform sequence-specific "test-time" training, optimizing the model with a masked language modeling (MLM) objective on the target sequence itself before predicting its properties. This approach led to an 18.4% accuracy boost in predicting the notoriously difficult antibody CDR-H3 loop.
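The layer-wise idea can be sketched in a few lines: pick the first fraction of layers, and wrap their weights in a LoRA adapter (frozen W plus a trainable low-rank update). This is a minimal pure-Python sketch under stated assumptions: the helper names, rank, alpha and initialization are hypothetical choices, not the authors' settings, and the test-time MLM training loop is not shown.

```python
import math
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def layers_to_adapt(n_layers: int, fraction: float) -> List[int]:
    """Indices of the first `fraction` of layers (index 0 is closest to the input)."""
    k = min(n_layers, max(1, math.ceil(n_layers * fraction)))
    return list(range(k))


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update: y = W x + (alpha / r) * B (A x)."""

    def __init__(self, weight: Matrix, rank: int, alpha: float, seed: int = 0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weight
        self.scale = alpha / rank
        # A starts small and random, B starts at zero, so the adapter is initially a no-op.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]


# Example: a 33-layer model (e.g., ESM-2 650M), adapting the first 50% or 75% of layers.
print(len(layers_to_adapt(33, 0.50)), len(layers_to_adapt(33, 0.75)))
```

Only A and B in the selected layers would be trained; later layers stay frozen, which is where the compute saving comes from. In Hugging Face PEFT, the analogous knob is the `layers_to_transform` argument of `LoraConfig`.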
    • New open-source reproduction of AlphaFold3 that matches or surpasses it.
    • IntelliFold-2-Pro achieves a 58.2% success rate on antibody-antigen interactions (DockQ > 0.23, roughly an interface RMSD of 4 Å or less), outperforming AlphaFold 3's 47.9%.
    • For small molecule co-folding, IntelliFold-2-Pro reaches 67.7%, surpassing AlphaFold 3’s 64.9%.
    • Interface precision vs. monomers: the gains are concentrated at interfaces; on protein monomers IntelliFold-2 shows only marginal gains (lDDT of 0.89 vs. AF3's 0.88).
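Success rates in these reports are thresholds on DockQ. A sketch of the original DockQ score (Basu and Wallner) and its commonly used quality bands follows, assuming Fnat and the interface and ligand RMSDs have already been computed from a superposition; newer DockQ releases add multi-chain handling that is not modeled here.

```python
def scaled_rms(rms: float, d0: float) -> float:
    """DockQ's RMSD scaling: 1 at 0 Å, 0.5 at d0, approaching 0 for large errors."""
    return 1.0 / (1.0 + (rms / d0) ** 2)


def dockq(fnat: float, irmsd: float, lrmsd: float) -> float:
    """Mean of Fnat, scaled interface RMSD (d0 = 1.5 Å) and scaled ligand RMSD (d0 = 8.5 Å)."""
    return (fnat + scaled_rms(irmsd, 1.5) + scaled_rms(lrmsd, 8.5)) / 3.0


def dockq_class(score: float) -> str:
    """Quality bands commonly used with DockQ (CAPRI-like)."""
    if score >= 0.80:
        return "high"
    if score >= 0.49:
        return "medium"
    if score >= 0.23:
        return "acceptable"
    return "incorrect"


# Illustrative inputs, not taken from any benchmark.
s = dockq(fnat=0.6, irmsd=2.0, lrmsd=5.0)
print(round(s, 3), dockq_class(s))
```

Since DockQ blends contact recovery with two RMSD terms, a given threshold maps only approximately onto an iRMSD cutoff, hence the "roughly" in the figures quoted above.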
    • ‘Evolution’ of AlphaFold from Isomorphic Labs.
    • IsoDDE achieves 39% accuracy in the high-fidelity regime (DockQ > 0.8), which corresponds to near-experimental precision, with an interface RMSD (iRMSD) typically below 1.0 Å. That is a 2.3x improvement over AF3.
    • Using a single model seed, IsoDDE successfully predicts 63% of interfaces at DockQ > 0.23 (corresponding to an iRMSD of roughly 4.0 Å or less), a 1.4x improvement over AF3's single-seed performance.
    • IsoDDE accurately models the backbone of the highly variable CDR-H3 loop (< 2 Å) for 70% of antibodies in the test set, outperforming AF3's 58% (a 1.2x improvement).
    • When scaled to 1,000 seeds, IsoDDE reaches an 82% success rate for correct interfaces and 59% for high-accuracy predictions, so reproducing these results takes far more compute than a laptop.
    • It is a technical report; the architecture is not discussed.
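The IsoDDE figures are given as relative improvements, so the AF3 baselines are only implied. A quick arithmetic sketch recovers them from the quoted numbers; the AF3 values it prints are derived, not reported, and the `success_rate` helper is a hypothetical illustration of how such percentages are tallied over a test set.

```python
def success_rate(dockq_scores, threshold):
    """Fraction of targets whose prediction clears the DockQ threshold (strict '>' as in the text)."""
    return sum(s > threshold for s in dockq_scores) / len(dockq_scores)


def implied_baseline(new_rate: float, fold: float) -> float:
    """Baseline rate implied by 'new_rate is a fold-x improvement'."""
    return new_rate / fold


# Figures quoted above; the AF3 values printed here are derived, not reported.
for label, rate, fold in [("DockQ > 0.8", 0.39, 2.3), ("DockQ > 0.23, 1 seed", 0.63, 1.4)]:
    print(f"{label}: IsoDDE {rate:.0%}, implied AF3 ~{implied_baseline(rate, fold):.0%}")
print(f"CDR-H3 < 2 Å: {0.70 / 0.58:.2f}x")
```

The implied high-fidelity baseline for AF3 is therefore around 17%, and the CDR-H3 ratio works out to about 1.21x.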
    • Method for training binding-strength (affinity change) predictors on small, noisy datasets.
    • The researchers address the fact that the field's standard benchmark, SKEMPI2, suffers from substantial hidden data leakage: distinct protein complexes can share over 99% sequence identity, which inflates performance estimates for models that simply memorize these patterns. A problem raised by many, addressed by hardly any.
    • ProtBFF injects five interpretable physical priors (Interface, Burial, Dihedral, SASA, and lDDT) directly into residue embeddings, using cross-embedding attention to prioritize the most structurally relevant parts of a protein.
    • By evaluating models on stricter, homology-based sequence clusters (60% similarity), the authors showed that ProtBFF lets general-purpose models such as ESM match or outperform specialized state-of-the-art predictors, even in data-limited "few-shot" scenarios.
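The leakage fix amounts to splitting by homology cluster rather than by individual entry. A minimal sketch follows: greedy clustering at a 60% identity threshold, then whole clusters assigned to train or test. The helper names are hypothetical, and `identity()` is a deliberate simplification that assumes pre-aligned, equal-length sequences; real pipelines would use an aligner-based clustering tool such as MMseqs2 or CD-HIT.

```python
import random
from typing import Dict, List, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of identical positions. SIMPLIFICATION: assumes pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("toy identity() expects pre-aligned, equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: Dict[str, str], threshold: float) -> Dict[str, int]:
    """Join the first cluster whose representative is >= threshold identical, else start a new one."""
    reps: List[str] = []
    assignment: Dict[str, int] = {}
    for name, seq in seqs.items():
        for cid, rep in enumerate(reps):
            if identity(seq, seqs[rep]) >= threshold:
                assignment[name] = cid
                break
        else:  # no representative was similar enough
            reps.append(name)
            assignment[name] = len(reps) - 1
    return assignment


def cluster_split(assignment: Dict[str, int], test_fraction: float,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """Send whole clusters to one side, so members of a cluster never straddle the split."""
    clusters = sorted(set(assignment.values()))
    rng = random.Random(seed)
    rng.shuffle(clusters)
    n_test = max(1, round(len(clusters) * test_fraction))
    test_clusters = set(clusters[:n_test])
    train = [n for n, c in assignment.items() if c not in test_clusters]
    test = [n for n, c in assignment.items() if c in test_clusters]
    return train, test


# Hypothetical toy complexes; "a" and "b" are near-duplicates (90% identical).
seqs = {"a": "ACDEFGHIKL", "b": "ACDEFGHIKM", "c": "WWWWWWWWWW", "d": "PPPPPPPPPP"}
clusters = greedy_cluster(seqs, threshold=0.6)
train, test = cluster_split(clusters, test_fraction=0.34)
print(clusters, train, test)
```

A random per-entry split could put "a" in train and "b" in test, letting a memorizing model look far better than it is; the cluster split rules that out for sequences grouped together.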
  • 2026-02-05

    Multiple protein structure alignment at scale with FoldMason

    • non-antibody stuff
    • structure prediction
    • Protocol for ultra-fast multiple protein structure alignment.
    • FoldMason represents protein structures as 1D sequences using a structural alphabet (3Di+AA), which allows it to perform multiple alignments using fast string comparison algorithms and a parallelized progressive alignment following a minimum spanning tree.
    • It operates two to three orders of magnitude faster than traditional structure-based methods, achieving a 722x speedup over tools like MUSTANG and scaling to align 10,000 structures in a fraction of the time required by competitors for just 100.
    • It matches the accuracy of gold-standard structure aligners and exceeds sequence-based tools, particularly in aligning distantly related proteins or flexible structures that global superposition-based methods struggle to handle.
    • It is used for large-scale structural analysis of massive databases like AlphaFoldDB, building structure-based phylogenies for proteins that have diverged past the "twilight zone" of sequence similarity, and providing interactive web-based visualizations of complex multiple structure alignments (MSTAs).
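FoldMason's progressive alignment follows a minimum spanning tree. One way to sketch the guide-tree step is Kruskal's algorithm over a pairwise distance matrix: each accepted edge, in ascending order, says which two sub-alignments to merge next (equivalent to a single-linkage guide tree). The distance matrix is hypothetical, and how FoldMason scores pairs with 3Di+AA and aligns profiles at each merge is not modeled here.

```python
from typing import List, Tuple


class DisjointSet:
    def __init__(self, n: int):
        self.parent = list(range(n))

    def find(self, i: int) -> int:
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i: int, j: int) -> bool:
        ri, rj = self.find(i), self.find(j)
        if ri == rj:
            return False  # already in the same sub-alignment
        self.parent[rj] = ri
        return True


def mst_merge_order(dist: List[List[float]]) -> List[Tuple[int, int, float]]:
    """Kruskal's MST: edges in the order they join two separate groups of structures."""
    n = len(dist)
    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    ds = DisjointSet(n)
    order = []
    for d, i, j in edges:
        if ds.union(i, j):
            order.append((i, j, d))  # merge the group holding i with the group holding j
            if len(order) == n - 1:
                break
    return order


# Hypothetical distances between four structures (e.g., 1 - normalized similarity).
dist = [[0, 1, 4, 5], [1, 0, 2, 6], [4, 2, 0, 3], [5, 6, 3, 0]]
print(mst_merge_order(dist))
```

Once the tree is known, merges on disjoint branches do not depend on each other, which is what makes the progressive step parallelizable.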
  • 2026-02-05

    Protenix V1

    • structure prediction
    • First fully open-source reproduction of the diffusion-based AlphaFold3 architecture that matches or exceeds AlphaFold3's performance (especially on antibodies!) while strictly adhering to the same training-data cutoff and model scale.
    • Unlike previous open-source models, its accuracy improves consistently as more inference compute is allocated (i.e., as more samples are drawn).
    • Protenix-v1 leads on antibody-antigen interface prediction, outperforming AlphaFold3 with a 52.31% vs. 48.75% success rate (DockQ > 0.23) and more than doubling the 23.12% of open-source models such as Chai-1.
    • Describes a protocol for designing mini-binders against a multidomain, poorly characterized target (IgE) using Latent-X1 and, to a lesser extent, Chai-1.
    • The protocol used Latent-X1 to generate de novo sequences and initial poses, which were then refolded using Chai-1 to ensure the designs were structurally consistent and plausible.
    • The final rank was determined by the equation score = 2.0 * Binder PTM - 0.1 * min-iPAE - 0.1 * complex RMSD. This formula prioritized high global confidence (PTM) while penalizing designs where the Latent-X1 pose and Chai-1 refolded structure disagreed (iPAE and RMSD).
    • To handle the complex, multidomain IgE interface, they first designed binders against a smaller, stable seed on the epsilon3 domain before iteratively expanding the interface toward the full receptor-binding site.
    • Out of hundreds of generated designs, fewer than 80 candidates across two rounds were selected for wet-lab testing, resulting in a 6% hit rate and the identification of three specific IgE-binding miniproteins.
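The ranking formula quoted above is simple to apply once the three metrics are collected per design. A sketch follows; the `Design` fields and `select_for_testing` helper are hypothetical names, the source does not say exactly how min-iPAE is aggregated, and the values are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Design:
    name: str
    binder_ptm: float    # "Binder PTM" from the text (global confidence)
    min_ipae: float      # "min-iPAE" from the text, in Å; exact aggregation not specified
    complex_rmsd: float  # Å between the Latent-X1 pose and the Chai-1 refold


def score(d: Design) -> float:
    """Ranking score quoted in the text: 2.0 * PTM - 0.1 * min-iPAE - 0.1 * complex RMSD."""
    return 2.0 * d.binder_ptm - 0.1 * d.min_ipae - 0.1 * d.complex_rmsd


def select_for_testing(designs: List[Design], top_k: int) -> List[Design]:
    """Highest score first; keep the top_k for wet-lab testing."""
    return sorted(designs, key=score, reverse=True)[:top_k]


# Invented values, for illustration only.
pool = [Design("d1", 0.80, 2.0, 1.0), Design("d2", 0.90, 6.0, 4.0), Design("d3", 0.70, 1.5, 0.8)]
for d in select_for_testing(pool, top_k=2):
    print(d.name, round(score(d), 2))
```

In this toy pool, "d2" has the highest PTM but is ranked last: its 6 Å iPAE and 4 Å pose disagreement cost 1.0 point, more than its 0.2-point PTM advantage over "d1" gains.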
    • De novo platform for epitope-specific antibody design against “zero-prior” targets, i.e. antigen sites with no known antibody–antigen or protein–protein complex structures and limited homology to previously solved interfaces.
    • The method combines three tightly integrated components: AbsciDiff, an all-atom diffusion model fine-tuned from Boltz-1 to generate epitope-conditioned antibody–antigen complex structures; IgDesign2, a structure-conditioned paired heavy–light CDR sequence design model; and AbsciBind, a modified AF-Unmasked / AlphaFold-Multimer–based scoring protocol using ipTM-derived interface confidence to rank and filter designs.
    • The platform was evaluated on 10 zero-prior protein targets, with fewer than 100 antibody designs per target advanced to experimental testing; specific binders were successfully identified for 4 targets (COL6A3, AZGP1, CHI3L2, IL36RA).
    • Experimental validation demonstrated both structural and functional accuracy, including cryo-EM confirmation at near-atomic resolution (DockQ 0.73–0.83) for two targets and AI-guided affinity maturation yielding a functional IL36RA antagonist with ~100 nM potency.
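The AbsciBind step ranks and filters designs by an ipTM-derived interface confidence before fewer than 100 per target go to the lab. A minimal sketch of that triage follows; the exact score, the cutoff value and the design IDs are hypothetical, and only the per-target cap of 100 comes from the text.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def triage(designs: List[Tuple[str, str, float]], min_score: float,
           cap: int = 100) -> Dict[str, List[str]]:
    """designs: (target, design_id, interface_confidence).

    Keep designs scoring at least min_score, best first, at most `cap` per target.
    """
    by_target: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for target, design_id, conf in designs:
        if conf >= min_score:
            by_target[target].append((conf, design_id))
    return {t: [d for _, d in sorted(v, reverse=True)[:cap]] for t, v in by_target.items()}


# Hypothetical scores for two of the targets named above.
designs = [("AZGP1", "d1", 0.82), ("AZGP1", "d2", 0.55), ("AZGP1", "d3", 0.91), ("IL36RA", "e1", 0.40)]
print(triage(designs, min_score=0.6))
```

Targets where nothing clears the cutoff simply drop out, which mirrors the outcome above where binders were found for only some of the targets.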
    • Novel framework that identifies high-affinity leads using data from only a single round of FACS, significantly reducing the labor and reagents required for traditional multi-round affinity maturation campaigns.
    • Models were trained using log enrichment ratios (continuous) or binary labels (enriched vs. depleted), calculated by normalizing post-sorting FACS abundance against pre-sorting MACS abundance to account for expression biases.
    • They benchmarked linear/logistic regression and CNNs against a semi-supervised ESM2-MLP approach; notably, the linear models often outperformed deeper architectures in ranking validated substitutions and offered superior interpretability for identifying confounding signals like polyreactivity.
    • By generalizing information across all sequences, ML models effectively separated "affinity-driving" mutations from "passenger" substitutions, identifying sub-nanomolar binders that were not prioritized by traditional, more laborious raw sequencing count analysis.
    • The best-performing models were leveraged within a Gibbs sampling protocol to design novel sequences unseen in the original experiment, ultimately yielding multiple improved binders with up to a ~2500-fold affinity increase over the wild-type.
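The enrichment label and the Gibbs-sampling design step described in this entry can be sketched together. Assumptions: the natural log, the 0.5 pseudocount, the additive per-position model standing in for "the best-performing models", and all weights are hypothetical; the sampler draws each position from p(a | rest) proportional to exp(score / T).

```python
import math
import random
from typing import Callable, Dict, List, Tuple

AA = "ACDEFGHIKLMNPQRSTVWY"


def log_enrichment(post: int, pre: int, post_total: int, pre_total: int,
                   pseudocount: float = 0.5) -> float:
    """ln(post-sort FACS frequency / pre-sort MACS frequency); the pseudocount avoids log(0)."""
    post_freq = (post + pseudocount) / post_total
    pre_freq = (pre + pseudocount) / pre_total
    return math.log(post_freq / pre_freq)


def additive_model(weights: Dict[Tuple[int, str], float]) -> Callable[[str], float]:
    """Hypothetical linear model: sum of per-(position, residue) weights; unlisted pairs add 0."""
    return lambda seq: sum(weights.get((i, a), 0.0) for i, a in enumerate(seq))


def gibbs_sample(start: str, score: Callable[[str], float], positions: List[int],
                 n_sweeps: int, temperature: float = 1.0, seed: int = 0) -> str:
    """Resample each chosen position from p(a | rest) proportional to exp(score / temperature)."""
    rng = random.Random(seed)
    seq = list(start)
    for _ in range(n_sweeps):
        for i in positions:
            logits = []
            for a in AA:
                seq[i] = a
                logits.append(score("".join(seq)) / temperature)
            top = max(logits)  # subtract the max for numerical stability
            probs = [math.exp(z - top) for z in logits]
            seq[i] = rng.choices(AA, weights=probs, k=1)[0]
    return "".join(seq)


# Toy usage: an enriched variant, and sampling two positions under invented weights.
print(round(log_enrichment(post=30, pre=10, post_total=1000, pre_total=1000), 3))
model = additive_model({(1, "W"): 2.0, (1, "Y"): 1.5, (2, "R"): 1.0})
print(gibbs_sample("AAAA", model, positions=[1, 2], n_sweeps=5, temperature=0.5))
```

A binary label can be taken as the sign of the log enrichment (enriched vs. depleted). With a purely additive model the positions decouple, so Gibbs sampling is overkill; it becomes useful once the model includes interactions between positions, because each conditional then depends on the rest of the sequence.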