Computational Antibody Papers

Title / Key points
    • An end-to-end structural refinement method specifically developed for antibodies and nanobodies.
    • Curates structures from SAbDab to train a pure equivariant graph transformer (not a standard EGNN) that directly predicts 3D coordinate shift vectors for all backbone atoms (N, Cα, C, O).
    • Benchmarked against five baseline methods, with an average Cα RMSD improvement typically on the order of 0.01 Å to 0.05 Å. Because the error bars are on the order of 1.0 Å or more, the statistical significance of this improvement is questionable.
    • Protein language model trained to capture protein dynamics.
    • The authors leveraged existing structural data and datasets like mdCATH to gather equilibrium fluctuations for 64,403 proteins. Instead of raw time-series trajectories, they extracted computed biophysical properties such as root-mean-square fluctuations (RMSF) and features from normal mode analysis (NMA) to serve as training labels.
    • They trained two models. ESMDance was built by fine-tuning the pre-trained ESM2 transformer, teaching it to map its existing evolutionary knowledge onto these physical flexibility profiles. SeqDance was trained from scratch using only raw sequences and the target dynamics data, so it has to learn residue co-movement and physics without evolutionary pre-training.
    • To test zero-shot mutation prediction, the models compare the predicted flexibility of the wild-type sequence against that of the mutated sequence. A large discrepancy flags a disruptive, damaging mutation. These predicted shifts were correlated against deep mutational scanning (DMS) data measuring cellular fitness and stability changes. ESMDance performs better for mutation prediction (especially on viral and de novo designed proteins with no evolutionary history), while SeqDance is better at modeling highly flexible intrinsically disordered regions (IDRs).
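The zero-shot comparison of wild-type and mutant flexibility described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_predict_rmsf` is a hypothetical stand-in for a dynamics-aware model such as ESMDance or SeqDance, the `TOY_FLEX` values are invented, and the root-mean-square difference is just one plausible discrepancy measure, not necessarily the one the authors use.

```python
import numpy as np

# Invented per-residue flexibility values, for illustration only (not from the paper).
TOY_FLEX = {"G": 1.0, "P": 0.3, "A": 0.5, "S": 0.8, "L": 0.4, "W": 0.2, "K": 0.9}


def toy_predict_rmsf(seq: str, window: int = 3) -> np.ndarray:
    """Hypothetical stand-in for a dynamics model: window-smoothed toy flexibility."""
    raw = np.array([TOY_FLEX.get(aa, 0.6) for aa in seq], dtype=float)
    kernel = np.ones(window) / window
    # mode="same" keeps one predicted value per residue.
    return np.convolve(raw, kernel, mode="same")


def mutation_effect_score(wild_type: str, mutant: str, predict=toy_predict_rmsf) -> float:
    """Root-mean-square difference between mutant and wild-type flexibility profiles.

    A larger score means a larger predicted change in dynamics, which this
    sketch treats as a proxy for a more disruptive mutation.
    """
    if len(wild_type) != len(mutant):
        raise ValueError("this sketch only handles substitutions (equal lengths)")
    diff = predict(mutant) - predict(wild_type)
    return float(np.sqrt(np.mean(diff ** 2)))


if __name__ == "__main__":
    wt = "AGSLAKPG"
    for mutant in ("AWSLAKPG", "ASSLAKPG"):
        print(mutant, mutation_effect_score(wt, mutant))
```

In a benchmark like the one summarized above, scores of this kind would be rank-correlated against DMS measurements, for example with a Spearman correlation.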
  • 2026-05-29

    Language Modeling Materializes a World Model of Protein Biology

    • language models
    • protein design
    • structure prediction
    • New versions of ESM and ESMFold: ESMC (ESM Cambrian) and ESMFold2, which model protein sequence, structure, and function.
    • Like previous versions, ESMC is trained entirely on sequences using a masked language modeling (MLM) objective. However, it scales up to 2.8 billion metagenomic sequences, a more than 50-fold increase over the 50 million sequences used for ESM2.
    • ESMFold2 achieves state-of-the-art atomic resolution and directly outperforms AlphaFold3 on complex antibody-antigen predictions (even when AlphaFold3 is given multiple sequence alignments and ESMFold2 operates from sequence alone).
    • To design therapeutic antibody fragments (scFvs), they input the target sequence and lock in a stable, known antibody framework template, leaving only the target-recognizing loops (CDRs) to be filled in.
    • AI-guided optimization: The system backpropagates gradients through both ESMC and ESMFold2 to iteratively optimize the CDR sequences, mutating the loops to maximize structural interface confidence scores.
    • They synthesized and tested these computational designs in the wet lab, achieving high experimental hit rates and discovering entirely novel binders with therapeutically relevant nanomolar affinities.
    • No-code environment from Genentech/Roche, giving access to the latest tools in protein and antibody design.
    • Integrates models (BindCraft, OpenFold, etc.) and datasets such as the PDB and UniProt.
    • A Unified Framework for Unsupervised AIRR Analytics
    • immuneML introduces the first standardized environment to discover patterns, cluster sequences, and run robust stability validations on partially or imperfectly labeled adaptive immune receptor data.
    • The platform systematically evaluates and compares generative machine learning models (such as LSTM and VAE) to determine how effectively they can engineer novel, antigen-specific immune sequences versus simply memorizing training data.
    • It rigorously assesses how well different data representations, including advanced protein language models, capture true biological properties like epitope specificity and MHC restrictions, a utility proven on 48,000 experimental TCRβ sequences.
    • It provides vital exploratory and dimensionality reduction tools to identify sequencing batch effects and data biases before running supervised diagnostics, demonstrated using a real-world single-cell dataset from 143 inflammatory bowel disease patients.
    • Novel protein design model, revisiting the SE(3) architecture.
    • Genie 3 is an all-atom, SE(3)-equivariant structure diffusion model that treats proteins as branched polymers to capture sidechain details, utilizing a Latent Transformer with bidirectional layer updates and an Invariant Point Attention structural decoder.
    • The authors did not test the model on therapeutic formats such as antibodies or nanobodies; instead, they focused entirely on generating generic de novo protein binders, unconditional monomers, and functional motif scaffolds.
    • The method was computationally benchmarked using self-consistency pipelines (ProteinMPNN/ESMFold), MotifBench for functional sites, and a strict AF2M+ binder interface metric, alongside experimental validation that yielded a 12.5% hit rate against the Nipah virus glycoprotein G.
    • Method to select mutants computationally for lab testing.
    • Introduces "stochastic beam search", a sequence-centric method that evaluates sequences under masked language models (MLMs) via pseudo-log-likelihood, rather than relying on costly mutation-centric approaches.
    • This technique is computationally efficient and produces higher-quality sequences by better balancing likelihood and diversity.
    • The method was extensively validated through both in silico evaluations across various models and direct head-to-head in vitro antibody campaigns.
    • In wet-lab testing, the optimized models effectively screened for synthesizability and binding, with supervised guidance achieving a 100% success rate in the experiments.
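The idea of scoring whole sequences by pseudo-log-likelihood inside a stochastic beam search can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' algorithm: `toy_masked_log_probs` is a hypothetical stand-in for a real MLM (it conditions only on the left neighbour), the four-letter alphabet is invented, and the Gumbel-top-k step is one common way to sample a beam without replacement, swapped in here because the summary does not specify the sampling scheme.

```python
import numpy as np

ALPHABET = "ACDE"  # tiny invented alphabet, for illustration only


def toy_masked_log_probs(seq: str, i: int) -> np.ndarray:
    """Hypothetical stand-in for an MLM: log p(residue at i | context).

    Toy rule: a fixed residue preference, minus a penalty for repeating
    the left neighbour. A real MLM would use the full bidirectional context.
    """
    logits = np.array([1.0, 0.0, 0.0, 0.5])
    if i > 0:
        logits[ALPHABET.index(seq[i - 1])] -= 1.0
    return logits - np.log(np.sum(np.exp(logits)))  # log-softmax


def pseudo_log_likelihood(seq: str) -> float:
    """Sum over positions of log p(true residue | rest), masking one position at a time."""
    return float(sum(toy_masked_log_probs(seq, i)[ALPHABET.index(aa)]
                     for i, aa in enumerate(seq)))


def stochastic_beam_search(start: str, beam_width: int = 4, rounds: int = 3,
                           temperature: float = 1.0, seed: int = 0) -> list:
    """Keep a beam of sequences; each round, sample the next beam via Gumbel-top-k."""
    rng = np.random.default_rng(seed)
    beam = [start]
    for _ in range(rounds):
        # Candidates: current beam plus every single-substitution neighbour.
        candidates = set(beam)
        for seq in beam:
            for i in range(len(seq)):
                for aa in ALPHABET:
                    if aa != seq[i]:
                        candidates.add(seq[:i] + aa + seq[i + 1:])
        candidates = sorted(candidates)  # fixed order so a given seed is reproducible
        scores = np.array([pseudo_log_likelihood(c) for c in candidates])
        # Gumbel noise turns top-k selection into sampling without replacement,
        # trading off likelihood against diversity via the temperature.
        perturbed = scores / temperature + rng.gumbel(size=len(candidates))
        keep = np.argsort(perturbed)[::-1][:beam_width]
        beam = [candidates[j] for j in keep]
    return sorted(beam, key=pseudo_log_likelihood, reverse=True)


if __name__ == "__main__":
    for seq in stochastic_beam_search("CCCC"):
        print(seq, pseudo_log_likelihood(seq))
```

Lowering `temperature` makes the search greedier toward high-likelihood sequences, while raising it yields a more diverse beam.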
  • 2026-04-30

    Lightning Boltz

    • protein design
    • Implementation adjustments to Boltz-2 that make it run much faster.
    • One of the biggest hurdles in running MSA-based methods is the MSA server, which is computationally expensive and difficult to set up.
    • MSA carries a lot of predictive value so skipping this step is unwise.
    • Integrates MMseqs2-GPU directly into the Boltz-2 pipeline, removing the primary CPU bottleneck and enabling high-throughput, local structure prediction.
    • This implementation streamlines the MSA process, making predictions an order of magnitude faster.
  • 2026-04-30

    PromptMOL

    • structure prediction
    • PromptMOL: a PyMOL plugin that enables direct interaction with molecular structures using natural language commands.
    • While PyMOL is a gold standard in structural biology, its interface can be complex; PromptMOL removes the barrier to entry by replacing convoluted script syntax with simple, descriptive prompts.
    • By hooking directly into LLMs (via local models like LM Studio, or cloud providers like OpenAI and Anthropic), the plugin intelligently handles selections, coloring, structural analysis, and rendering tasks on the fly.
    • Generation of a large-scale, heterogeneous antibody developability dataset for AI benchmarking.
    • Built from 50 seed antibodies with up to 99 engineered variants each, resulting in thousands of unique, wet-lab-validated sequences.
    • Assesses six critical developability traits: expression, purity, thermostability, aggregation, polyreactivity, and hydrophobicity.
    • Benchmark results are currently accessible via Amazon Bio Discovery, with further findings slated for a formal publication later this year.