Computational Antibody Papers

Title | Key points

2025-05-08
RIOT
- annotation/numbering
- Fast and reliable numbering tool with an inbuilt free germline database, unifying the functionality of tools such as IgBLAST and ANARCI.
- It can number both amino acid and nucleotide sequences.
- Rather than statistical methods such as HMMs, it uses an MMseqs-like methodology for rapid alignment.
- Alignments are reported to be more accurate than those of existing methods, with improved speed, while running on a CPU.
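
The "MMseqs-like" idea of finding candidate germlines quickly before any detailed alignment can be illustrated with a k-mer prefilter. This is only a minimal sketch under stated assumptions: it uses exact k-mer matches (a simplification), the germline fragments are hypothetical toy strings rather than real database entries, and none of the function names come from RIOT.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(germlines, k=3):
    """Map each k-mer to the names of the germlines that contain it."""
    index = defaultdict(set)
    for name, seq in germlines.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def prefilter(query, index, k=3, top_n=1):
    """Rank germlines by the number of distinct k-mers shared with the query."""
    hits = defaultdict(int)
    for kmer in kmers(query, k):
        for name in index.get(kmer, ()):
            hits[name] += 1
    # Sort by descending hit count, then by name for a deterministic order.
    ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical toy "germline" fragments (assumption, for illustration only).
GERMLINES = {
    "toy_heavy": "EVQLVESGGGLVQPGGSLRLSCAAS",
    "toy_kappa": "DIQMTQSPSSLSASVGDRVTITCRAS",
    "toy_lambda": "QSVLTQPPSASGTPGQRVTISCSGS",
}
INDEX = build_index(GERMLINES)
best = prefilter("QVQLVESGGGVVQPGRSLRLSCAAS", INDEX)
```

Only the top-ranked candidates would then need a full alignment, which is where the speed-up over scanning every germline with a heavier model would come from.
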
2025-05-08
AntPack
- annotation/numbering
- Fast, alignment-based antibody numbering tool that significantly outperforms existing software in processing speed.
- Uses a simplified global alignment with a custom scoring matrix, enabling rapid numbering of millions of sequences.
- Ensures accuracy comparable to established methods (ANARCI, AbNum) while numbering large-scale antibody datasets.
- Emphasizes interpretability and robustness, providing transparent sequence scoring useful for humanization tasks.
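
A global alignment with a pluggable scoring function, as mentioned above, can be sketched with the textbook Needleman-Wunsch recursion. This is not AntPack's actual algorithm or scoring matrix: `toy_score`, the gap penalty, and the example sequences are assumptions chosen for illustration. In a numbering setting, the template positions would carry the scheme's position labels, so aligning a query to the template assigns a number to each residue.

```python
def global_align(query, template, score, gap=-2):
    """Needleman-Wunsch global alignment.

    Returns (best_score, aligned_query, aligned_template), with '-' for gaps.
    """
    n, m = len(query), len(template)
    # dp[i][j] holds the best score for aligning query[:i] with template[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(query[i - 1], template[j - 1]),
                dp[i - 1][j] + gap,   # query residue aligned to a gap
                dp[i][j - 1] + gap,   # template position left empty
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_q, aligned_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                dp[i][j] == dp[i - 1][j - 1] + score(query[i - 1], template[j - 1])):
            aligned_q.append(query[i - 1])
            aligned_t.append(template[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            aligned_q.append(query[i - 1])
            aligned_t.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_t.append(template[j - 1])
            j -= 1
    return dp[n][m], "".join(reversed(aligned_q)), "".join(reversed(aligned_t))


def toy_score(x, y):
    """Hypothetical scoring function: reward identity, penalize mismatch."""
    return 3 if x == y else -1


result = global_align("EVQLVESGG", "EVQLESGG", toy_score)
# result == (22, "EVQLVESGG", "EVQL-ESGG")
```

Because the score of every aligned position is explicit, per-residue contributions can be inspected directly, which is one way to read the "transparent sequence scoring" point above.
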
2025-05-06
AbNumPro
- annotation/numbering
- Offline toolkit for antibody numbering and for delineating CDRs and antigen-binding regions (ABRs).
- Integrates five established antibody numbering schemes (Kabat, Chothia, IMGT, Aho, Martin).
- Uses IMGT as the source of germlines.
- Allows prediction of Complementarity-Determining Regions (CDRs) and Antigen-Binding Regions (ABRs) through Hidden Markov Models (HMMs).
- Addresses data security concerns by enabling offline usage, beneficial for therapeutic antibody development.
- Achieves high recall (0.92) in identifying ABRs, making it superior to existing tools which rely heavily on online services.
2025-05-06
ANARCII
- annotation/numbering
- New version of ANARCI that uses language models.
- Employs a Seq2Seq language model eliminating the need for alignment-based numbering, thus generalizing well to novel sequences.
- Provides numbering that matches existing methods for >99.99% of conserved residues and >99.94% of CDR regions.
- Faster than the original HMM-based ANARCI when a GPU is available.
- Can be fine-tuned for rare immunoglobulin domains (e.g., shark VNAR sequences, T-cell receptors), offering customizable antibody numbering workflows.
2025-04-28
BoltzDesign1: Inverting All-Atom Structure Prediction Model for Generalized Biomolecular Binder Design
- non-antibody stuff
- protein design
- Protein design method based on Boltz-1.
- Boltz-1 is an open-source reproduction of AlphaFold3, which uses a diffusion module to co-fold molecular structures (proteins, ligands, etc.).
- For design purposes, BoltzDesign1 sidesteps the full structure generation step and instead uses only the Pairformer (which outputs a distogram — a probabilistic representation of all pairwise residue distances). This allows broader exploration of sequence space, as it optimizes over the distribution of possible structures rather than committing to a single conformation.
- Given a target (such as a small molecule or protein), they weakly initialize a binder sequence using random logits. This sequence is then iteratively refined by backpropagating loss through the Pairformer (and optionally through the Confidence module) to increase the predicted quality of the binder–target interaction.
- A full 3D structure can be generated at the end using the Boltz-1 structure module, but this is not part of the optimization loop.
- They benchmarked their method in silico on small molecule targets and a set of protein–protein interactions from the BindCraft benchmark, comparing performance to RFdiffusion All-Atom.
2025-04-28
BindCraft: one-shot design of functional protein binders
- protein design
- non-antibody stuff
- BindCraft is an easy-to-use pipeline for computational protein binder design.
- It employs AlphaFold2-Multimer to hallucinate binders via backpropagation.
- Given a target structure and binder parameters (e.g., sequence length), the binder sequence is initialized with random logits and iteratively optimized via gradient descent through the AF2-Multimer network.
- After binder hallucination, the sequence and surface residues are further optimized using MPNNsol, and AF2-Monomer is used to repredict and filter high-confidence designs.
- Binder designs were validated experimentally through in vitro assays, X-ray crystallography, and cryo-EM.
- Reported success rates ranged from 25% to 100%, with most binders in the nanomolar affinity range, a few in the micromolar range, and backbone RMSDs of ~1.7 Å to 3.1 Å between design models and solved structures.
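
The "random logits refined by gradient descent" loop described above can be sketched on a toy problem. Everything here is an assumption made for illustration: the AF2-Multimer network is replaced by a hypothetical per-position preference matrix, and the gradient is written out analytically instead of being obtained by backpropagation through a real model. Only the overall pattern (continuous logits, softmax relaxation, gradient steps, final argmax) mirrors the description.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def softmax(row):
    """Numerically stable softmax over one position's logits."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def toy_loss_and_grad(logits, preference):
    """Stand-in for the prediction network (assumption).

    Loss is minus the expected preference score of the relaxed sequence;
    the gradient with respect to the logits is computed in closed form.
    """
    loss = 0.0
    grad = []
    for z_row, w_row in zip(logits, preference):
        probs = softmax(z_row)
        expected = sum(p * w for p, w in zip(probs, w_row))
        loss -= expected
        # d(-expected)/dz_a = -p_a * (w_a - expected)
        grad.append([-p * (w - expected) for p, w in zip(probs, w_row)])
    return loss, grad


def design(preference, steps=500, lr=1.0, seed=0):
    """Start from small random logits and follow the gradient downhill."""
    rng = random.Random(seed)
    logits = [[rng.gauss(0.0, 0.01) for _ in ALPHABET] for _ in preference]
    for _ in range(steps):
        _, grad = toy_loss_and_grad(logits, preference)
        for z_row, g_row in zip(logits, grad):
            for a in range(len(z_row)):
                z_row[a] -= lr * g_row[a]
    # Discretize: pick the highest-logit amino acid at each position.
    return "".join(
        ALPHABET[max(range(len(row)), key=row.__getitem__)] for row in logits
    )


# Hypothetical preference matrix that rewards the motif "WYF".
MOTIF = "WYF"
PREFERENCE = [[1.0 if aa == target else 0.0 for aa in ALPHABET] for target in MOTIF]
designed = design(PREFERENCE)
```

In the real pipelines the loss comes from a structure-prediction network and has many local optima, so the choice of initialization and of the relaxation schedule matters far more than in this convex toy case.
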
2025-04-28
Atom level enzyme active site scaffolding using RFdiffusion2
- protein design
- non-antibody stuff
- Improvement upon the earlier RFdiffusion, enhancing stability and accuracy in designing enzyme active sites.
- Catalytic sites can now be specified at the atomic level instead of the residue backbone level used previously. This eliminates the need to explicitly enumerate side-chain rotamers.
- Training uses flow matching, a technique that simplifies and stabilizes the diffusion training process.
- Benchmarked on a set of 41 diverse enzyme active sites; RFdiffusion2 succeeded in all 41 cases, significantly outperforming the earlier RFdiffusion, which succeeded in only 16.
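
The flow matching objective mentioned above can be sketched in its most common form, where a noise sample is connected to a data sample by a straight line and the model regresses the constant velocity along that line. Whether RFdiffusion2 uses exactly this path is an assumption, and the vectors below are toy coordinates rather than atomic structures; the `oracle` function is a hypothetical stand-in for a trained network.

```python
import random


def interpolate(x0, x1, t):
    """Point at time t on the straight line from noise x0 to data x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight-line path; constant in t."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(predict_velocity, x0, x1, t):
    """Mean squared error between predicted and target velocity at x_t."""
    x_t = interpolate(x0, x1, t)
    pred = predict_velocity(x_t, t)
    target = target_velocity(x0, x1)
    return sum((p - u) ** 2 for p, u in zip(pred, target)) / len(target)


def euler_sample(predict_velocity, x0, steps=10):
    """Generate a sample by integrating dx/dt = v(x, t) from t=0 to t=1."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = predict_velocity(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy setting (assumption): the "dataset" is a single point, so the ideal
# velocity field is known in closed form and can play the trained model.
DATA_POINT = [1.0, -2.0, 0.5]


def oracle(x, t):
    return [(d - xi) / (1.0 - t) for d, xi in zip(DATA_POINT, x)]


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in DATA_POINT]
loss = flow_matching_loss(oracle, noise, DATA_POINT, t=0.3)
sample = euler_sample(oracle, noise)
```

The training target is a plain regression on velocities, with no noise schedule to tune, which is one reason the technique is often described as simpler and more stable to train than standard diffusion objectives.
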
2025-04-28
Scaling unlocks broader generation and deeper functional understanding of proteins
- protein design
- non-antibody stuff
- Novel generative protein language model: ProGen3.
- The model can do autoregressive generation N-to-C, C-to-N, and also supports span infilling.
- The architecture is a Transformer with a Sparse Mixture of Experts (MoE), activating about 27% of parameters per forward pass to improve computational efficiency.
- They studied how sampling affects training by trying different family-level weighting schemes. Uniform sampling across families (where small and large families have equal chance) gave better diversity and generalization, while unmodified sampling (letting big families dominate) performed worst.
- They validated the models by showing that generated proteins express well in wet lab experiments (split-GFP assays, spanning both highly novel and moderately novel sequence spaces).
- They used a large thermostability dataset to align model predictions to stability. This alignment is not standard fine-tuning — instead, preference optimization was applied, teaching the model to prefer sequences predicted to have higher stability. Upon experimental validation, aligned models indeed produced proteins with higher expression and stability.
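
The difference between the two family-level sampling schemes compared above can be made concrete with a small calculation. The family names and sizes are hypothetical, and the function is only a sketch of the two extremes (unmodified and uniform across families), not of the exact weighting schemes studied in the paper.

```python
def sampling_probs(family_sizes, scheme):
    """Return {family: (family_probability, per_sequence_probability)}.

    'unmodified': every sequence is equally likely, so large families dominate.
    'uniform':    every family is equally likely, then a member is drawn at random.
    """
    total = sum(family_sizes.values())
    n_families = len(family_sizes)
    probs = {}
    for family, size in family_sizes.items():
        if scheme == "unmodified":
            family_prob = size / total
        elif scheme == "uniform":
            family_prob = 1.0 / n_families
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        probs[family] = (family_prob, family_prob / size)
    return probs


# Hypothetical family sizes (assumption, for illustration only).
SIZES = {"large": 9000, "medium": 900, "small": 100}
unmodified = sampling_probs(SIZES, "unmodified")
uniform = sampling_probs(SIZES, "uniform")
```

Under the unmodified scheme the large family supplies 90% of the training draws, while under uniform family sampling each family supplies a third, so a sequence from the small family is seen 90 times more often than one from the large family.
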
2025-04-10
DeepSP: Deep learning-based spatial properties to predict monoclonal antibody stability
- Developability
- Neural network that provides structural features for an antibody sequence that can be used to predict developability.
- Models the spatial charge map (SCM) and spatial aggregation propensity (SAP), properties that are normally obtained from MD simulations.
- They model ca. 20,000 paired sequences from OAS and perform MD simulations to calculate the properties the canonical way.
- They use these MD-derived properties as labels to train a small neural network that achieves an average correlation of 0.87 across the 30 predicted features.
2025-04-10
DeepSCM: an efficient convolutional neural network surrogate model for the screening of therapeutic antibody viscosity
- Developability
- CNN surrogate for costly SCM calculations to correlate with viscosity.
- Developed a shallow CNN (tens of thousands of parameters) that approximates SCM scores otherwise computed from inherently slow MD simulations.
- Correlation between CNN-predicted and MD-derived SCM scores is ca. 0.8.
- The CNN’s output, when translated into a viscosity prediction (via a correlation with SCM score), also correlates reasonably well with experimentally measured viscosity, with correlation coefficients in the range of 0.7 to 0.8.