Computational Antibody Papers

Title / Key points

2025-03-11
Fast and accurate antibody sequence design via structure retrieval
- databases
- generative methods
- structure prediction
- Inverse folding and thus antibody design via database search.
- The authors train a vector retrieval database on SAbDab, so that a single query sequence can be placed structurally.
- They benchmark against state-of-the-art inverse folding tools such as AbMPNN, AntiFold, ProteinMPNN and ESM-IF; their tool comes out on top in terms of sequence retrieval.
- The database search is orders of magnitude faster than the state of the art inverse folding tools.
- They compare IgSeek with FoldSeek: their tool achieves higher sequence retrieval accuracy for most CDRs, but not for CDR-H3. FoldSeek therefore seems like a very good choice alongside IgSeek for such a database-driven inverse folding protocol.
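The retrieval idea above can be sketched as a nearest-neighbour lookup: embed the query structure, find the closest database entry, and return the sequence stored with it. This is only a minimal illustration; the embeddings, sequences and function names below are made-up toy values and say nothing about how IgSeek actually represents structures.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query, database, k=1):
    """Return the k database entries most similar to the query embedding."""
    ranked = sorted(
        database,
        key=lambda entry: cosine(query, entry["embedding"]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical toy database: invented structure embeddings paired with
# invented CDR sequences (not taken from SAbDab).
toy_db = [
    {"embedding": [1.0, 0.0, 0.0], "sequence": "ARDYW"},
    {"embedding": [0.0, 1.0, 0.0], "sequence": "GFTFSSYA"},
    {"embedding": [0.7, 0.7, 0.0], "sequence": "QQSYSTPLT"},
]

# The retrieved sequence serves as the "design" for the query structure.
best = retrieve([0.9, 0.1, 0.0], toy_db, k=1)[0]
print(best["sequence"])
```

A brute-force scan like this is linear in the database size; a real vector database would use an index, which is presumably where the speed advantage over running a neural inverse folding model comes from.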
2025-03-11
Redefining antibody patent protection using paratope mapping and CDR-scanning
- paratope prediction
- experimental techniques
- A proposal for making antibody patents reasonable via mutational scanning.
- If you develop a therapeutic antibody, you want to claim a space around it so that no one piggybacks on your effort by making a single substitution.
- If you claim a ‘homology space around your mAbs’, then even a small number of substitutions can circumvent a 90-95% sequence identity threshold on either the CDRs or the variable region.
- Claiming that you own all antibodies that bind some protein (as Amgen did with PCSK9) is too broad. This goes back to the ‘enablement’ requirement for patents: the disclosure needs to allow a skilled person to reproduce the invention. If you disclose a handful of antibodies against PCSK9, you do not exactly give a way to make ‘all others’.
- The authors propose supporting broader claims by introducing point mutations at strategic paratope positions in the CDRs and characterizing the resulting binders. For a single lead you are looking at a ballpark of 1,000 mutants, which is experimentally feasible. This would give hard data for a broad spectrum of binders around your candidates, and thus wider protection.
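To get a feel for the size of such a scan, here is a minimal sketch that enumerates every single-point substitution at a chosen set of positions. The CDR sequence and the paratope positions are hypothetical, and the paper's actual choice of positions and substitutions may differ.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def single_point_mutants(sequence, positions):
    """Yield (label, mutant) for each non-wild-type substitution.

    `positions` are 0-based indices into `sequence`; labels use the
    conventional 1-based form, e.g. "D3A".
    """
    for pos in positions:
        wild_type = sequence[pos]
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                mutant = sequence[:pos] + aa + sequence[pos + 1:]
                yield f"{wild_type}{pos + 1}{aa}", mutant

cdr_h3 = "ARDYYGSSYWYFDV"          # hypothetical CDR-H3, for illustration only
paratope_positions = [2, 3, 4, 7]  # hypothetical paratope positions (0-based)

mutants = list(single_point_mutants(cdr_h3, paratope_positions))
print(len(mutants))  # 4 positions x 19 substitutions each
```

Each position contributes 19 variants, so scanning around 50 positions would give 950 mutants, which is the same order as the ballpark quoted above.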
2025-03-11
AntiBinder: utilizing bidirectional attention and hybrid encoding for precise antibody–antigen interaction prediction
- binding prediction
- Using language models & structural predictions to predict antibody-antigen interactions.
- AntiBinder integrates sequence and structural information using IgFold for antibodies and ESM-2 for antigens, employing specialized encoders to extract meaningful features before passing them through multiple Bidirectional Attention Blocks (BidAttBlock) and a classifier.
- The model was trained and evaluated on four datasets: COVID-19 (CoV-AbDab), HIV (LANL database), BioMap, and MET. These datasets contain antigen–antibody interaction pairs across multiple species and applications, covering viruses like SARS-CoV-2 and HIV and, in total, a large number of antigenic variants.
- AntiBinder was benchmarked against 11 state-of-the-art models, including AttABseq, DG-affinity, DeepAAI, and general protein–protein interaction (PPI) models, and it outperformed them.
- The authors test generalizability, but chiefly within an antigen species, such as across different SARS-CoV-2 variants or HIV mutants.
2025-03-11
Generation of antigen-specific paired heavy-light chain antibody sequences using large language models
- generative methods
- A new antigen-specific language model.
- The authors curated a dataset of antigen-specific antibody sequences and fine-tuned a generic protein language model on it (don’t know which one).
- The dataset appears to consist mostly of PLAbDab and CoV-AbDab, so it is heavily biased towards COVID.
- Antibodies are generated by prompting the model with the antigen sequence and generating the antibody on the basis of it.
- The authors tested the generated antibodies in the lab, against COVID antigens but also against some that were less prevalent in the training set, and they found binders.
2025-02-20
Few-shot in-context learning with large language models for antibody characterization
- language models
- A demonstration that general-purpose language models, like GPT-3.5, can reason about antibody engineering tasks.
- The authors explore in-context learning, e.g. few-shot learning, where several examples are given and the model must predict a new case on that basis.
- They tested an array of general-purpose models, such as GPTs, Llamas, Mistrals etc.
- They tested on three antibody tasks: mouse/human discrimination, specificity prediction (from NGS) and isotype identification. In theory these are not difficult tasks, but remember we are dealing with a general-purpose language model.
- They literally prompt the model with examples of, say, mouse antibodies and human antibodies, and then provide a new one to predict.
- They find that the predictions are not bad, especially in the few-shot scenario (16 examples or so).
- In one test it even achieved accuracy on par with AntiBERTy.
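The prompting setup described above can be sketched as plain string assembly: a task line, a handful of labelled examples, and an unlabelled query for the model to complete. The prompt wording, the function name, the sequence fragments and their labels below are all hypothetical and are not the templates used in the paper.

```python
def build_few_shot_prompt(examples, query,
                          task="Classify each antibody sequence as human or mouse."):
    """Assemble a few-shot prompt from (sequence, label) examples and a query."""
    lines = [task, ""]
    for sequence, label in examples:
        lines.append(f"Sequence: {sequence}")
        lines.append(f"Label: {label}")
        lines.append("")
    # The query is left unlabelled; the model is expected to fill in the label.
    lines.append(f"Sequence: {query}")
    lines.append("Label:")
    return "\n".join(lines)

# Illustrative fragments only; the labels are assigned for demonstration.
examples = [
    ("EVQLVESGGGLVQPGGSLRLSCAAS", "human"),
    ("QVQLQQSGAELARPGASVKMSCKAS", "mouse"),
]
prompt = build_few_shot_prompt(examples, "EVQLLESGGGLVQPGGSLRLSCAAS")
print(prompt)
```

The resulting string would then be sent to whichever general-purpose model is being tested; in the 16-shot setting the `examples` list would simply be longer.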
2025-02-20
A comparison of antibody–antigen complex sequence-to-structure prediction methods and their systematic biases
- binding prediction
- Benchmark of antibody-antigen docking and co-folding methods.
- Methods compared were AF2-Multimer, RoseTTAFold, AbAdapt, SnugDock and ClusPro.
- ClusPro was given either AlphaFold or ABodyBuilder2 models of the antibody; it performed better with the ABodyBuilder2 models.
- AlphaFold-Multimer performed best overall; however, its top pose was close to correct only 19% of the time.
- It appears that the best models reproduce correct side-chain binding geometries from the PDB.
2025-02-20
Bayesian Optimization of Antibodies Informed by a Generative Model of Evolving Sequences
- generative methods
- ngs
- Novel generative antibody method, CloneLM/CloneBO, following clonally plausible evolutionary paths.
- They train CloneLM, an autoregressive language model, on antibody clonal family data from the OAS. There were two separate models for heavy and light sequences. They use FastBCR to call clonal families.
- CloneLM generates new clonal families by conditioning on a given antibody sequence. They use a martingale posterior approach to ensure sampled sequences follow plausible evolutionary paths. So it takes the antigen into account, but only by virtue of the clonal family.
- For benchmarking they train a language model oracle on a real human clonal family and use it as a simulated fitness function.
- They further perform training on affinity and stability data to generate oracles for these and show that the newly generated sequences can be made to be more stable/have higher affinity.
2025-02-20
ImmunoMatch learns and predicts cognate pairing of heavy and light immunoglobulin chains
- developability
- ngs
- A method to predict heavy-light chain pairing.
- Heavy-light chain pairing has long been posited to be random, or at the very least VERY promiscuous. The authors check this by training their model on different portions of the variable region and showing that there is signal when full sequences are used.
- The authors curated a set of ca. 233k positive heavy/light chain pairs from OAS. Negative samples were made by random shuffling, so they could occur in nature but were simply not observed in this dataset.
- They use AntiBERTa2 as the basis for training the classification model.
- The model achieves 0.75 and 0.66 ROC AUC on two test sets, so there seems to be some signal there.
- When the model is split between lambdas and kappas, it does better, though lambdas carry signal for kappas (remember that lambda is a rescue rearrangement for a non-working kappa).
- Naive B-cell pairs have less predictability than mature ones.
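The shuffling step for negatives can be sketched as follows: each heavy chain is re-paired with a light chain drawn from a different observed pair. This is a minimal sketch under that assumption, with placeholder chain names; it is not necessarily the exact procedure used for ImmunoMatch.

```python
import random

def make_negative_pairs(positive_pairs, seed=0):
    """Pair each heavy chain with a light chain it was not observed with."""
    rng = random.Random(seed)
    observed = set(positive_pairs)
    lights = [light for _, light in positive_pairs]
    negatives = []
    for heavy, _ in positive_pairs:
        # Only keep light chains that do not recreate an observed pair.
        candidates = [light for light in lights if (heavy, light) not in observed]
        if candidates:
            negatives.append((heavy, rng.choice(candidates)))
    return negatives

# Placeholder identifiers standing in for real heavy/light sequences.
positives = [("H1", "L1"), ("H2", "L2"), ("H3", "L3")]
negatives = make_negative_pairs(positives)
print(negatives)
```

As the summary points out, such negatives are only "not observed" rather than known non-pairs, so some label noise is built into the benchmark. The candidate scan here is quadratic in the number of pairs, which is fine for a toy example but would need a smarter sampler at the scale of 233k pairs.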
2025-02-17
Structure-informed language models are protein designers
- generative methods
- protein design
- One of the first studies showing that introducing structure into protein language models improves their predictive ability.
- They fed ProteinMPNN (structural) inputs to ESM-1b and showed that this improved recovery compared with using ESM-1b masking alone.
- To marry ProteinMPNN and ESM-1b they use an ‘adapter’. Adapters in machine learning are lightweight modules that modify or extend a model’s functionality without retraining all parameters; in LM-DESIGN, a structural adapter integrates structural information into protein sequence predictions by bridging the structure encoder and a pretrained language model (pLM).
- LM-DESIGN was benchmarked against state-of-the-art protein inverse folding models, including ProteinMPNN, PiFold, GVP-Transformer, Structured Transformer, and GVP, while utilizing pretrained language models such as ESM-1b 650M and the ESM-2 series.
- LM-DESIGN was evaluated on CATH 4.2 and CATH 4.3 datasets using sequence recovery rates and perplexity, compared against baselines.
- LM-DESIGN outperformed the individual models, improving sequence recovery by 4-12 percentage points and surpassing ProteinMPNN and PiFold.
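For reference, the two evaluation metrics named above can be written down in a few lines. This is a minimal sketch using the standard definitions (fraction of positions matching the native sequence, and the exponential of the mean negative log-probability assigned to native residues); the sequences and probabilities are toy values, and the paper's exact averaging over proteins may differ.

```python
import math

def sequence_recovery(native, designed):
    """Fraction of positions where the designed residue equals the native one."""
    if not native or len(native) != len(designed):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)

def perplexity(native_probs):
    """exp(mean negative log-probability) over the native residues.

    `native_probs` holds the probability the model assigned to the native
    residue at each position.
    """
    mean_nll = -sum(math.log(p) for p in native_probs) / len(native_probs)
    return math.exp(mean_nll)

# Toy example: 8 of 10 positions are recovered.
recovery = sequence_recovery("ACDEFGHIKL", "ACDEYGHIKV")
print(recovery)

# A model that puts probability 0.5 on every native residue is as uncertain
# as a uniform choice between two residues.
print(perplexity([0.5, 0.5, 0.5, 0.5]))
```

Higher recovery and lower perplexity are better, and an improvement of "4-12 percentage points" refers to absolute differences in the recovery value.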
2025-02-17
DyAb: sequence-based antibody design and property prediction in a low-data regime
- binding prediction
- language models
- A method to employ low-N data for biologic engineering.
- Assuming we have a dataset of ~100 affinity data points, we can form (100 choose 2) = 4,950 pairs in which we know which member has the larger readout (e.g. stronger affinity), giving a combinatorially larger number of data points to train on.
- The architecture is a CNN on top of a language model.
- Benchmarked on three internal campaigns: IL-6, EGFR and an undisclosed target.
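The pairing trick can be sketched in a few lines: turn N single measurements into N choose 2 ordered comparisons. This is a minimal sketch; the variant names and readouts are made up, and the binary "which is larger" label is an assumption for illustration, not necessarily the training target DyAb uses.

```python
from itertools import combinations
from math import comb

def pairwise_examples(measurements):
    """Turn (sequence, readout) measurements into (seq_a, seq_b, label) pairs.

    label is 1 if seq_a has the larger readout and 0 otherwise; ties are
    dropped because they carry no ordering information.
    """
    pairs = []
    for (seq_a, y_a), (seq_b, y_b) in combinations(measurements, 2):
        if y_a == y_b:
            continue
        pairs.append((seq_a, seq_b, 1 if y_a > y_b else 0))
    return pairs

# Hypothetical variants with invented affinity readouts.
toy = [
    ("variant_A", 7.2),
    ("variant_B", 8.9),
    ("variant_C", 6.1),
    ("variant_D", 8.0),
]
pairs = pairwise_examples(toy)
print(len(pairs))    # 4 choose 2 pairs from four measurements
print(comb(100, 2))  # number of pairs available from ~100 measurements
```

Note that the pairs are not independent of one another, since each measurement is reused in N - 1 of them, so the effective amount of information grows more slowly than the pair count suggests.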