Computational Antibody Papers

    • Novel inverse folding algorithm, studying the effect of pretraining on the effectiveness of antibody design.
    • Authors check multiple inverse folding regimens: pretraining on general proteins, PPI interfaces, and antibody–antigen interfaces, and likewise fine-tuning on each of these.
    • They only use the backbone atoms (N, Cα, C), with special provisions for Cβ.
    • They mask a portion of the sequence and have the model guess its amino acids.
    • The 37% recovery at 100% masking appears slightly lower than the same feat for ProteinMPNN.
    • Pretraining on antibodies still carries signal for antibody–antigen complexes, showing the power of such pretraining.
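Sequence recovery, the metric quoted above, is simply the fraction of masked positions where the model's guess matches the native residue. A minimal sketch (hypothetical helper, not the paper's code):

```python
def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of positions where the designed sequence matches the native one."""
    assert len(native) == len(designed)
    matches = sum(n == d for n, d in zip(native, designed))
    return matches / len(native)
```

At 100% masking the model designs every position, so 37% recovery means roughly one in three residues matches the native sequence.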
    • Review on computational methods applied to nanobodies.
    • The review covers databases, modeling and design methods.
    • Much room is given to conformational sampling with molecular dynamics.
    • They highlight a special class of nanobodies, quench-bodies (Q-bodies), that can detect small molecules alongside ordinary protein antigens.
    • The focus presented is chiefly on binder design, rather than on fine-tuning other biophysical properties.
    • Novel pipeline for computational protein design of nanobodies.
    • Several tools are collated and adjusted to the nanobody case: IgFold for structure prediction, HDOCK for docking, and AbDesign, DiffAb and dyMEAN for backbone/sequence prediction.
    • They chiefly perform computational validation showing the performance on the RMSD/DockQ (re-docking) and the amino acid recovery. Results indicate that focusing on nanobodies provides benefit.
    • The entire pipeline can be used for de novo design and optimization.
    • Novel method for nanobody sequence re-design using quite a small network.
    • The model was pre-trained on a large-scale collection of nanobody sequences from the INDI dataset, heavy-chain antibody sequences from OAS, and antibody complex structures from SAbDab. For fine-tuning, affinity data was generated computationally: 17,500 nanobody–antigen interaction data points, 7,500 generated via the ANTIPASTI model and 10,000 through random pairing, with a CD45 patent dataset used for testing. So all affinity points are computational predictions, not real measurements.
    • NanoGen uses a two-stage training framework with a shared encoder-decoder architecture based on CNN layers that learns sequence patterns via a Masked Language Modeling task. During generation, a guided discrete diffusion process, augmented with Discrete Bayesian Optimization, is employed to refine the sequence outputs for enhanced binding affinity.
    • The model was tested using sequence recovery (REC) and binding affinity improvement (pKD improvement). Benchmarking involved comparing NanoGen against baseline models such as ESM-2 650M, AbLangHeavy, and nanoBERT under both random masking and CDR-specific masking strategies on the CD45 patent dataset.
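The Masked Language Modeling task above can be sketched as corrupting a sequence and asking the model to recover the hidden residues; the mask token and masking fraction below are illustrative assumptions, not NanoGen's actual tokenizer:

```python
import random

MASK = "#"  # illustrative mask token

def mask_sequence(seq: str, mask_frac: float, rng: random.Random):
    """Mask a random fraction of positions; return the corrupted sequence
    and the hidden residues the model must recover."""
    n_mask = max(1, int(len(seq) * mask_frac))
    positions = rng.sample(range(len(seq)), n_mask)
    masked = list(seq)
    targets = {}
    for i in positions:
        targets[i] = seq[i]
        masked[i] = MASK
    return "".join(masked), targets

masked, targets = mask_sequence("QVQLVESGGGLVQPGGSLRLSCAAS", 0.15, random.Random(0))
# the model is trained to predict `targets` from `masked`
```

CDR-specific masking, used in the benchmark, would simply restrict the sampled positions to CDR residues instead of drawing them uniformly.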
  • 2025-03-11

    Redefining antibody patent protection using paratope mapping and CDR-scanning

    • paratope prediction
    • experimental techniques
    • Proposal for how to make antibody patents reasonable via mutational scanning.
    • If you develop a therapeutic antibody, you want to claim a space around it so that no one piggybacks off your effort by making a single substitution.
    • If you claim a 'homology space' around your mAbs, then even a small number of substitutions can circumvent 90-95% sequence identity of either the CDRs or the variable region.
    • Claiming that you own all antibodies that bind some protein (e.g. as Amgen did with PCSK9) is too broad. That goes back to the 'enablement' requirement of patents, as a patent needs to allow a skilled person to reproduce the invention. If you claim a handful of antibodies against PCSK9, you do not exactly give a way to make 'all others'.
    • Authors propose to support broader claims by making point mutations at strategic paratope positions in the CDRs and characterizing the binders. For a single lead you are looking at roughly 1,000 mutants, which is experimentally feasible. This would give hard data for a broad spectrum of binders around your candidates, giving wider protection.
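The ~1,000-mutant ballpark follows from exhaustive single-point scanning: each position admits 19 substitutions, and a typical antibody has roughly 50-60 CDR residues across its six loops. A toy enumeration (the CDR-H3 sequence is hypothetical):

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def single_point_mutants(cdr: str):
    """Yield every single-substitution variant of a CDR sequence."""
    for i, wt in enumerate(cdr):
        for aa in AMINO_ACIDS:
            if aa != wt:
                yield cdr[:i] + aa + cdr[i + 1:]

cdrh3 = "ARDYYGSSYFDY"  # hypothetical 12-residue CDR-H3
variants = list(single_point_mutants(cdrh3))
# 12 positions x 19 substitutions = 228 variants for this loop alone;
# ~50-60 CDR positions across six loops lands near 1,000 mutants.
```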
    • New antigen-specific language model
    • Authors curated a dataset of antigen-specific antibody sequences and fine-tuned a generic protein language model (don't know which one) on it.
    • The dataset appears to be comprised mostly of PLAbDab and CoV-AbDab, so it is very biased towards COVID.
    • Antibodies are generated by prompting the model with the antigen sequence and generating the antibody on the basis of it.
    • Authors tested the generated antibodies in the lab, including against COVID antigens but also some that were less prevalent in the training set, and they found binders.
    • Using language models & structural predictions to predict antibody-antigen interactions.
    • AntiBinder integrates sequence and structural information using IgFold for antibodies and ESM-2 for antigens, employing specialized encoders to extract meaningful features before passing them through multiple Bidirectional Attention Blocks (BidAttBlock) and a classifier.
    • The model was trained and evaluated on four datasets: COVID-19 (CoV-AbDab), HIV (LANL database), BioMap, and MET. These datasets contain antigen–antibody interaction pairs across multiple species and applications, covering viruses like SARS-CoV-2 and HIV with plenty of antigenic variants.
    • AntiBinder was benchmarked against 11 state-of-the-art models, including AttABseq, DG-affinity, DeepAAI, and general protein–protein interaction (PPI) models; AntiBinder did better.
    • Authors test the generalizability, but chiefly within antigenic species, such as different COVID variants or HIV mutants.
  • 2025-03-11

    Fast and accurate antibody sequence design via structure retrieval

    • databases
    • generative methods
    • structure prediction
    • Inverse folding and thus antibody design via database search.
    • Authors train a vector retrieval database on SAbDab. In this way, for a query antibody structure one can retrieve the structurally closest entries and their sequences.
    • They benchmark against state-of-the-art inverse folding tools such as AbMPNN, AntiFold, ProteinMPNN and ESM-IF; their tool comes out on top in terms of sequence recovery.
    • The database search is orders of magnitude faster than the state-of-the-art inverse folding tools.
    • They compare IgSeek versus FoldSeek: their tool gets higher accuracy in sequence recovery for most CDRs, except CDR-H3. Therefore FoldSeek seems like a very good choice alongside IgSeek for such a database-driven inverse folding protocol.
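The retrieval idea can be sketched as a nearest-neighbour lookup over structure embeddings; the toy embeddings and sequences below are made up for illustration and are not IgSeek's actual representation:

```python
import math

# Toy database: (structure embedding, native sequence) pairs.
DB = [
    ([1.0, 0.0], "QVQLVE"),
    ([0.0, 1.0], "EVQLLE"),
    ([0.7, 0.7], "QVKLQE"),
]

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def retrieve_sequence(query):
    """Return the sequence of the structurally nearest database entry."""
    return max(DB, key=lambda entry: cosine(entry[0], query))[1]
```

Because retrieval is a nearest-neighbour lookup rather than a forward pass through a deep network, it can be orders of magnitude faster than running an inverse folding model.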
    • Novel generative antibody method, CloneLM/CloneBO, following clonally plausible evolutionary paths.
    • They train CloneLM, an autoregressive language model, on antibody clonal family data from the OAS. There were two separate models for heavy and light sequences. They use FastBCR to call clonal families.
    • CloneLM generates new clonal families by conditioning on a given antibody sequence. They use a martingale posterior approach to ensure sampled sequences follow plausible evolutionary paths. So it takes the antigen into account, but only by virtue of the clonal family.
    • For benchmarking they train a language model oracle on a real human clonal family and use it as a simulated fitness function.
    • They further perform training on affinity and stability data to generate oracles for these and show that the newly generated sequences can be made to be more stable/have higher affinity.
    • Benchmark of antibody-antigen docking and co-folding methods.
    • Methods compared were AF2-Multimer, RoseTTAFold, AbAdapt, SnugDock and ClusPro.
    • ClusPro was given either AlphaFold or AbodyBuilder2 models of an antibody - it performed better when given ABodyBuilder2 models.
    • AlphaFold-Multimer performed the best overall; however, its top pose was close to correct only 19% of the time.
    • It appears that the best models can reproduce the correct side-chain binding geometries seen in the PDB.