Computational Antibody Papers

Title / Key points
  • 2025-02-17

    Structure-informed language models are protein designers

    • generative methods
    • protein design
    • One of the first studies showing that introducing structure into protein language models improves their predictive ability.
    • They fed ProteinMPNN (structural) inputs to ESM-1b and showed that this improved sequence recovery compared with using ESM-1b masking alone.
    • To marry ProteinMPNN and ESM-1b, they use an ‘adapter’. Adapters in machine learning are lightweight modules that modify or extend a model’s functionality without retraining all parameters; in LM-DESIGN, a structural adapter integrates structural information into protein sequence predictions by bridging the structure encoder and a pretrained language model (pLM).
    • LM-DESIGN was benchmarked against state-of-the-art protein inverse folding models, including ProteinMPNN, PiFold, GVP-Transformer, Structured Transformer, and GVP, while utilizing pretrained language models such as ESM-1b 650M and the ESM-2 series.
    • LM-DESIGN was evaluated on CATH 4.2 and CATH 4.3 datasets using sequence recovery rates and perplexity, compared against baselines.
    • LM-DESIGN outperformed individual models, improving sequence recovery by 4-12 percentage points and surpassing ProteinMPNN and PiFold.
    • Method to employ low-N data for biologic engineering.
    • Assuming we have a dataset of ~100 affinity data points, we can form (100 choose 2) = 4,950 pairs in which we know which member has the larger readout (e.g. stronger affinity), giving a combinatorially larger number of data points to train on.
    • The architecture used is CNN on top of a language model.
    • Benchmarked on three internal campaigns: IL-6, EGFR and an undisclosed target.
    • New (old :) ) therapeutic antibody database, several times larger than what is available from other sources.
    • Includes over 2,900 investigational antibody candidates and more than 450 approved or late-stage molecules.
    • It tracks molecular format, target antigen, development status, clinical history, and company data, along with antibody isotype, conjugation status, and mechanism of action.
    • Analysis highlights a rise in bispecifics, ADCs, and immunoconjugates, with most clinical-stage antibodies targeting cancer and originating from China or the U.S.
    • The data are collected from public sources beyond INN lists, including company websites, press releases, clinical trial registries, regulatory agencies, and literature reports.
    • Novel method to design antibodies de novo.
    • Architecturally, it is a mix of language models, diffusion and structure prediction methods.
    • Training follows a diffusion-style noising scheme: the structure is perturbed first and the model learns to recover it, and then the same is done for sequences.
    • After these two steps the model is distilled into a consistency model. This results in a model that can get the final coordinates/sequence in a single step rather than iterative denoising.
    • The method achieves accuracy comparable to existing methods such as DiffAb, dyMEAN and others.
    • On docking, the best performance is on the order of 4 Å iRMSD when using an AlphaFold3 antibody model - so some challenges remain.
    • No wetlab validation.
    • Nice developability dataset with associated computational modeling.
    • A total of 334 antibodies were initially characterized, with a subset of 43 antibodies selected for in vivo pharmacokinetic (PK) assessment. These data points included high-throughput developability assays and various physicochemical measurements.
    • A multivariate regression model, using Partial Least Squares (PLS) regression, was developed. This model combined multiple in vitro measures (nonspecific interactions, self-association, and FcRn binding) to predict in vivo clearance, significantly improving PK correlation over individual assays.
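The pairwise trick from the low-N entry above (turning ~100 measurements into 100 choose 2 ordered comparisons) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `make_pairs`, the label convention and the toy readouts are all assumptions made up for this example.

```python
from itertools import combinations
from math import comb


def make_pairs(measurements):
    """Turn {variant: readout} into (a, b, label) triples.

    label is 1 if variant a has the larger readout and 0 if b does.
    Ties are skipped because they carry no ranking information.
    """
    pairs = []
    for (a, ya), (b, yb) in combinations(sorted(measurements.items()), 2):
        if ya == yb:
            continue
        pairs.append((a, b, 1 if ya > yb else 0))
    return pairs


# Hypothetical toy data: variant name -> readout (higher = stronger binding).
toy = {"v1": 0.2, "v2": 1.5, "v3": 0.9, "v4": 1.5}
pairs = make_pairs(toy)
print(len(pairs), "usable pairs from", len(toy), "variants")

# Upper bound on the number of pairs for a 100-point dataset.
print(comb(100, 2))
```

A ranking model would then be trained on the pair labels rather than on the raw readouts, so only the ordering within each pair matters.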
  • 2025-02-03

    Benchmarking Inverse Folding Models for Antibody CDR Sequence Design

    • generative methods
    • protein design
    • nanobodies
    • Benchmarking of structure-conditioned sequence design methods.
    • ESM-IF, LM-Design, ProteinMPNN and AntiFold were benchmarked.
    • On sequence recovery, AntiFold beats others on antibodies, but LM-Design is better when VHHs are considered.
    • AntiFold makes minimal use of the antigen information.
    • ESM-IF and ProteinMPNN have some weak correlation with affinity data.
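Sequence recovery, the headline metric in this benchmark, is the fraction of designed positions that match the native residue. A minimal sketch follows; the function name, the optional `positions` argument (for restricting the score to a CDR) and the example sequences are hypothetical and not taken from the paper.

```python
def sequence_recovery(native, designed, positions=None):
    """Fraction of positions where the designed residue equals the native one.

    positions: optional iterable of 0-based indices (e.g. one CDR loop);
    by default every position is scored.
    """
    if len(native) != len(designed):
        raise ValueError("sequences must be aligned to the same length")
    idx = list(range(len(native))) if positions is None else list(positions)
    if not idx:
        raise ValueError("no positions to score")
    matches = sum(native[i] == designed[i] for i in idx)
    return matches / len(idx)


# Hypothetical CDR-H3-like fragments, invented for illustration.
native = "ARDYYGSSYFDY"
designed = "ARDYYGSGYFDV"

print(sequence_recovery(native, designed))                # whole fragment
print(sequence_recovery(native, designed, range(2, 10)))  # inner positions only
```

Restricting `positions` to CDR indices is what makes the number comparable across models that redesign only the loops.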
  • 2025-02-03

    Clinical antibody ADA

    • developability
    • clinical trials
    • Authors study 171 Roche clinical studies representing 28 drugs for their ADA incidence.
    • Authors demonstrate that ADA is highly context-specific with non-trivial inter-drug variation and factors such as disease or mode of action impacting the incidence.
    • They train a random forest model on T-cell epitope predictions and a second model that also uses non-epitope features. The extended model, which includes the non-epitope features, performs better than the one that is solely sequence-based.
    • Novel method to design antibodies in silico with experimental validation.
    • The actual computational method is not disclosed.
    • The computational method takes target sequence/structure and constraints where the antibody should bind. The structure and sequence are then produced.
    • The method can generate binders with nanomolar affinity.
    • The main interesting take-away is test-time compute. By feeding the answers of the model back to itself, it produces better binders and does not compromise on diversity of the designs.
    • Computational framework to calculate descriptors correlating with certain developability features for early antibody screening.
    • The framework calculates a number of sequence and structural descriptors.
    • The correlations were shown to bring value on HIC and viscosity datasets.
    • Exact calculation of the descriptors takes time, so the authors showed that it is possible to train an ML model to predict the descriptors directly from sequence.
    • Computational analysis of PK (clearance) of biologics based on a dataset collated for this publication.
    • Authors collated a set of 64 therapeutic antibodies and their clearances.
    • Here, they defined fast clearance as more than 5.4 mL/day/kg. 48 antibodies fell below this threshold and 16 above.
    • They tested whether any single computationally calculated property (e.g. isoelectric point) determines fast vs slow clearance.
    • No single computational property was a good discriminator.
    • They constructed a random forest model and showed that binding to the polyspecificity reagent (PSR), which is an in vitro property, and isoelectric point, which can be computationally calculated, are the strong discriminators according to the model.
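One way to check whether a single property separates fast from slow clearance is to label antibodies with the 5.4 mL/day/kg cutoff and compute a ROC AUC for that property alone. The sketch below is an assumption about how such a check could look, not the authors' analysis; the helper names and the toy (clearance, isoelectric point) values are invented for illustration.

```python
FAST_CLEARANCE_CUTOFF = 5.4  # mL/day/kg, the threshold quoted in the entry above


def is_fast(clearance):
    """Label an antibody as fast-clearing if it exceeds the cutoff."""
    return clearance > FAST_CLEARANCE_CUTOFF


def auc(scores_pos, scores_neg):
    """ROC AUC via the Mann-Whitney U statistic; ties count as half a win."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical rows of (clearance in mL/day/kg, isoelectric point).
toy = [(3.1, 7.9), (4.8, 8.4), (5.0, 8.1), (6.2, 8.3), (9.5, 9.0)]
fast = [pi for cl, pi in toy if is_fast(cl)]
slow = [pi for cl, pi in toy if not is_fast(cl)]

# 0.5 means the property does not discriminate; 1.0 means perfect separation.
print(round(auc(fast, slow), 3))
```

An AUC near 0.5 for every individual descriptor would correspond to the "no single property is a good discriminator" finding, which is the usual motivation for moving to a multi-feature model.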