Computational Antibody Papers

    • Nice developability dataset with associated computational modeling.
    • A total of 334 antibodies were initially characterized, and a subset of 43 was selected for in vivo pharmacokinetic (PK) assessment. The data included high-throughput developability assays and various physicochemical measurements.
    • A multivariate Partial Least Squares (PLS) regression model was developed that combined multiple in vitro measures (nonspecific interactions, self-association, and FcRn binding) to predict in vivo clearance. It correlated with PK significantly better than any individual assay.
  • 2025-02-03
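The PLS idea above, combining several in vitro readouts into one latent predictor of clearance, can be sketched as a one-component PLS1 regression in pure Python. This is a minimal sketch, not the authors' model: it assumes rows are antibodies and columns are assay readouts, and the function names are hypothetical. The number of components, preprocessing and validation used in the study are not specified here.

```python
from statistics import mean, pstdev


def _standardize_columns(columns):
    """Z-score each feature column; keep (mean, sd) to reuse at prediction."""
    params = [(mean(col), pstdev(col) or 1.0) for col in columns]  # sd=0 -> 1
    scaled = [[(v - m) / s for v in col] for col, (m, s) in zip(columns, params)]
    return scaled, params


def fit_pls1(X, y):
    """Fit a one-latent-component PLS1 model.

    X: list of rows (one per antibody), each a list of assay readouts.
    y: list of clearance values, same length as X.
    """
    columns = [list(col) for col in zip(*X)]
    Z, params = _standardize_columns(columns)
    y_mean = mean(y)
    yc = [v - y_mean for v in y]
    # Weight per feature: covariance of the standardized feature with centered y.
    w = [sum(z * t for z, t in zip(col, yc)) for col in Z]
    norm = sum(wi * wi for wi in w) ** 0.5  # zero only if no feature covaries with y
    w = [wi / norm for wi in w]
    # Latent score per antibody: projection of its standardized row onto w.
    scores = [sum(w[j] * Z[j][i] for j in range(len(w))) for i in range(len(y))]
    # Regress centered y on the latent score (least squares through the origin).
    q = sum(s * t for s, t in zip(scores, yc)) / sum(s * s for s in scores)
    return {"w": w, "q": q, "params": params, "y_mean": y_mean}


def predict_pls1(model, row):
    """Predict clearance for one antibody's assay readouts."""
    z = [(v - m) / s for v, (m, s) in zip(row, model["params"])]
    score = sum(wi * zi for wi, zi in zip(model["w"], z))
    return model["y_mean"] + model["q"] * score
```

The weight vector `w` shows how strongly each standardized assay contributes to the clearance-predicting component. A real analysis would typically use several components, cross-validation, and a library implementation such as scikit-learn's `PLSRegression`.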

    Benchmarking Inverse Folding Models for Antibody CDR Sequence Design

    • generative methods
    • protein design
    • nanobodies
    • Benchmarking of structure-conditioned sequence design (inverse folding) methods.
    • ESM-IF, LM-Design, ProteinMPNN and AntiFold were benchmarked.
    • On sequence recovery, AntiFold outperforms the others on antibodies, but LM-Design is better on VHHs (nanobodies).
    • AntiFold makes minimal use of the antigen information.
    • ESM-IF and ProteinMPNN scores show some weak correlation with affinity data.
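Sequence recovery, the headline metric in this benchmark, is the fraction of designed positions whose residue matches the native one. A minimal sketch, assuming the native and designed sequences are already aligned position by position; the optional `positions` argument is our own addition for restricting the score to, for example, CDR indices under whatever numbering scheme you use.

```python
from typing import Iterable, Optional


def sequence_recovery(native: str, designed: str,
                      positions: Optional[Iterable[int]] = None) -> float:
    """Fraction of scored positions where the designed residue equals the native one.

    Assumes both sequences are aligned and of equal length.
    `positions` (0-based indices) optionally restricts scoring, e.g. to CDR residues.
    """
    if len(native) != len(designed):
        raise ValueError("native and designed must have the same length")
    idx = list(range(len(native))) if positions is None else list(positions)
    if not idx:
        raise ValueError("no positions to score")
    matches = sum(native[i] == designed[i] for i in idx)
    return matches / len(idx)
```

For example, `sequence_recovery("ARDYW", "ARDFW")` scores 4 matching positions out of 5. Benchmarks usually average this per-sequence value over a test set, often reported separately per CDR.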
    • Novel method to design antibodies in silico with experimental validation.
    • The actual computational method is not disclosed.
    • The method takes the target sequence/structure and constraints specifying where the antibody should bind, then generates the antibody structure and sequence.
    • The method can generate binders with nanomolar affinity.
    • The most interesting take-away is test-time compute: feeding the model's answers back to itself produces better binders without compromising the diversity of the designs.
  • 2025-02-03

    Clinical antibody ADA

    • developability
    • clinical trials
    • The authors study anti-drug antibody (ADA) incidence across 171 Roche clinical studies representing 28 drugs.
    • They demonstrate that ADA is highly context-specific, with non-trivial inter-drug variation and with factors such as disease or mode of action affecting incidence.
    • They train a random forest model on T-cell epitope predictions, and a second model that adds non-epitope features. The extended model, which includes the non-epitope features, performs better than the purely sequence-based one.
    • A novel experimental/computational workflow demonstrating how little data might be needed to develop antibody affinity predictors.
    • Mice were immunized with hen egg white lysozyme (HEL), and 35 antibodies, selected computationally by clustering with known binders, were characterized together with their affinities.
    • These 35 antibodies were used to train several methods: Gaussian Process (GP) models with Matérn and RBF kernels, Kernel Ridge Regression (KRR), Random Forest (RF), and Linear Regression (as a baseline).
    • Seed sequences were given single or double point mutations, and the mutants' affinities were predicted with the GP model (the best performer). Eight mutants predicted to span the whole affinity range were selected for experimental testing, and their measured affinities agreed very well with the predictions.
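As a rough illustration of the best-performing approach, the sketch below does Gaussian Process regression with RBF and Matérn-5/2 kernels over one-hot-encoded, equal-length sequences, in pure Python. The encoding, kernel hyperparameters (`length`, `var`) and `noise` level are our assumptions rather than values from the paper, and no hyperparameter fitting is performed.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """Flattened one-hot encoding; sequences must share the same length."""
    return [1.0 if ch == aa else 0.0 for ch in seq for aa in AMINO_ACIDS]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rbf(a, b, length=1.0, var=1.0):
    return var * math.exp(-0.5 * sq_dist(a, b) / length ** 2)


def matern52(a, b, length=1.0, var=1.0):
    r = math.sqrt(sq_dist(a, b)) / length
    s5 = math.sqrt(5.0) * r
    return var * (1.0 + s5 + 5.0 * r * r / 3.0) * math.exp(-s5)


def cholesky(K):
    """Lower-triangular L with L L^T = K (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(K[i][i] - s) if i == j else (K[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def solve_upper_t(L, b):
    """Back substitution: solve L^T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(train_seqs, y, test_seq, kernel=matern52, noise=1e-2):
    """Posterior mean and variance of a zero-mean GP fitted to centered y."""
    X = [one_hot(s) for s in train_seqs]
    x_star = one_hot(test_seq)
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    n = len(X)
    K = [[kernel(X[i], X[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, yc))  # (K + noise*I)^-1 yc
    k_star = [kernel(xi, x_star) for xi in X]
    mu = y_mean + sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = kernel(x_star, x_star) - sum(vi * vi for vi in v)
    return mu, max(var, 0.0)
```

The predictive variance is what makes GPs attractive with only ~35 training points: mutants far from the training set receive wide uncertainty instead of confident extrapolation.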
    • Computational framework to calculate descriptors correlating with certain developability features for early antibody screening.
    • The framework calculates a number of sequence and structural descriptors.
    • The value of these correlations was demonstrated on HIC and viscosity datasets.
    • Because exact calculation of the descriptors is slow, the authors showed that an ML model can be trained to predict them directly from sequence.
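The descriptor-plus-correlation idea above can be sketched with two simple sequence descriptors. Our choices here (GRAVY hydropathy on the Kyte-Doolittle scale, and aromatic fraction) are illustrative stand-ins, not the framework's actual descriptor set, which also includes structural terms. In practice you would compute them per antibody (for example over the CDRs) and correlate them with measured HIC retention or viscosity.

```python
# Kyte-Doolittle hydropathy values per amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Mean Kyte-Doolittle hydropathy (GRAVY) of a sequence, e.g. a CDR."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aromatic_fraction(seq):
    """Fraction of F, W and Y residues in the sequence."""
    return sum(aa in "FWY" for aa in seq) / len(seq)


def pearson(xs, ys):
    """Pearson correlation between descriptor values and an assay readout."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```

Cheap sequence descriptors like these need no surrogate model. The ML shortcut described above matters for the slower structure-based descriptors.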
    • Computational analysis of the PK (clearance) of biologics, based on a dataset collated for this publication.
    • The authors collated a set of 64 therapeutic antibodies and their clearances.
    • Fast clearance was defined as more than 5.4 mL/day/kg; 48 antibodies fell below this threshold and 16 above.
    • They tested whether any single computationally calculated property (e.g. isoelectric point) determines fast vs. slow clearance.
    • No single computational property was a good discriminator.
    • They then constructed a random forest model and showed that the polyspecificity reagent (PSR) assay, an in vitro measurement, and the isoelectric point, which can be calculated computationally, are strong discriminators according to the model.
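Two concrete pieces of this analysis are easy to express in code: the fast/slow clearance label (threshold 5.4 mL/day/kg) and a sequence-based isoelectric point found by bisection on the Henderson-Hasselbalch net charge. A minimal sketch under stated assumptions: the pKa table is one illustrative choice (tools use different tables), cysteines are ignored on the assumption that they are disulfide-bonded, and the input string is treated as a single polypeptide chain.

```python
# Assumed pKa values (illustrative; different tools use different tables).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "Y": 10.1}  # Cys ignored: assumed disulfide-bonded
PKA_N_TERM, PKA_C_TERM = 8.6, 3.6

FAST_CLEARANCE_THRESHOLD = 5.4  # mL/day/kg; "fast" means strictly above this


def net_charge(seq, ph):
    """Henderson-Hasselbalch net charge of one chain at the given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))   # free N-terminus
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))  # free C-terminus
    for aa in seq:
        if aa in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge


def isoelectric_point(seq, tol=1e-4):
    """pH at which net charge is zero; charge decreases monotonically with pH."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def is_fast_clearance(clearance_ml_day_kg):
    """Label an antibody as fast-clearing using the paper's threshold."""
    return clearance_ml_day_kg > FAST_CLEARANCE_THRESHOLD
```

For a whole antibody you would sum the charges of all chains inside the bisection, including the constant domains rather than only the Fv.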
    • The authors revisit sequence- and structure-based computational metrics for distinguishing clinical-stage therapeutics, as an alternative to or refinement of the popular TAP (Therapeutic Antibody Profiler) metrics.
    • They explain why the FvCSP charge asymmetry calculated in TAP might not be the ideal formulation.
    • They introduce FV_CHML, which, as opposed to FvCSP (the product of the VH and VL net charges), is a difference between the net charges.
    • Of the several computational metrics employed, they show that FV_CHML captures most of the clinical-stage therapeutics.
    • They analyse the effect of isotype, demonstrating that accurate pI calculations require modeling the constant region, not only the Fv.
    • They propose four descriptors that appear to separate natural from clinical antibodies well and show some correlation with experimental values:
        1. Patch_cdr_hyd: hydrophobicity of the CDRs (not the same as in TAP)
        2. ens_charge_Fv: in lieu of PPC and PNC from TAP
        3. Cdr_len: separates repertoire from clinical antibodies
        4. Fv_chml: in lieu of FvCSP from TAP
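The contrast between the two charge-asymmetry formulations can be sketched as a product of VH and VL net charges (the FvCSP form) versus a difference (an FV_CHML-style score). This is a sequence-only approximation, since TAP works from a structural model. The pKa values, the default pH of 7.4, and the VH-minus-VL sign convention are our assumptions. One illustrative case where the two diverge (our example, not necessarily the authors' argument): a strongly charged VH paired with a near-neutral VL gives a product close to zero, which looks like no asymmetry, while the difference stays large.

```python
# Assumed pKa values (illustrative); cysteines ignored as disulfide-bonded.
PKA = {"K": 10.8, "R": 12.5, "H": 6.5, "D": 3.9, "E": 4.1, "Y": 10.1}
BASIC = {"K", "R", "H"}
PKA_N_TERM, PKA_C_TERM = 8.6, 3.6


def net_charge(seq, ph=7.4):
    """Sequence-only Henderson-Hasselbalch net charge of one variable domain."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM)) - 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for aa in seq:
        if aa in BASIC:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA[aa]))
        elif aa in PKA:
            charge -= 1.0 / (1.0 + 10 ** (PKA[aa] - ph))
    return charge


def fvcsp_like(vh, vl, ph=7.4):
    """Product of VH and VL net charges (the FvCSP form; sequence-based here)."""
    return net_charge(vh, ph) * net_charge(vl, ph)


def fv_chml_like(vh, vl, ph=7.4):
    """Difference of VH and VL net charges (VH minus VL is an assumed sign)."""
    return net_charge(vh, ph) - net_charge(vl, ph)
```

A negative product flags oppositely charged domains, but its magnitude shrinks toward zero whenever either domain is near neutral. The difference does not have that blind spot.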
    • ProstT5: a novel language model that uses the Foldseek structural representation to add a structural dimension to the model.
    • The Foldseek 3Di representation encodes 3D protein structures as 1D token sequences, enabling seamless translation between amino acid sequences and structural representations.
    • The model was fine-tuned on 17 million AlphaFoldDB structures using ProtT5 as a base, with bi-directional translation tasks to map between amino acid (AA) and 3Di sequences.
    • ProstT5 achieves 3600-fold faster remote homology detection compared to AlphaFold-based methods, while maintaining near-experimental accuracy and improving fold classification tasks like CATH.
    • ProstT5 embeddings outperform ProtT5, ESM-1b, and Ankh on structure-related tasks and show competitive performance in inverse folding, generating diverse sequences with preserved structural similarity, though ProteinMPNN still performs better at inverse folding in most cases.
    • Novel language model for antibodies, blending sequence and structural information.
    • The model encodes sequence ‘as usual’ and uses a GVP-GNN (as in ESM-IF) for the structural representation. Only the three backbone atoms per residue (N, Cα, C) are used.
    • The training data is a mix of sequence data and X-ray structures; the sequence datasets were modeled with ImmuneBuilder to increase structural coverage.
    • The model has an MLM objective on sequence and structure, with three losses: sequence only, sequence + structure, and structure only.
    • On sequence infilling, IgBLEND performs better than other methods (e.g. AbLang, NanoBERT), though arguably CDR-H3 predictions look very ‘close’ across the board.
    • On inverse folding, the method performs considerably better, with large margins on CDR-H3 and notable improvements for nanobodies, which other methods such as ESM-IF or AntiFold did not handle natively.
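The structural input described above uses only the N, Cα and C atoms of each residue. As a minimal sketch of that preprocessing step (assuming standard fixed-column PDB input; this is not IgBLEND's code, and the function name is hypothetical), the function below extracts per-residue backbone coordinates in file order. Residues missing any backbone atom are dropped, and only blank or "A" alternate locations are kept.

```python
BACKBONE_ATOMS = ("N", "CA", "C")


def backbone_coords(pdb_lines, chain_id=None):
    """Return [[N_xyz, CA_xyz, C_xyz], ...] per residue, in file order.

    Parses fixed-column PDB ATOM records. Residues missing any backbone atom
    are skipped; only blank or 'A' alternate locations are kept.
    """
    residues = {}
    order = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        atom_name = line[12:16].strip()
        if atom_name not in BACKBONE_ATOMS or line[16] not in (" ", "A"):
            continue
        chain = line[21]
        if chain_id is not None and chain != chain_id:
            continue
        # Key includes the insertion code (column 27), which antibody
        # numbering schemes rely on for residues such as 100A.
        key = (chain, line[22:26].strip(), line[26])
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        if key not in residues:
            residues[key] = {}
            order.append(key)
        residues[key][atom_name] = xyz
    return [
        [residues[k][a] for a in BACKBONE_ATOMS]
        for k in order
        if all(a in residues[k] for a in BACKBONE_ATOMS)
    ]
```

The resulting list of 3×3 coordinate blocks is the kind of per-residue backbone input a GVP-GNN encoder consumes; the graph construction and featurization themselves are beyond this sketch.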