Computational Antibody Papers

    • Novel humanization software, allowing for rapid re-design of both heavy and light chains.
    • Unlike other tools such as Hu-mAb and Sapiens, which humanize heavy and light chains separately, Humatch jointly humanizes both chains, improving stability and reducing the risk of immunogenic epitopes between chains.
    • Humatch consists of three lightweight Convolutional Neural Networks (CNNs). Each CNN is trained for a specific task: one for heavy chains (CNN-H), one for light chains (CNN-L), and one for assessing natural heavy/light chain pairing (CNN-P). The CNNs are designed to output multiclass predictions for identifying human V-genes and classifying chain pairings.
    • The CNNs were trained on data from the Observed Antibody Space (OAS), which includes millions of human and non-human antibody sequences.
    • Humatch's performance was measured through precision-recall and ROC-AUC metrics, achieving near-perfect accuracy in classifying human and non-human sequences. Performance was also tested by humanizing 25 precursor antibodies and comparing the mutations with experimentally derived humanized versions, showing high overlap (77-82%) with experimental designs.
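A lightweight CNN classifier of the kind described can be sketched as follows. This is an illustrative stand-in for CNN-H/CNN-L/CNN-P, not the published architecture; the filter count, kernel size, and three-class output are assumptions for demonstration.

```python
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"
AA_IDX = {a: i for i, a in enumerate(AA)}

def one_hot(seq, length=150):
    """One-hot encode a (padded/truncated) antibody sequence as 20 x length."""
    x = np.zeros((20, length))
    for i, a in enumerate(seq[:length]):
        x[AA_IDX[a], i] = 1.0
    return x

def cnn_multiclass(x, filters, w_out, b_out):
    """Minimal 1-D CNN: conv -> ReLU -> global max-pool -> softmax.
    Stands in for any one of CNN-H / CNN-L / CNN-P, each trained on its own labels."""
    n_f, _, k = filters.shape
    n_windows = x.shape[1] - k + 1
    feat = np.empty(n_f)
    for f in range(n_f):
        acts = [np.sum(filters[f] * x[:, i:i + k]) for i in range(n_windows)]
        feat[f] = max(0.0, max(acts))          # ReLU + global max-pool
    logits = w_out @ feat + b_out
    e = np.exp(logits - logits.max())
    return e / e.sum()                          # multiclass probabilities

# Illustrative use with random (untrained) weights and 3 hypothetical classes
rng = np.random.default_rng(0)
filters = rng.normal(scale=0.1, size=(8, 20, 5))
w_out = rng.normal(scale=0.1, size=(3, 8))
probs = cnn_multiclass(one_hot("EVQLVESGGGLVQPGGSLRLSCAAS"), filters, w_out, np.zeros(3))
```

Joint humanization then amounts to requiring all three classifiers (human-ness of H, human-ness of L, and naturalness of the H/L pairing) to accept a candidate simultaneously.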
    • Authors describe how, using a structure predictor, one can re-design the binding site to maintain binding.
    • They use a proprietary GaluxDesign method, which achieves 1.4 Å Cα RMSD in predicting CDR-H3 loop structures, leveraging a unique scoring metric (G-pass rate) that assesses both confidence and structural consistency for antibody design.
    • The method outperforms AlphaFold 2.3, ABlooper, and ImmuneBuilder in predicting CDR-H3 loop structures, with significantly lower RMSD values (1.4 Å compared to 2.4-3.7 Å), particularly on a more challenging, time-separated dataset.
    • Binding propensity to HER2 was evaluated on a large mutant library using the G-pass rate, which outperformed AlphaFold's PAE-based scoring: the model discriminated binders with an AUROC of 0.758, versus 0.529 for AlphaFold. Novel loops are scored with the same G-pass rate in complex with HER2.
    • Novel antibody sequences were designed by predicting six CDR loops in antibody-protein complexes, using GaluxDesign models. These designs were experimentally tested, achieving high success rates, including a 13.2% success rate for HER2 antibody designs using yeast display methods.
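The exact G-pass rate is proprietary, but an ensemble pass-rate of the kind described (predictions that are simultaneously confident and structurally consistent) can be sketched as below. The thresholds and inputs are illustrative assumptions, not the published definition.

```python
import numpy as np

def pass_rate(confidences, rmsds_to_consensus, conf_min=0.8, rmsd_max=1.5):
    """Fraction of sampled predictions that are both high-confidence and
    structurally consistent with the ensemble consensus. Thresholds here
    are illustrative, not the published ones."""
    conf = np.asarray(confidences)
    rmsd = np.asarray(rmsds_to_consensus)
    return float(((conf >= conf_min) & (rmsd <= rmsd_max)).mean())

# Four sampled loop predictions: only two are both confident and consistent
rate = pass_rate([0.90, 0.70, 0.85, 0.95], [1.0, 0.5, 2.0, 1.2])
```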
    • Authors demonstrate that using scores from DeepAb one can sort mutations in an antibody that improve affinity and a series of other properties.
    • The authors used the DeepAb structure prediction model to rank mutations based on their impact on structure prediction confidence, leading to the design of 200 novel anti-hen egg lysozyme (HEL) antibody variants.
    • Single-point mutations from a deep mutational scanning (DMS) dataset (Warszawski et al.) were combined into multi-mutation variants (up to 7 mutations), and these variants were selected based on DeepAb scores for experimental testing.
    • The designed variants were expressed and tested for thermostability, colloidal stability, and binding affinity to HEL.
    • A large percentage of the variants showed improved thermostability (91%) and affinity (94%), with 10% showing significant increases in binding affinity.
    • A subset of 27 high-performing variants was further tested for developability characteristics, including nonspecific binding, aggregation propensity, and self-association, ensuring their practical usability.
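The selection step — stacking confidence-ranked single mutations from the DMS dataset into multi-mutation variants — can be sketched as a greedy combination, at most one mutation per position. This is an illustration of the logic described above, not the authors' exact procedure.

```python
def combine_top_mutations(scored_muts, max_muts=7):
    """Greedily stack top-scoring single-point mutations into one variant,
    taking at most one mutation per position (a sketch of the selection
    logic, not the authors' exact procedure).
    scored_muts: list of (position, new_aa, score)."""
    chosen, used = [], set()
    for pos, aa, score in sorted(scored_muts, key=lambda m: -m[2]):
        if pos in used:
            continue                      # one mutation per position
        chosen.append((pos, aa, score))
        used.add(pos)
        if len(chosen) == max_muts:
            break
    return sorted(chosen)

# Toy DMS-style scores: two competing mutations at position 10
variant = combine_top_mutations(
    [(10, "A", 0.5), (10, "G", 0.9), (55, "F", 0.7), (99, "W", 0.2)],
    max_muts=2,
)
```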
    • Novel language model applied to predicting antibody binding affinity in an antigen-free manner.
    • AntiFormer is a graph-based large language model that combines sequence information with graph structures to predict antibody binding affinity. Its dual-flow architecture includes a transformer-based encoder for sequence features and a graph convolutional network (GCN) for capturing structural relationships (from sequence!), offering enhanced prediction accuracy.
    • AntiFormer was compared against advanced models like AntiBERTy and AntiBERTa, as well as basic transformer models with 6 and 12 layers, demonstrating superior performance across all evaluation metrics — though not by a large margin.
    • The model's performance was evaluated using affinity datasets, including the Observed Antibody Space (OAS) database and an additional dataset containing 104,972 antibody sequences with annotated affinity values, highlighting its accuracy and efficiency.
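The dual-flow idea — fusing a sequence-encoder embedding with a graph-convolutional pass over a sequence-derived graph — can be sketched as follows. All shapes, the single-layer GCN, and the fusion-by-concatenation are illustrative assumptions, not AntiFormer's actual architecture.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolution step with symmetric normalization + ReLU."""
    A_hat = A + np.eye(A.shape[0])                      # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(1)))
    return np.maximum(0.0, d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W)

def dual_flow_score(seq_emb, A, W_gcn, w_out):
    """Fuse a sequence-encoder embedding with a GCN pass over a
    sequence-derived residue graph, then regress a scalar affinity."""
    graph_emb = gcn_layer(seq_emb, A, W_gcn).mean(axis=0)
    fused = np.concatenate([seq_emb.mean(axis=0), graph_emb])
    return float(w_out @ fused)

# Toy inputs: 6 residues, 4-d embeddings, random symmetric adjacency
rng = np.random.default_rng(1)
n, d = 6, 4
seq_emb = rng.normal(size=(n, d))
A = np.triu((rng.random((n, n)) > 0.5).astype(float), 1)
A = A + A.T
score = dual_flow_score(seq_emb, A, rng.normal(size=(d, d)), rng.normal(size=2 * d))
```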
  • 2024-09-04

    Adapting protein language models for structure-conditioned design

    • language models
    • binding prediction
    • protein prediction
    • Novel language model incorporating structural information, with demonstrated experimental ability to improve design of therapeutic antibodies.
    • The new language model, ProseLM, builds upon the ProGen family of models from the same authors.
    • Structural information is incorporated via structural adapter layers inserted after the language-model layers, encoding the backbone and associated functional annotations.
    • Models with more parameters achieve much better perplexity. Adding tangential context information, such as ligands, yields a further modest improvement.
    • They trained an antibody-specific version of ProseLM on SAbDab data only; it achieves better sequence recovery than even the larger general models.
    • They use the model to propose mutations for Nivolumab and Secukinumab, with mutations in both CDRs and frameworks. They used structures from the PDB as the basis for designs.
    • They found better binders; however, when CDRs were re-designed, the overall success rate of maintaining binding was lower (25% for Nivolumab) than when frameworks were redesigned (92%).
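The adapter mechanism can be sketched as a small bottleneck network inserted after a frozen language-model layer, consuming structure features and updating hidden states residually. This is a generic adapter sketch under stated assumptions, not ProseLM's actual parameterization; note that zero-initializing the up-projection makes the adapter an identity at the start of training, a common trick.

```python
import numpy as np

def structural_adapter(h, struct_feat, W_down, W_up):
    """Bottleneck adapter after a frozen LM layer: project concatenated
    [hidden ; structure features] down, apply ReLU, project back up,
    and add residually to the hidden states (sketch of the adapter idea)."""
    z = np.maximum(0.0, np.concatenate([h, struct_feat], axis=-1) @ W_down)
    return h + z @ W_up

# Toy shapes: 10 residues, 16-d hidden, 6-d structure features, bottleneck 4
L_, d, s, b = 10, 16, 6, 4
rng = np.random.default_rng(2)
h = rng.normal(size=(L_, d))
out = structural_adapter(h, rng.normal(size=(L_, s)),
                         rng.normal(size=(d + s, b)),
                         np.zeros((b, d)))      # zero-init up-projection
```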
  • 2024-09-04

    p-IgGen: A Paired Antibody Generative Language Model

    • generative methods
    • language models
    • developability
    • Novel generative model for antibody sequences that supports VH/VL pairing and generation of developable sequences.
    • Three models were created: IgGen (unpaired model), p-IgGen (the unpaired model fine-tuned on paired sequences), and developable p-IgGen (p-IgGen fine-tuned on developable sequences).
    • They used ca. 250m unpaired sequences and 1.8m paired sequences for training.
    • The model is based on GPT-2 but with rotary position embedding.
    • Developable sequences were defined as those of the 1.8m paired set whose structural models had good TAP metrics (900,000 in total).
    • The model is much smaller than many comparable models (17m parameters), making it more lightweight in both training and application.
    • The model performs better on immunogenicity prediction than other models but worse on expression prediction.
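Rotary position embedding — the one architectural change from plain GPT-2 noted above — rotates each feature pair by a position- and frequency-dependent angle, so attention dot products depend on relative positions. A minimal sketch (applied here to raw vectors rather than inside an attention head):

```python
import numpy as np

def rope(x, base=10000.0):
    """Rotary position embedding on x of shape (seq_len, dim), dim even:
    rotate each feature pair (x1[i], x2[i]) by angle pos * freq_i."""
    n, d = x.shape
    half = d // 2
    inv_freq = base ** (-np.arange(half) / half)
    ang = np.outer(np.arange(n), inv_freq)      # (seq_len, half)
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=1)

x = np.random.default_rng(3).normal(size=(8, 16))
y = rope(x)
```

Because it is a pure rotation, vector norms are preserved and position 0 is left unchanged.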
    • Novel CDR-H3 structure prediction method, ComMat based on ensemble sampling.
    • Rather than generating a single structure, the method generates several solutions, all of which then inform the next iteration.
    • The method was integrated into the structure module of AlphaFold2.
    • Crucially, with the introduction of a second prediction into the ‘community’, the predictions improve. However, the gains quickly plateau, showing the limits of the approach.
    • The method does not produce better results than ABodyBuilder2 and EquiFold.
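The community-sampling loop can be sketched abstractly: maintain an ensemble of candidates and update each one conditioned on the whole current community. This toy illustration (candidates as numbers pulled toward the community mean) shows only the control flow, not the AlphaFold2-integrated implementation.

```python
def community_refine(candidates, refine_fn, n_iter=5):
    """Iteratively refine an ensemble where each candidate's update sees
    the whole current 'community' (toy sketch of the control flow)."""
    for _ in range(n_iter):
        candidates = [refine_fn(c, candidates) for c in candidates]
    return candidates

# Toy refine step: pull each candidate halfway toward the community mean;
# the ensemble contracts quickly, mirroring the fast plateau noted above.
pull = lambda c, comm: 0.5 * c + 0.5 * (sum(comm) / len(comm))
out = community_refine([0.0, 4.0], pull, n_iter=5)
```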
    • Novel humanization protocol employing language models and large-scale repertoire data.
    • Human OAS and germline sequences are embedded using ESM2.
    • A k-nearest neighbors algorithm is then used to introduce mutations into the ESM2-embedded query sequence, drawn from its closest functional neighbors in the ESM2-embedded OAS+germline space.
    • Results on humanized antibodies are validated experimentally via ELISA.
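The neighbor-consensus step can be sketched as below: find the k nearest references in embedding space and propose a mutation wherever the query disagrees with the neighbor consensus. The embeddings here are toy 2-d vectors; the paper uses ESM2 embeddings of OAS and germline sequences, and its mutation rule may differ.

```python
import numpy as np

def propose_mutations(query_emb, ref_embs, ref_seqs, query_seq, k=3):
    """Propose mutations toward the consensus of the k nearest reference
    sequences in embedding space (sketch; real method embeds with ESM2)."""
    dists = np.linalg.norm(np.asarray(ref_embs) - query_emb, axis=1)
    nn = np.argsort(dists)[:k]
    proposals = []
    for i, q_aa in enumerate(query_seq):
        votes = [ref_seqs[j][i] for j in nn]
        consensus = max(set(votes), key=votes.count)
        if consensus != q_aa:
            proposals.append((i, q_aa, consensus))   # (pos, from, to)
    return proposals

# Toy example: three near neighbors agree on G at position 2
muts = propose_mutations(
    np.array([0.0, 0.0]),
    [np.array([0.1, 0.0]), np.array([0.0, 0.1]),
     np.array([0.2, 0.0]), np.array([5.0, 5.0])],
    ["AAGA", "AAGA", "AAAA", "CCCC"],
    "AAAA",
)
```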
  • 2024-08-28

    AntiBARTy Diffusion for Property Guided Antibody Design

    • language models
    • generative methods
    • developability
    • Novel language model AntiBARTy with demonstration of how to use it to diffuse novel antibodies with favorable solubility properties.
    • The core model is a BART-based transformer, with 16m parameters.
    • It was first trained on all human heavy and light chains from OAS (254m heavies and 342m lights <- yes, more lights), followed by fine-tuning on the higher-quality paired data from OAS.
    • The diffusion model was based on U-net (CNN used for segmentation of medical images), totaling 3m parameters.
    • They define low and high solubility classes as predicted by protein-sol on paired OAS, with roughly 20k samples for each class.
    • Overall, one can sample from a multivariate Gaussian to get a vector in the AntiBARTy latent space and decode it into an antibody sequence with either high or low protein-sol-predicted solubility.
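The class-conditional sampling step can be sketched as fitting a multivariate Gaussian to the latent codes of one solubility class and drawing from it; the decoder back to sequence and the diffusion step are omitted, and the toy latents below are stand-in data, not AntiBARTy outputs.

```python
import numpy as np

def fit_class_gaussian(latents):
    """Fit a multivariate Gaussian to the latent codes of one solubility
    class (a sketch of the class-conditional sampling idea)."""
    latents = np.asarray(latents)
    return latents.mean(axis=0), np.cov(latents, rowvar=False)

rng = np.random.default_rng(4)
high_sol_latents = rng.normal(loc=1.0, size=(500, 8))   # toy stand-in data
mu, cov = fit_class_gaussian(high_sol_latents)
z = rng.multivariate_normal(mu, cov)   # latent vector to decode into a sequence
```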
    • Authors introduce AntPack - software for rapid numbering of antibody sequences, germline identification and humanization.
    • Authors use a mixture model (so not deep learning!) fit to millions of sequences from NGS.
    • The sequences are pre-numbered to standardize them and then assigned to clusters which offer explainability on germline assignment and residue probability at a given position.
    • The method is very fast in comparison to HMM-based approaches such as ANARCI.
    • Method is available via https://github.com/Wang-lab-UCSD/AntPack
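The explainable cluster-assignment step can be sketched as computing posterior responsibilities under a mixture of per-position categorical distributions over residues. This illustrates the general idea only; AntPack's exact parameterization may differ.

```python
import numpy as np

def responsibilities(seq_idx, log_priors, log_probs):
    """Posterior cluster responsibilities for a numbered sequence under a
    mixture of per-position categorical distributions (sketch of the idea).
    seq_idx: residue indices (0-19) at each numbered position.
    log_probs: (n_clusters, n_positions, 20) residue log-probabilities."""
    seq_idx = np.asarray(seq_idx)
    ll = log_priors + log_probs[:, np.arange(len(seq_idx)), seq_idx].sum(axis=1)
    ll -= ll.max()                      # stabilize before exponentiating
    p = np.exp(ll)
    return p / p.sum()

# Two toy clusters over 2 positions: cluster 0 favours residue 0,
# cluster 1 favours residue 1; query sequence is [0, 0]
lp = np.log(np.full((2, 2, 20), 1e-3))
lp[0, :, 0] = np.log(0.9)
lp[1, :, 1] = np.log(0.9)
r = responsibilities([0, 0], np.log([0.5, 0.5]), lp)
```

The per-cluster, per-position residue probabilities are what make germline assignment explainable: the winning cluster directly exposes which residues it expects at each position.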