Computational Antibody Papers

    • Novel CDR-H3 structure prediction method, ComMat, based on ensemble sampling.
    • Rather than generating a single structure, the method generates several candidate solutions, all of which then inform the next iteration (a toy sketch of this loop follows this paper's bullets).
    • The method was integrated into the structure module of AlphaFold2.
    • Crucially, predictions improve as soon as a second prediction is introduced into the ‘community’; however, the gains quickly plateau, showing the limits of the approach.
    • The method does not produce better results than ABodyBuilder2 and EquiFold.
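A minimal Python sketch of the community-sampling loop described above. The `refine` and `confidence` callables are hypothetical stand-ins for the AlphaFold2 structure-module update and a pLDDT-like score; this illustrates the idea, not ComMat's implementation.

```python
# Toy sketch of community sampling: refine a pool of candidate CDR-H3
# conformations jointly so that each candidate can inform the others.
# `refine` and `confidence` are hypothetical stand-ins, not ComMat's code.

def community_sample(initial_pool, refine, confidence, n_iters=8):
    pool = list(initial_pool)
    for _ in range(n_iters):
        # Every member is updated conditioned on the whole pool, letting
        # good hypotheses propagate into the next iteration.
        pool = [refine(member, context=pool) for member in pool]
    # Return the candidate the model is most confident about.
    return max(pool, key=confidence)
```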
    • Novel humanization protocol employing language models and large-scale repertoire data.
    • Human OAS and germline sequences are embedded using ESM-2.
    • A k-nearest neighbors algorithm is then used to introduce mutations into the ESM-2-embedded query sequence, drawn from its closest functional neighbors in the ESM-2-embedded OAS+germline space (see the sketch below).
    • Results for the humanized antibodies are validated experimentally via ELISA.
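A rough sketch of the kNN step, assuming all sequences share a common numbering/length and that `human_embs`/`human_seqs` hold pre-computed ESM-2 embeddings and aligned sequences from OAS plus germlines. The consensus-based mutation selection is an assumption about the general recipe, not the paper's code.

```python
# Hedged sketch: embed the query with ESM-2, find nearest human neighbors
# in embedding space, and propose mutations where the query disagrees with
# the neighbor consensus.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def propose_humanizing_mutations(query_seq, query_emb, human_embs, human_seqs, k=10):
    nn = NearestNeighbors(n_neighbors=k).fit(human_embs)
    _, idx = nn.kneighbors(query_emb[None, :])
    neighbors = [human_seqs[i] for i in idx[0]]
    mutations = []
    # Assumes all sequences are pre-aligned to a common numbering scheme.
    for pos, aa in enumerate(query_seq):
        column = [s[pos] for s in neighbors]
        consensus = max(set(column), key=column.count)
        if consensus != aa:
            mutations.append((pos, aa, consensus))
    return mutations
```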
  • 2024-08-28

    AntiBARTy Diffusion for Property Guided Antibody Design

    • language models
    • generative methods
    • developability
    • Novel language model, AntiBARTy, with a demonstration of how to use it to diffuse novel antibodies with favorable solubility properties.
    • The core model is a BART-based transformer with 16M parameters.
    • It was first trained on all human heavy and light chains from OAS (254M heavies and 342M lights <- yes, more lights), followed by fine-tuning on the higher-quality paired data from OAS.
    • The diffusion model is based on a U-Net (the CNN architecture used for medical image segmentation), totaling 3M parameters.
    • They define low- and high-solubility classes as predicted by protein-sol on paired OAS, with roughly 20k samples for each class.
    • Overall, one can sample from a multivariate Gaussian to get a vector in the AntiBARTy latent space and use it to obtain an antibody sequence with either high or low protein-sol-predicted solubility (sketched below).
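A minimal sketch of the property-guided sampling loop, assuming a hypothetical class-conditioned U-Net denoiser `unet` and a BART decoder wrapper `decode`; the noise-removal step is a toy stand-in for a proper DDPM/DDIM sampler.

```python
# Sketch of latent diffusion sampling in the spirit of AntiBARTy: start from
# Gaussian noise in the language-model latent space, denoise with a
# class-conditioned U-Net, then decode to a sequence.
# `unet` and `decode` are hypothetical placeholders.
import torch

@torch.no_grad()
def sample_antibody(unet, decode, latent_dim, solubility_class, n_steps=50):
    z = torch.randn(1, latent_dim)               # draw from a multivariate Gaussian
    for t in reversed(range(n_steps)):
        eps = unet(z, t, cond=solubility_class)  # predicted noise, class-conditioned
        z = z - eps / n_steps                    # toy update; real samplers use DDPM/DDIM
    return decode(z)                             # BART decoder -> amino-acid sequence
```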
    • Authors introduce AntPack - software for rapid numbering of antibody sequences, germline identification and humanization.
    • Authors use a mixture model (so not deep learning!) fit on millions of sequences from NGS.
    • The sequences are pre-numbered to standardize them and then assigned to clusters, which offers explainability on germline assignment and residue probability at a given position (see the scoring sketch below).
    • The method is very fast in comparison to HMM-based approaches such as ANARCI.
    • Method is available via https://github.com/Wang-lab-UCSD/AntPack
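A sketch of the underlying mixture-model scoring idea (not the AntPack API): each cluster carries per-position residue probabilities over numbered sequences, and the posterior over clusters gives an interpretable assignment.

```python
# Score a numbered sequence under every cluster of a residue-level mixture
# model; the posterior over clusters is what makes the assignment explainable.
import numpy as np

def cluster_posteriors(encoded_seq, cluster_probs, cluster_weights):
    # encoded_seq: (L,) residue indices after numbering to a fixed length L.
    # cluster_probs: (K, L, 21) per-cluster, per-position residue probabilities.
    # cluster_weights: (K,) mixture weights.
    log_lik = np.log(cluster_probs[:, np.arange(len(encoded_seq)), encoded_seq]).sum(axis=1)
    log_post = np.log(cluster_weights) + log_lik
    log_post -= log_post.max()                  # for numerical stability
    post = np.exp(log_post)
    return post / post.sum()                    # (K,) posterior over clusters
```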
    • Authors demonstrate that, using inverse folding, one can affinity-mature antibodies, as confirmed experimentally.
    • Authors employ ESM-IF as the inverse folding algorithm.
    • They take two existing antibodies, bebtelovimab and BD55-5840, both instrumental against COVID-19.
    • They introduce all possible single point mutations to the VH and VL regions (about 4,300) and pick the best-perplexity variants for experimental characterization (see the sketch below).
    • The best-perplexity variants carry many framework mutations (bebtelovimab 10/14 and BD55-5840 3/6). There was only one mutation to CDR-H3, in bebtelovimab.
    • The inverse folding method achieves much better performance when the antigen is included as well.
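A sketch of the mutation scan, with `score_perplexity` as a placeholder for scoring a sequence under an inverse-folding model such as ESM-IF (ideally with the antigen structure in the context, which the paper finds matters). The ranking logic is an assumption about the general recipe, not the authors' exact pipeline.

```python
# Enumerate every single point mutant of a variable region and rank by
# model perplexity; `score_perplexity` is a hypothetical scoring callable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def scan_point_mutations(seq, score_perplexity):
    scored = []
    for pos, wt in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa == wt:
                continue
            mutant = seq[:pos] + aa + seq[pos + 1:]
            scored.append((score_perplexity(mutant), f"{wt}{pos + 1}{aa}", mutant))
    return sorted(scored)  # lowest perplexity first
```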
    • Proposal for modeling antibodies using language that is more fit-for-purpose than current approaches.
    • It is plausible to represent antibodies/proteins as language so as to draw on the existing trove of natural language research.
    • Porting models from natural language to proteins/antibodies verbatim might not realize their full potential, because it glosses over key differences between natural language and proteins.
    • Authors propose a more fit-for-purpose formalization in which an important part is better token definition and associating tokens with function: for instance, not simply using amino acids or k-mers, but something more complex such as C*U and RA*, associated with hydrophobicity, zinc-finger binding, or similar (a toy illustration follows).
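A toy illustration of function-aware tokenization in the spirit of this proposal; the motifs and property labels below are invented for illustration, echoing the paper's C*U / RA* wildcard examples.

```python
# Map wildcard motifs to tokens carrying a property label, instead of
# tokenizing single amino acids. Motif table is hypothetical.
import re

MOTIF_TOKENS = [
    (re.compile(r"C.C"), "TOK_CxC", "disulfide/zinc-finger-like"),
    (re.compile(r"[FWY]{2}"), "TOK_AROM", "hydrophobic/aromatic patch"),
]

def tokenize(seq):
    tokens = []
    i = 0
    while i < len(seq):
        for pattern, token, prop in MOTIF_TOKENS:
            m = pattern.match(seq, i)
            if m:
                tokens.append((token, prop))
                i = m.end()
                break
        else:
            tokens.append((seq[i], "plain residue"))
            i += 1
    return tokens
```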
    • Authors employ patent data to develop a humanness model (selfPAD) that achieves state of the art in immunogenicity prediction.
    • They employ data from PAD, at the time roughly 290k sequences from 16,000 patent families.
    • They recognize the noisiness inherent to patent data and employ a training procedure that learns a latent representation of patent sequences associated with function - in this case, the target of the sequence.
    • In the first stage of training they employ contrastive learning, with sequences against the same target trained to be ‘closer’ in latent space and those against different targets to be ‘farther away’ (see the sketch below).
    • In the second stage, they fine-tune on humanness detection.
    • They tested their method on humanness prediction, ADA prediction, and agreement with humanization choices; taking all the tests together, their method achieves the best performance.
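A minimal sketch of the stage-one objective, using a standard supervised-contrastive (SupCon-style) loss as an assumed stand-in for the paper's exact formulation: same-target pairs are pulled together, different-target pairs pushed apart.

```python
# SupCon-style loss over a batch of sequence embeddings labeled by target.
import torch
import torch.nn.functional as F

def supcon_loss(embeddings, target_ids, temperature=0.1):
    z = F.normalize(embeddings, dim=1)             # (N, D) unit vectors
    sim = z @ z.T / temperature                    # (N, N) pairwise similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos_mask = (target_ids[:, None] == target_ids[None, :]) & ~self_mask
    # Log-probability of each pair, excluding self-similarity from the denominator.
    log_prob = sim - torch.logsumexp(
        sim.masked_fill(self_mask, float("-inf")), dim=1, keepdim=True
    )
    # Negative mean log-probability over positive (same-target) pairs.
    return -log_prob[pos_mask].mean()
```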
    • Authors expand the existing IMGT/mAb-DB with knowledge-graph querying via a user-friendly interface.
    • As of February 2024, IMGT/mAb-KG contains 139,629 triplets, 1,867 concepts, 114 properties, and links 21,842 entities. It includes detailed information on approximately 1,500 mAbs, 500 targets, and 500 clinical indications.
    • It is linked to various external resources, such as Thera-SAbDab, PharmGKB, PubMed, and HGNC, making it a valuable tool for researchers and developers working on therapeutic mAbs.
    • It is accessible via https://www.imgt.org/mAb-KG/
    • Authors investigate the role of negative/positive binding data composition in the predictive power of antibody-antigen interaction prediction.
    • They employ the Absolut! framework to generate synthetic binding data.
    • For binding prediction they use a fully connected network with 10 hidden neurons (sketched below).
    • CDR-H3 is used as a proxy for binding, for computational expediency.
    • Altogether, if the positive and negative data are quite similar, there is a better chance of out-of-distribution generalization.
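For concreteness, a PyTorch sketch of a classifier matching the description above; the fixed-length one-hot encoding is an assumption, since Absolut! CDR-H3s vary in length.

```python
# A binder/non-binder classifier with a single 10-neuron hidden layer,
# operating on one-hot encoded CDR-H3 sequences.
import torch.nn as nn

MAX_LEN, N_AA = 20, 20  # assumed fixed-length one-hot encoding

binding_net = nn.Sequential(
    nn.Flatten(),                   # (B, MAX_LEN, N_AA) -> (B, MAX_LEN * N_AA)
    nn.Linear(MAX_LEN * N_AA, 10),  # the 10 hidden neurons from the paper
    nn.ReLU(),
    nn.Linear(10, 1),               # binder / non-binder logit
)
```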
  • 2024-08-21

    ABodyBuilder3: Improved and scalable antibody structure predictions

    • structure prediction
    • language models
    • Updated, more efficient version of the popular ABodyBuilder2 program for modeling antibodies.
    • Unlike ABB2, which did not employ language model embeddings, ABB3 does (ProtT5, to be precise).
    • Instead of using several models and aggregating their results for error prediction, they train a pLDDT head within the model (see the sketch below).
    • ABB3 achieves CDR-H3 RMSD in the ballpark of 2.4Å, whereas the previous version was in the region of 2.5Å.
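A sketch of what an in-model pLDDT head can look like: a small per-residue MLP over final structure-module features predicting a distribution over lDDT bins. Head width and binning are assumptions, not ABB3's exact architecture.

```python
# Per-residue confidence head: predicts logits over lDDT bins, from which an
# expected pLDDT per residue can be computed at inference time.
import torch.nn as nn

class PLDDTHead(nn.Module):
    def __init__(self, d_model=128, n_bins=50):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.LayerNorm(d_model),
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, n_bins),  # distribution over lDDT bins per residue
        )

    def forward(self, residue_feats):    # (B, L, d_model)
        return self.mlp(residue_feats)   # (B, L, n_bins) logits
```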