Authors demonstrate that inverse folding can be used to affinity-mature antibodies, and confirm this experimentally.
Authors employ ESM-IF as the inverse folding algorithm.
They take two existing antibodies, bebtelovimab and BD55-5840, both used against COVID-19.
They introduce all possible single point mutations to the VH and VL regions (about 4,300) and pick those with the best (lowest) perplexity for experimental characterization.
The best-perplexity picks contain many framework mutations (bebtelovimab 10/14 and BD55-5840 3/6); only one mutation, in bebtelovimab, fell in CDR-H3.
The inverse folding method achieves much better performance when the antigen is included in the input structure as well.
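A minimal sketch of this kind of perplexity-based mutant ranking with ESM-IF1 (the structure file and chain ID are placeholders, and the selection step is simplified; the paper's exact pipeline may differ):

```python
import esm
import esm.inverse_folding
import numpy as np

AAS = "ACDEFGHIKLMNPQRSTVWY"

# Load ESM-IF1; "complex.pdb" / chain "H" are placeholders. Conditioning on
# the antibody-antigen complex rather than the antibody alone is what the
# paper found to work markedly better.
model, alphabet = esm.pretrained.esm_if1_gvp4_t16_142M_UR50()
model = model.eval()
coords, wt_seq = esm.inverse_folding.util.load_coords("complex.pdb", "H")

def perplexity(seq):
    # score_sequence returns the average log-likelihood of seq given coords
    ll, _ = esm.inverse_folding.util.score_sequence(model, alphabet, coords, seq)
    return float(np.exp(-ll))  # lower = more native-like under the model

# Enumerate all single point mutants of the wild-type chain and rank them.
mutants = {
    f"{wt}{i + 1}{aa}": wt_seq[:i] + aa + wt_seq[i + 1:]
    for i, wt in enumerate(wt_seq)
    for aa in AAS
    if aa != wt
}
best = sorted(mutants, key=lambda name: perplexity(mutants[name]))[:10]
print(best)
```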
Proposal for modeling antibodies as language in a way that is more fit-for-purpose than current approaches.
It is plausible to represent antibodies/proteins as language so as to draw on the existing trove of natural language research.
Current approaches that port models from natural language to proteins/antibodies verbatim may not realize their full potential, because they do not account for key differences between natural language and proteins.
Authors propose a more fit-for-purpose formalization, an important part of which is better token definition and associating tokens with function: for instance, rather than simply using amino acids or k-mers, use more complex tokens such as C*U and RA*, associated with hydrophobicity, zinc-finger binding, or the like.
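A toy illustration of what function-associated tokens could look like (the regex patterns and labels are assumptions built around the paper's C*U / RA* examples, with '.' standing for any residue):

```python
import re

# Hypothetical motif vocabulary: wildcard tokens mapped to putative function.
# These patterns and labels are illustrative only.
MOTIFS = {
    "C.U": "hydrophobicity",
    "RA.": "zinc-finger binding",
}

def tokenize(seq):
    """Emit a motif token where a pattern matches; fall back to single residues."""
    tokens, i = [], 0
    while i < len(seq):
        for pattern, label in MOTIFS.items():
            m = re.match(pattern, seq[i:])
            if m:
                tokens.append((m.group(0), label))
                i += m.end()
                break
        else:
            tokens.append((seq[i], None))
            i += 1
    return tokens

print(tokenize("MKCAURAG"))
# [('M', None), ('K', None), ('CAU', 'hydrophobicity'), ('RAG', 'zinc-finger binding')]
```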
Authors employ patent data to develop a humanness model (selfPAD) that achieves state of the art in immunogenicity prediction.
They employ data from the Patented Antibody Database (PAD), which at the time comprised roughly 290k sequences from 16,000 patent families.
They recognize the noisiness inherent in patent data and employ a training procedure that learns a latent representation of patent sequences associated with function - in this case, the target of the sequence.
In the first stage of training they employ contrastive learning, with sequences against the same target trained to be ‘closer’ in latent space and those against different targets ‘farther away’ (see the sketch below).
In the second stage, they fine-tune on humanness detection.
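To make the first stage concrete, here is a generic supervised-contrastive loss over target labels (a standard SupCon-style formulation; the paper's exact loss and hyperparameters may differ):

```python
import torch
import torch.nn.functional as F

def target_contrastive_loss(embeddings, target_ids, temperature=0.1):
    """Pull embeddings of sequences against the same target together,
    push those against different targets apart (SupCon-style sketch)."""
    z = F.normalize(embeddings, dim=1)           # (N, d) unit vectors
    sim = z @ z.T / temperature                  # pairwise similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (target_ids.unsqueeze(0) == target_ids.unsqueeze(1)) & ~self_mask
    logits = sim.masked_fill(self_mask, float("-inf"))
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    # average log-probability over positive pairs, for anchors with >=1 positive
    n_pos = pos_mask.sum(1)
    keep = n_pos > 0
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0).sum(1)
    return (-pos_log_prob[keep] / n_pos[keep]).mean()
```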
They tested their method on humanness prediction, ADA (anti-drug antibody) prediction, and agreement with humanization choices; taken together, their method achieves the best overall performance.
Authors extend the existing IMGT/mAb-DB with knowledge-graph querying via a user-friendly interface.
As of February 2024, IMGT/mAb-KG contains 139,629 triplets, 1,867 concepts, 114 properties, and links 21,842 entities. It includes detailed information on approximately 1,500 mAbs, 500 targets, and 500 clinical indications.
It is linked to various external resources, such as Thera-SAbDab, PharmGKB, PubMed, and HGNC, making it a valuable tool for researchers and developers working on therapeutic mAbs.
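To give a feel for triplet-style querying, a miniature sketch with rdflib (the namespace, properties, and entities are illustrative placeholders, not the actual IMGT/mAb-KG schema):

```python
from rdflib import Graph, Literal, Namespace

# Hypothetical miniature of a mAb knowledge graph.
MAB = Namespace("http://example.org/mab-kg/")
g = Graph()
g.add((MAB.pembrolizumab, MAB.hasTarget, MAB.PD1))
g.add((MAB.pembrolizumab, MAB.hasClinicalIndication, Literal("melanoma")))
g.add((MAB.nivolumab, MAB.hasTarget, MAB.PD1))

# Which mAbs target PD-1, and for which indications (if recorded)?
query = """
SELECT ?mab ?indication WHERE {
  ?mab <http://example.org/mab-kg/hasTarget> <http://example.org/mab-kg/PD1> .
  OPTIONAL { ?mab <http://example.org/mab-kg/hasClinicalIndication> ?indication . }
}
"""
for row in g.query(query):
    print(row.mab, row.indication)
```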
An active learning framework is proposed to efficiently identify antibody mutations that enhance binding affinity, minimizing wet-lab experiments.
From the paper: “Active learning is a framework from experimental design that focuses on making informed decisions about which experiments to perform next”. This matters these days for generating data in a prediction-first way that maximizes the effectiveness of models.
Bayesian optimization is used with relative binding free energy (RBFE) methods to iteratively propose and evaluate new antibody sequences, improving binding affinity predictions.
Various encoding schemes are tested, including one-hot, bag of amino acids, BLOSUM, and AbLang2; the best-performing methods are identified through validation on pre-computed data.
The study uses two RBFE methods: NQFEP for accurate but costly simulations, and Schrödinger Res Scan for faster but less precise results. The active learning loop consistently finds better binding sequences using these methods.
AbLang2 encoding with the Tanimoto kernel consistently outperformed other methods in the validation phase, indicating its effectiveness in predicting improved binding affinities.
Despite the higher computational cost, the NQFEP method might be preferable when high accuracy is crucial.
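A minimal sketch of one active-learning round (one-hot features and a default-kernel GP with expected improvement stand in for the encodings and kernels the paper compares; `rbfe_oracle` is a placeholder for an NQFEP or residue-scan calculation):

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

AAS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(seq):
    x = np.zeros((len(seq), len(AAS)))
    for i, aa in enumerate(seq):
        x[i, AAS.index(aa)] = 1.0
    return x.ravel()

def expected_improvement(mu, sigma, best, xi=0.01):
    # We minimize predicted ddG (more negative = tighter binding).
    z = (best - mu - xi) / np.maximum(sigma, 1e-9)
    return (best - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def active_learning_round(evaluated, candidates, rbfe_oracle):
    """evaluated: list of (sequence, ddG). Fit a surrogate, pick the candidate
    with highest expected improvement, and evaluate it with the RBFE oracle."""
    X = np.array([one_hot(s) for s, _ in evaluated])
    y = np.array([ddg for _, ddg in evaluated])
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    Xc = np.array([one_hot(s) for s in candidates])
    mu, sigma = gp.predict(Xc, return_std=True)
    pick = candidates[int(np.argmax(expected_improvement(mu, sigma, y.min())))]
    return evaluated + [(pick, rbfe_oracle(pick))]
```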
Novel docking score based on a graph model plus language-model embeddings, with an antibody-specific variant.
It builds on Equiformer models, giving the structure an equivariant representation. Relative to previous work, the authors moved from atom-level to residue-level representations, input complexes rather than single chains, and added language-model embeddings.
They employ DistilBERT for the embeddings, training it on interaction data from BioGRID.
The antibody-specific model is trained on antibody data downloaded from AbDb to distinguish native from non-native poses after local docking with RosettaFold.
The antibody-specific model is in fact heavy-chain-plus-antigen only: the authors saw no benefit from a light-chain-only model, and a three-way heavy-light-antigen model was too computationally expensive.
The antibody-specific scoring method has predictive power in distinguishing native from non-native poses. It did not outperform AF2-Multimer, but the authors note that its predictive power can be harnessed to rescore AF2-Multimer outputs (sketched below).
The code is available at https://gitlab.com/mcfeemat/eudockscore.
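A sketch of the kind of rescoring the authors suggest (the blending rule and weight are assumptions, and `score_pose` is a stand-in for a trained scorer such as theirs):

```python
def rerank_af2_poses(poses, score_pose, weight=0.5):
    """poses: list of (pose_path, af2_confidence); returns paths, best first.
    Blending AF2-Multimer confidence with the learned score is one simple
    option, not the paper's prescribed protocol."""
    blended = [
        (weight * conf + (1 - weight) * score_pose(path), path)
        for path, conf in poses
    ]
    return [path for _, path in sorted(blended, reverse=True)]
```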
Authors introduce a large antibody-specific language model, FAbCon (2.4B parameters), that demonstrably improves antibody specificity prediction.
The model is based on the Falcon LLM and is trained with the causal language modeling (CLM) objective (predict the next amino acid, moving N- to C-terminus).
The model was trained on paired (2.5M) and unpaired (821M) sequences from OAS.
The pre-trained model was tested on its ability to be fine-tuned for binder prediction using three datasets: anti-HER2, anti-SARS-CoV-2, and anti-IL-6.
When compared against multiple other models on binder prediction, the largest FAbCon model fares best, showing the benefit of overparametrization.
Since the model was trained with the CLM objective, it can be used for sequence generation, producing sequences that score as very human-like when compared against sequences from human PBMCs.
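A sketch of CLM-style generation with the HuggingFace API (the checkpoint path is a placeholder, not the actual FAbCon release; sampling settings are arbitrary):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint id; substitute the released FAbCon weights.
ckpt = "path/to/fabcon-checkpoint"
tok = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt)

# Seed with a short N-terminal fragment and sample residues N-to-C.
inputs = tok("EVQLV", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=110, do_sample=True, temperature=1.0)
print(tok.decode(out[0], skip_special_tokens=True))
```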