The authors propose a new training regime for antibody language models, noting that previous approaches borrowed heavily from natural language processing.
They use a masking rate of 50-70%, as opposed to the 15% typical of natural-language pre-training.
In addition to masking individual residues, they mask entire spans of sequence, with a focus on masking CDR-H3.
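Span masking at a high rate is straightforward to implement. A minimal sketch below (hypothetical helper, not PARA's actual CDR-aware scheme, which would additionally need region annotations to target CDR-H3):

```python
import random

def mask_spans(seq, mask_rate=0.6, mask_token="[MASK]"):
    """Mask one contiguous span covering ~mask_rate of the sequence.

    Illustrative only: PARA's real objective mixes residue- and
    span-level masking and biases span placement toward CDR-H3.
    """
    n = len(seq)
    span_len = max(1, int(n * mask_rate))          # e.g. 60% of the tokens
    start = random.randrange(0, n - span_len + 1)  # random span start
    tokens = list(seq)
    tokens[start:start + span_len] = [mask_token] * span_len
    return tokens

# toy CDR-H3-like sequence (hypothetical), masked at a 60% rate
masked = mask_spans("CARDRSTGSYYFDYW", mask_rate=0.6)
```

The model is then trained to infill the masked span from the surrounding context, which is what makes the high masking rate viable for short, information-dense antibody sequences.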
Infilling using PARA is much more accurate in the CDR-H3 region: 48.7% accuracy versus 36.4% for the closest SOTA method, AntiBERTy.
They applied their model to downstream tasks such as heavy/light chain calling and prediction of trastuzumab binding; however, the gains here were modest relative to other models (e.g. for HER2 binding, a simple CNN model achieved 82.8% accuracy vs. 83.7% using their method).
The model is available at: https://github.com/xtalpi-xic/PARA/tree/main
Novel dataset of 0.5m trastuzumab variants, together with benchmarking of affinity classification methods.
They generated a dataset of ~500,000 anti-HER2 trastuzumab variants by modifying the CDR-H3. Binding affinity is binned into high/medium/low with a reasonably even split (178,160, 196,392, and 171,732 sequences respectively).
They split the dataset into positives and negatives by putting both medium and low binders into the negative set.
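The binarization and the resulting class balance can be sketched as follows (label names are assumptions; the counts are the bin sizes reported above):

```python
def to_binary_label(affinity_class):
    """Map the three affinity bins to a binary target.

    Per the paper's split: only 'high' binders are positives;
    'medium' and 'low' both go to the negative set.
    """
    return 1 if affinity_class == "high" else 0

# bin sizes as reported for the ~0.5m dataset
counts = {"high": 178160, "medium": 196392, "low": 171732}
n_pos = counts["high"]
n_neg = counts["medium"] + counts["low"]
```

Note the resulting binary task is roughly 1:2 imbalanced (high vs. medium+low), which matters when comparing classifier accuracies across splits.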
They contrast their dataset with that of Mason et al. (~39k variants vs. their 0.5m) to show that, in a relatively small number of cases, binders in one set are labeled as non-binders in the other.
They test whether the binder/non-binder classifier developed by Mason et al. works as intended on the novel 0.5m dataset, and likewise train their own model on the Mason data and test it on their dataset (and vice versa). The classifiers retain predictive power, but to a much lesser extent than when trained on data from the same experiment.
As benchmark methods they used FLAML (https://arxiv.org/abs/1911.04706), a CNN, and an EGNN.
CNN and FLAML are the top performers, and the CNN performs well even on small data (signal emerging from ~170 sequences).
Performance drops sharply when train/val splits are made with respect to clonotype.
They tested AbLang, ProteinMPNN, ESM, and BLOSUM on their ability to generate binding trastuzumab variants, with randomly generated sequences as a control. The percentages of sequences with CNN-HER2-max binding probabilities above 90% were: 13% for random, 26% for BLOSUM, 27% for AbLang (masking all ten residues simultaneously), 29% for AbLang (masking one residue at a time), 19% for ProteinMPNN, and 30% for ESM (masking one residue at a time).
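The evaluation metric here is simply the fraction of generated designs that the classifier scores above a threshold; a minimal sketch (function name and the toy scores are hypothetical, the 0.9 threshold matches the paper's criterion):

```python
def fraction_strong_binders(probs, threshold=0.9):
    """Share of generated sequences whose predicted binding
    probability (e.g. the CNN-HER2-max score) exceeds threshold."""
    hits = sum(1 for p in probs if p > threshold)
    return hits / len(probs)

# toy classifier outputs standing in for real predictions
demo = fraction_strong_binders([0.95, 0.50, 0.91, 0.20])
```

Applied to each generator's output set, this yields the 13-30% figures quoted above.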
Novel method to design antibodies, validated on VHHs.
The protocol uses an antibody-fine-tuned RFdiffusion to create the backbone coordinates, followed by sequence design with ProteinMPNN (not antibody-fine-tuned). Designs can then be filtered with RoseTTAFold2.
The fine-tuned RFdiffusion is conditioned on the antigen coordinates, the antibody framework, and information on the epitope. Only the CDR coordinates and the framework/CDR orientation are designed.
They fine-tuned RoseTTAFold2 with the aim of filtering out poor RFdiffusion designs.
The fine-tuned RoseTTAFold2 can accurately distinguish correct from incorrect antibody-antigen pairs, but only when hot-spot information is provided (i.e. local docking).
The best designs range from micromolar to high-nanomolar affinity, and roughly 1 in 100 designs works as intended: https://www.nature.com/articles/d41586-024-00846-7
Benchmarking of deep learning methods on a range of antibody design tasks.
Design tasks included are expression, thermostability, immunogenicity, aggregation, polyreactivity and binding.
The tested methods include IgLM, AntiBERTy, ProtGPT2, ProGen2, ProteinMPNN, ESM-IF and Rosetta.
The scoring procedure relates the models' perplexities to the experimental values.
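Relating perplexities to experimental fitness is typically done by rank correlation (lower perplexity should track higher fitness). A self-contained Spearman sketch, assuming no ties (real benchmarks would use a library routine with tie handling):

```python
def ranks(xs):
    """Rank positions of each value (0 = smallest); assumes no ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = float(rank)
    return r

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

A method "correlates well" in this benchmark when spearman(perplexities, fitness) is strongly negative (lower perplexity, higher measured fitness).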
The non-antibody-specific model ProGen2 had the best correlations overall, but no single method is best across all fitness classes.
Predicting intrinsic antibody properties gives better performance than predicting extrinsic ones.
Models are better at distinguishing differences within a set of antibodies derived from a single wild type (intra-family) than at assessing differences between antibodies of different origins (inter-family).
Models with a larger number of parameters showed sizeable improvements in fitness correlation for thermostability and polyreactivity.
Sequence-based methods perform similarly to structure-based ones, though their averages are slightly higher.
They introduce six molecular descriptors (MolDesk) that perform better than the Therapeutic Antibody Profiler (TAP) and MOE.
The method used to calculate hydrophobicity, and the specific scale employed, can notably influence the evaluation of antibody developability characteristics, as highlighted by Waibl et al. (2022).
They use APBS (Adaptive Poisson-Boltzmann Solver) to calculate electrostatic potential of the molecules.
They calculate the descriptor values over a conformational ensemble generated by MD.
Their calculation of hydrophobicity was sensitive to the hydrophobicity definition used.
Their measure of CDR negative charge outperforms other methods on an experimental viscosity dataset.
They used a dataset of 64 antibodies with measured clearance and showed that hydrophobicity-based descriptors do not discriminate fast from slow clearance, whereas CDR_APBS_pos does.
They tested the effect of the structural model used on the calculated properties and obtained broadly similar results.
After molecular dynamics simulations of the models, the differences between models are blurred, showing the importance of conformational sampling.
Their final set of metrics: “These descriptors include CDR_APBS_neg, a significant factor influencing viscosity and colloidal stability (Figure 3), CDR_APBS_pos, a major driver of PK clearance and polyspecificity (Figure 4), CDR_HPATCH_WW, which plays a key role in SEC, HMW, and HIC (Figure 5), CDR_HPATCH_BM which correlated the best with HIC, as well as total CDR length and APBS charge asymmetry (APBS_CAP) as two additional descriptors to align with the type of descriptors considered in the TAP metrics”
They compared TAP and MolDesk on the basis of whether molecules progressed or regressed in development. MolDesk performed better, raising fewer critical flags for the approved and progressed molecules.
They employed Rosetta to show that energy-based ranking of humanized constructs is superior to homology-based approaches.
In some instances, the experimentally best humanized design originates from human framework genes that lack significant similarity to the corresponding animal antibody. This underscores the potential efficacy of an energy-based humanization approach over the conventional homology-based method.
They enumerate all V-J combinations that are free of liabilities such as N-glycosylation motifs and extra cysteines, yielding a set of frameworks.
Mouse CDRs are grafted onto the artificial frameworks, then modeled and energy-minimized using Rosetta.
A geometric transformer that receives a single structure as input and annotates residues with the likelihood that they are part of a binding site.
The geometric transformer uses only the atom names; no mass, charge, etc.
Similar to a convolution, their geometric attention mechanism first focuses on the 8 nearest neighbors (~3.2 Å) and then widens up to 64 nearest neighbors (~8.2 Å).
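The neighborhood restriction itself is just a k-nearest-neighbor lookup over atom coordinates; a minimal sketch (hypothetical helper, not PeSTo's actual implementation, which batches this on GPU):

```python
import math

def k_nearest(coords, i, k):
    """Indices of the k atoms nearest to atom i by Euclidean distance.

    This is the receptive field a geometric attention head would
    restrict itself to (k=8 in early layers, widening to k=64).
    """
    neighbors = sorted(
        (j for j in range(len(coords)) if j != i),
        key=lambda j: math.dist(coords[i], coords[j]),
    )
    return neighbors[:k]

# toy atoms on a line; atom 0's two nearest neighbors are 1 and 2
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
nn = k_nearest(coords, 0, 2)
```

Widening k across layers grows the effective receptive field, analogous to stacking convolutions.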
They use ~300,000 chains from the PDB for training (!), because they extracted all biological assemblies at 30% sequence identity.
They defined the most common atom names across all molecule types, giving 79 elements; interactions between these elements can be encoded as a 79x79 matrix.
The interaction cutoff is taken as 5 Å.
PeSTo outperforms ScanNet by a large margin: 0.93 vs. 0.87 ROC AUC.
In some cases, processing MD trajectories of unbound proteins with PeSTo identifies certain interfaces better than when PeSTo is run on the starting static structure.
ProteinMPNN is a framework that receives a backbone and generates the most probable sequence to fit it. It is squarely aimed at protein design where you have a binding interface or a structure you need to fit.
Using interatomic distances is better than using dihedral angles: sequence recovery increased from 41.2% (baseline model) to 49.0% (experiment 1; see Table 1 below). Interatomic distances evidently provide a better inductive bias for capturing interactions between residues than dihedral angles or N-Ca-C frame orientations.
They found that training models on backbones with added Gaussian noise (std = 0.02 Å) improved sequence recovery on confident AlphaFold-predicted structures (average pLDDT > 80.0) from UniRef50, while sequence recovery on unperturbed PDB structures decreased.
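The noise augmentation is simply i.i.d. Gaussian jitter on every backbone coordinate; a minimal sketch (hypothetical function name, std in Å as in the paper):

```python
import random

def jitter_backbone(coords, std=0.02):
    """Add i.i.d. Gaussian noise (in Å) to each atom coordinate.

    Training-time augmentation: the small perturbation stops the
    model from overfitting to crystallographic precision, at the
    cost of some recovery on unperturbed PDB backbones.
    """
    return [
        tuple(c + random.gauss(0.0, std) for c in atom)
        for atom in coords
    ]

# toy backbone of 50 atoms at the origin
noised = jitter_backbone([(0.0, 0.0, 0.0)] * 50, std=0.02)
```

With std = 0.02 Å the perturbation is well below bond-length scale, so the backbone geometry stays chemically plausible while the exact coordinates become uninformative.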
They employed a scaffold made by Rosetta that was supposed to house a peptide recognizing a target protein. The Rosetta-designed sequences failed, but when ProteinMPNN was used to generate sequences for the same scaffold, the designs bound even better than the original bare peptide.