    • They developed a model that predicts protein-protein interactions (PPIs) from sequence, built on top of a large protein language model.
    • They train ProtBERT to predict PPIs.
    • They use the BioGRID dataset, where interactors are included only if they are confirmed by two independent sources, such as two independent experimental techniques in two separate studies. In total they have 179,018 positive pairs.
    • They use Negatome 2.0 as the negative dataset. It relies on various sources such as manual curation from the literature or subunits from the PDB that do not interact with each other. A total of 3,958 pairs were used.
    • They start from the pretrained ProtBERT-BFD model.
    • They encoded each protein pair as [CLS] Protein A [SEP] Protein B [SEP], mapping the final output to a binary label.
    • They achieve 92% accuracy on the test set.
    • They also evaluated on a dataset in which negative pairs were sampled from proteins in different subcellular compartments. On positive samples in this dataset, the model was 85% accurate. On negative samples, SYNTERACT was only 38% accurate, classifying many negatives from subcellular compartment sampling as interactors.
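The pair encoding described above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names `format_pair` and `to_binary`, the 0.5 threshold, and the space-separated residue convention are all assumptions made for the example. In practice a BERT-style tokenizer would normally insert the special tokens itself.

```python
def format_pair(protein_a: str, protein_b: str) -> str:
    """Build a '[CLS] A [SEP] B [SEP]' string for a protein pair.

    Residues are separated by spaces (an assumption for this sketch),
    so that each amino acid becomes its own token.
    """
    residues_a = " ".join(protein_a.upper())
    residues_b = " ".join(protein_b.upper())
    return f"[CLS] {residues_a} [SEP] {residues_b} [SEP]"


def to_binary(probability_interacting: float, threshold: float = 0.5) -> int:
    """Map a predicted interaction probability to a 0/1 label.

    The threshold is a hypothetical choice, not taken from the paper.
    """
    return int(probability_interacting >= threshold)


example = format_pair("MKT", "ACD")
print(example)          # [CLS] M K T [SEP] A C D [SEP]
print(to_binary(0.92))  # 1
```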
    • Proposing a Bayesian scheme to optimally select generated antibodies from a previously introduced language model (GLM).
    • They use the 1B GLM-AB model from BioMap. Training involves a variation of MLM that masks entire spans of sequence.
    • The entire point is how to ‘select’ better antibodies according to some unknown ‘fitness function’ f. If you only get a few experimental data points at a time to evaluate f, you’d better make them count. Their combination of a Bayesian scheme and a language model optimizes how the ‘next generated sequence points’ are picked so that the best approximation to f is reached.
    • They use the Absolut! framework (computational simulation) rather than wet-lab data.
    • Demonstration that training transformers on paired antibody data provides improvements over single-chain models. They created two models: one trained on single chains from Jaffe, the other with the paired information.
    • When comparing embeddings, they only extracted one sequence from the paired transformer to make the comparison with the single-sequence transformer sound.
    • They showed what happens when one performs UMAP on the light-chain embeddings of the paired and unpaired transformers. The unpaired transformer produces a more random dispersal, whereas the paired variety produces much tighter clustering, similar to heavy chains. Performance on heavy-chain clustering is similar.
    • They predicted masked positions in heavy chains when the chain is paired with the native mutated light chain versus a germline-reverted one. The cross-entropy loss was much lower when the prediction was made in the presence of the native mutated light chain.
    • They contrasted their paired model with ESM2 (650M). They fine-tuned ESM2 on the paired data. They averaged all attention heads over all layers to get a single score. The fine-tuned ESM2 attends to conserved cysteines and CDR regions whereas the original ESM2 does not, focusing more on linear stretches.
    • Architecture was RoBERTa: 24 layers, 16 attention heads, embedding size 1024, feed-forward size 4096.
    • Trained using MLM for 100 epochs.
    • Introducing ESM-2 and ESMFold. Scaling transformer model parameter size to 15B allows for more precise predictions of structures.
    • They make available an atlas of 617 million predicted structures
    • Learning objective is MLM, masking 15% of protein input.
    • Perplexity ranges from 1 for a perfect model to 20 for a model that makes predictions at random. Intuitively, perplexity describes the number of amino acids the model is uncertain between when it makes a prediction
    • After 270k training steps the 8M parameter model has a perplexity of 10.45, and the 15B model reaches a perplexity of 6.37.
    • The 15B model achieves best perplexity and structural modeling accuracy.
    • For some structures, the error of structure prediction drops from 7.7 Å at 8M parameters to 7.0 Å at 35M parameters and to 3.2 Å at 150M parameters. The 3B model brings it down to 2.8 Å and the 15B model to 2.6 Å. For other structures, a good prediction is only achieved at 15B.
    • Their structure predictor closely follows AlphaFold2, but instead of the Evoformer, they use the representation from ESM-2.
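The perplexity range quoted above (1 for a perfect model, 20 for random guessing over the 20 amino acids) follows from perplexity being the exponential of the mean negative log-likelihood of the true tokens. A minimal sketch, assuming we already have the probability the model assigned to each correct residue (the function name `perplexity` and the toy inputs are illustrative only):

```python
import math


def perplexity(probs_of_true_tokens):
    """Perplexity = exp(mean negative log-likelihood of the true tokens)."""
    nll = -sum(math.log(p) for p in probs_of_true_tokens) / len(probs_of_true_tokens)
    return math.exp(nll)


# A model guessing uniformly among 20 amino acids assigns p = 1/20 everywhere.
uniform = perplexity([1 / 20] * 100)

# A perfect model assigns p = 1 to every true residue.
perfect = perplexity([1.0] * 100)

print(round(uniform, 6), round(perfect, 6))
```

Read this way, the reported 6.37 for the 15B model means it is, on average, about as uncertain as a uniform choice among roughly six amino acids at each masked position.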
  • 2024-03-13

    Language models enable zero-shot prediction of the effects of mutations on protein function

    • non-antibody stuff
    • language models
    • experimental techniques
    • They contrast ESM with some other language models and show that, in a zero-shot fashion, some correlations can be made with experimental measurements of variants.
    • They compare the performance of ESM and DeepSequence on 41 deep mutational scanning datasets collated in a single paper. They claim ESM has better overall correlations, but this is not crystal clear from the graph nor, by their own admission, from the paired t-test.
    • They find that pretraining the model on UniRef30 gives the worst performance. Performance is acceptable for UniRef50 or UniRef70, with a dip again at UniRef100.
    • Binding sites have much higher conservation.
    • The core of the protein also appears to have higher conservation.
    • A 100B-parameter protein language model (xTrimoPGLM) and a fine-tuned 1B antibody-specific model.
    • For PLM training they employ data from Uniref90 and ColabFold. After filtering and deduplication they are left with approximately 350m sequences, or 100B tokens.
    • On proteins, xTrimoPGLM-100B outperforms ESM2-15B on 12 of 15 downstream tasks (e.g. thermostability, structure prediction).
    • They train a 1B protein model and then fine-tune it on antibodies from OAS.
    • Their masking procedure includes span masking (several consecutive residues at a time), not only single-residue masking.
    • They use 678m OAS sequences.
    • They benchmarked the antibody model on naturalness and antibody structure prediction, and the Xtrimo-pglm-oas model outperformed ESMFold, AlphaFold2 and IgFold.
    • They developed and benchmarked a suite of language models for proteins, available as the ProGen2 suite. Zero-shot fitness predictions of antibody-specific models did not provide better results.
    • The authors argue that the data fed to the models should be carefully selected, not simply provided in raw format.
    • Learning objective is autoregressive, predicting the next token.
    • The family comprises models with 151M, 764M, 2.7B, and 6.4B parameters.
    • They clustered OAS at 85% sequence identity using Linclust, yielding ~554M sequences.
    • Each sequence is then provided both as-is and reversed.
    • DMS studies: We collected expression and antigen-binding enrichment measurements for variants of the anti-VEGF g6 antibody from a DMS study (Koenig et al., 2017). From a second DMS study, we collected binding enrichment measurements for variants of the d44 anti-lysozyme antibody (Warszawski et al., 2019). Binding affinity (KD) and thermal stability measurements (TM) for the remaining six antibodies (C143, MEDI8852UCA, MEDI8852, REGN10987, S309, and mAb114) were drawn from a recent study on antibody affinity maturation using pretrained language models (Hie et al., 2022).
    • The larger the number of parameters, the lower the perplexity on the held-out test set.
    • Testing on antibody binding, expression and melting temperature, the OAS-trained model performs worse than the general protein models.
    • Diffusion-based antibody-antigen binding site structural co-design
    • Sampling of antibody sequence and structure directly conditional on the antigen structure.
    • Model receives antigen structure and antibody framework in complex. Then CDRs are randomly initialized with AA types, orientations and positions.
    • The advantage over GANs and VAEs should be that it generates candidates iteratively so filters can be applied on the fly to the sampling process.
    • Diffusion probabilistic models learn to generate data via denoising samples from a prior distribution
    • They predict the amino acid type, Cα coordinate and orientation in SO(3).
    • In addition to the joint design of sequences and structures, we can constrain partial states for other design tasks. For example, by fixing the backbone structure (positions and orientations) and sampling only sequences, we can do fix-backbone sequence design.
    • We cluster antibodies in the database according to CDR-H3 sequences at 50% sequence identity.
    • RMSD: the Cα root-mean-square deviation between the generated structure and the original structure, with only the antibody frameworks aligned. Note that in this generative setting a higher RMSD means that the generated structures are more diverse.
    • However, they also checked how accurate they are in RMSD when they fix the sequence (so only the structure is sampled). Here for H3 they achieve 3.246 Å.
    • AAR: the amino acid recovery rate, measured as the sequence identity between the reference CDR sequences and the generated sequences.
    • They compared to RosettaAntibodyDesign using IMP (percentage of designed CDRs with better energy than the original CDR), AAR and Cα RMSD.
    • They optimize the antibody by perturbing it for several steps (forward diffusion) and then denoising it (going backwards) to find antibodies with better IMP, but they also look at RMSD and sequence identity.
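The AAR and Cα RMSD metrics used in the notes above are simple to compute once sequences and coordinates are in hand. A minimal sketch in pure Python, assuming equal-length CDRs and coordinates that are already in a common frame (for instance after the frameworks have been aligned); no superposition is performed here, and the function names and toy coordinates are illustrative, not from the paper:

```python
import math


def amino_acid_recovery(reference: str, generated: str) -> float:
    """Fraction of positions where the generated CDR matches the reference."""
    if len(reference) != len(generated):
        raise ValueError("sequences must have equal length")
    matches = sum(r == g for r, g in zip(reference, generated))
    return matches / len(reference)


def ca_rmsd(coords_a, coords_b) -> float:
    """Root-mean-square deviation over paired C-alpha coordinates.

    Assumes both coordinate lists are already in the same reference frame.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have equal length")
    squared = [
        sum((x - y) ** 2 for x, y in zip(p, q))
        for p, q in zip(coords_a, coords_b)
    ]
    return math.sqrt(sum(squared) / len(squared))


aar = amino_acid_recovery("ARDYW", "ARGYW")  # 4 of 5 positions match
rmsd = ca_rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)])  # each atom shifted by 1
print(aar, rmsd)
```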
    • Review on developability methods; a very good reference for the problems afflicting antibodies and the assays that are used, and it gives perspective on where computational methods might have an impact.
    • Specific binding: high on-target binding and low off-target and non-specific binding are important to reduce the risks of abnormal pharmacokinetics and fast antibody clearance.
    • Polyspecificity. ELISA can be used to check non-specific binding to non-target antigens. The polyspecificity particle (PSP) assay checks binding to complex antigen mixtures. The polyspecificity reagent (PSR) assay is similar to PSP. In cross-interaction chromatography (CIC), non-specific protein interactions, such as monoclonal antibodies interacting with immobilized polyclonal antibodies, are detected via their relative retention times. Standup monolayer adsorption chromatography (SMAC) instead detects non-specific interactions between monoclonal antibodies and the column.
    • Colloidal stability, self-association. Self-interaction chromatography (SIC), affinity-capture self-interaction nanoparticle spectroscopy (AC-SINS) and charge-stabilized self-interaction nanoparticle spectroscopy (CS-SINS). Also hydrophobic interaction chromatography (HIC).
    • Folding stability. Differential scanning calorimetry or differential scanning fluorimetry.
    • Ideally antibodies would have a shelf-life of several years which requires stability engineering.
    • Assays can be performed in formulation (pH 6, 10 mM histidine) or physiological conditions (pH 7.4, phosphate-buffered saline).
    • Generally, the isoelectric point of therapeutic antibodies is between 6 and 9. However, various developability challenges have been reported for some antibodies with relatively low (pI <6.5-7) or high (pI >8.5-9) isoelectric points
    • Computational assays can include: naturalness prediction, MHC class II binding prediction, PTM liabilities, isoelectric point (pI), charge, hydrophobic imbalance, and surface areas buried at the VH-VL interface, along with molecular surface patches.
    • Antibodies are flexible & crystallization might not reflect well the actual dominant structure adopted.
    • T-cell epitope assay: This may be addressed in vitro by the use of immune cell activation assays, where pooled peripheral blood mononuclear cells are exposed to candidate biologics to reveal the presence of activating T cell epitopes.
    • Antibodies have quite a long half-life (about 3 weeks) because they can engage the FcRn receptor, which rescues its ligands from degradation through cellular recycling. The efficiency with which different biologics undergo this process has an enormous impact on their pharmacokinetic properties and biodistribution.
    • There are some major differences between humans and mice that limit the use of the mouse as a model organism:
    • While being a potent vascular endothelial growth factor (VEGF)-blocker in humans, the widely used anti-VEGF human IgG1 bevacizumab is unable to block mouse VEGF, implying that mice could not have been used in its development.
    • Our understanding of FcRn biology has revealed major differences that must be taken into consideration when conventional mice are used. This is due to large differences in ligand binding to mouse and human FcRn, where mouse IgG binds very weakly to the human form, and human IgG binds more strongly to mouse FcRn than to the human counterpart.
    • Most antibodies are administered intravenously, but there are some experiments with orally delivered antibodies targeting infections in the GI tract.
    • Creating a nativeness score for human antibodies and VHHs using a variant of the variational auto-encoder (VAE).
    • They used approximately 2m sequences for heavy, kappa, lambda and nanobody each.
    • Model is VQ-VAE trained on masked language modeling objective with VAE-specific terms incorporated in the loss function.
    • The nativeness definition is a transformation of the |x_r - x_n|, that is the MSE of the original and reconstructed sequence.
    • They used the data from the ‘universal Nb framework’ paper to perform VHH grafting experiments.
    • They tested against other methods whether they could distinguish human from non-human sequences (humanized, chimeric, mouse); they are the best, with a PR AUC of 0.975. The closest were OASis and germline content, with a PR AUC of 0.963.
    • Correlation with ADA on 126 therapeutics shows an R² of 0.25.
    • As a nanobody humanization case-study they employed antibodies from another paper that offer WT and humanized variants. They show that their score moves the humanized variants closer to the human distribution - but human and VHH humannesses are still far separated.
    • When they have species-matched predictions, they call it nativeness. If they score VHH on human models, they call it humanness.
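The idea of deriving a nativeness score from reconstruction error, as noted above, can be sketched as follows. This is only an illustration under stated assumptions: the one-hot encoding, the `exp(-scale * mse)` mapping, and the names `one_hot`, `reconstruction_mse` and `nativeness_score` are all hypothetical. The notes only say the score is some transformation of the MSE between the original and reconstructed sequence, and the actual transformation used in the paper is not reproduced here.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(sequence: str):
    """Encode a sequence as a list of 20-dimensional one-hot rows."""
    return [[1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS] for aa in sequence]


def reconstruction_mse(original, reconstructed) -> float:
    """Mean squared error between an encoded input and its reconstruction."""
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same number of positions")
    total, count = 0.0, 0
    for row_o, row_r in zip(original, reconstructed):
        for o, r in zip(row_o, row_r):
            total += (o - r) ** 2
            count += 1
    return total / count


def nativeness_score(mse: float, scale: float = 1.0) -> float:
    """Hypothetical mapping: lower reconstruction error gives a score nearer 1."""
    return math.exp(-scale * mse)


x = one_hot("QVQLV")
# A perfect reconstruction has zero error and therefore the maximal score.
print(nativeness_score(reconstruction_mse(x, x)))
```

Under this sketch, scoring a VHH with a model trained on human sequences would correspond to what the notes call humanness, and scoring it with a VHH-trained model to nativeness.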