Demonstration that general-purpose language models, like GPT-3.5, can reason about antibody engineering tasks.
The authors explore in-context learning, i.e. few-shot learning, where several examples are given in the prompt and, on that basis, the model has to make a prediction for a new case.
They tested an array of general-purpose models, such as GPT, Llama and Mistral variants.
They tested three antibody tasks: mouse/human discrimination, specificity prediction (from NGS data) and isotype identification. In theory not that difficult, but remember that we are dealing with a general-purpose language model.
They literally prompt the model with examples of, say, mouse and human antibody sequences, then provide a new one for it to classify.
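A minimal sketch of what such a few-shot prompt could look like (my reconstruction, assuming the OpenAI chat completions API and a made-up prompt format, not the authors' exact setup):

```python
# Sketch of a few-shot classification prompt for a chat model. The prompt
# wording, model name and toy sequences are illustrative assumptions.
from openai import OpenAI

def build_prompt(examples, query_seq):
    """examples: list of (sequence, label) tuples, e.g. ('EVQLVESG...', 'human')."""
    lines = ["Classify each antibody heavy chain sequence as 'human' or 'mouse'.", ""]
    for seq, label in examples:
        lines.append(f"Sequence: {seq}")
        lines.append(f"Species: {label}")
    lines.append(f"Sequence: {query_seq}")
    lines.append("Species:")
    return "\n".join(lines)

few_shot_examples = [
    ("EVQLVESGGGLVQPGGSLRLSCAAS...", "human"),   # truncated toy sequences
    ("EVKLVESGGGLVKPGGSLKLSCAAS...", "mouse"),
]
client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user",
               "content": build_prompt(few_shot_examples, "QVQLQESGPGLVKPSETLSLTCTVS...")}],
)
print(response.choices[0].message.content)
```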
They find that the predictions are not bad, especially in the few-shot scenario (16 or so examples).
In one test it even achieved accuracy on par with AntiBERTy.
A novel generative antibody method, CloneLM/CloneBO, that follows clonally plausible evolutionary paths.
They train CloneLM, an autoregressive language model, on antibody clonal family data from the OAS, with two separate models for heavy and light sequences. Clonal families are called with FastBCR.
CloneLM generates new clonal families by conditioning on a given antibody sequence. A martingale posterior approach is used to ensure that sampled sequences follow plausible evolutionary paths. So the antigen is taken into account, but only by virtue of the clonal family.
For benchmarking, they train a language model oracle on a real human clonal family and use it as a simulated fitness function.
They further train oracles on affinity and stability data and show that the newly generated sequences can be steered towards higher stability and affinity.
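As a rough illustration of how a sequence-level oracle can play the role of a simulated fitness function (the scoring via causal-LM log-likelihood and the HuggingFace-style interface are my assumptions, not the CloneBO implementation):

```python
import torch

def oracle_score(model, tokenizer, seq: str) -> float:
    """Mean log-likelihood of a sequence under the oracle LM (higher = 'fitter').
    Assumes a HuggingFace-style causal LM where model(ids, labels=ids).loss
    is the mean negative log-likelihood."""
    ids = tokenizer(seq, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        out = model(ids, labels=ids)
    return -out.loss.item()

def select_fittest(candidates, model, tokenizer, k=10):
    """Rank generated sequences by oracle score and keep the top k."""
    return sorted(candidates, key=lambda s: oracle_score(model, tokenizer, s), reverse=True)[:k]
```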
Heavy-light chain pairing has long been posited to be random, or at the very least VERY promiscuous. The authors test this by training their model on different portions of the variable region and showing that there is signal when full sequences are used.
The authors curated a set of ca. 233k positive heavy/light chain pairs from OAS. Negative samples were made by random shuffling, so they could occur in nature; they were just not observed in this dataset.
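A sketch of that negative-sampling scheme (the deduplication step is my assumption; the paper may construct negatives differently):

```python
import random

def make_pairing_dataset(positive_pairs, seed=0):
    """positive_pairs: list of (heavy_seq, light_seq) tuples observed together.
    Returns (heavy, light, label) triples with shuffled-light negatives."""
    rng = random.Random(seed)
    pos_set = set(positive_pairs)
    heavies = [h for h, _ in positive_pairs]
    lights = [l for _, l in positive_pairs]
    shuffled = lights[:]
    rng.shuffle(shuffled)
    negatives = [(h, l) for h, l in zip(heavies, shuffled) if (h, l) not in pos_set]
    return ([(h, l, 1) for h, l in positive_pairs] +
            [(h, l, 0) for h, l in negatives])
```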
They use AntiBERTa2 as the basis for training the classification model.
The model achieves 0.75 and 0.66 ROC AUC on two test sets - so there seems to be some signal there.
When the model is split between lambdas and kappas, it does better - though the lambda model carries signal for kappas (remember that lambda is a rescue rearrangement for a non-functional kappa).
Naive B-cell pairs are less predictable than mature ones.
One of the first studies showing that introducing structure into protein language models improves their predictive ability.
They feed ProteinMPNN (structural) representations into ESM-1b and show that this improves sequence recovery compared to using masked ESM-1b alone.
To marry ProteinMPNN and ESM-1b they use an ‘adapter’. Adapters in machine learning are lightweight modules that modify or extend a model’s functionality without retraining all of its parameters; in LM-DESIGN, a structural adapter integrates structural information into protein sequence predictions by bridging the structure encoder and a pretrained language model (pLM).
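A minimal sketch of the structural-adapter idea: the pLM's residue representations attend to the structure encoder's features and only the adapter weights are trained (dimensions, layer layout and names are illustrative, not the exact LM-DESIGN module):

```python
import torch
import torch.nn as nn

class StructuralAdapter(nn.Module):
    """Bridges a frozen structure encoder (e.g. ProteinMPNN) and a frozen pLM."""
    def __init__(self, lm_dim=1280, struct_dim=128, n_heads=8):
        super().__init__()
        self.proj = nn.Linear(struct_dim, lm_dim)          # lift structure feats to LM width
        self.cross_attn = nn.MultiheadAttention(lm_dim, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(lm_dim, lm_dim), nn.GELU(), nn.Linear(lm_dim, lm_dim))
        self.norm1 = nn.LayerNorm(lm_dim)
        self.norm2 = nn.LayerNorm(lm_dim)

    def forward(self, lm_hidden, struct_feats):
        # lm_hidden:    (B, L, lm_dim)     residue representations from the pLM
        # struct_feats: (B, L, struct_dim) per-residue features from the structure encoder
        kv = self.proj(struct_feats)
        attn_out, _ = self.cross_attn(lm_hidden, kv, kv)   # sequence attends to structure
        x = self.norm1(lm_hidden + attn_out)
        x = self.norm2(x + self.ffn(x))
        return x  # passed to the pLM's output head to predict residue identities
```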
LM-DESIGN was benchmarked against state-of-the-art protein inverse folding models, including ProteinMPNN, PiFold, GVP-Transformer, Structured Transformer and GVP, while utilizing pretrained language models such as ESM-1b 650M and the ESM-2 series.
It was evaluated on the CATH 4.2 and CATH 4.3 datasets using sequence recovery rate and perplexity, compared against these baselines.
LM-DESIGN outperformed the individual models, improving sequence recovery by 4-12 percentage points and surpassing ProteinMPNN and PiFold.
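For reference, sequence recovery rate as commonly defined for inverse folding (the metric quoted above):

```python
def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of designed residues matching the native sequence at the same position."""
    assert len(designed) == len(native)
    return sum(d == n for d, n in zip(designed, native)) / len(native)

# e.g. sequence_recovery("EVQLVESG", "EVQLVQSG") -> 0.875
```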
Method to employ low-N data for biologic engineering.
Assuming we have a dataset of ~100 affinity data points, we can form (100 choose 2) = 4,950 pairs where we know which member has the larger readout (e.g. stronger affinity), giving a combinatorially larger number of data points to train on (see the sketch below).
The architecture is a CNN on top of a language model.
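A sketch of the pairwise trick and a plausible scoring head (the margin-ranking loss and the exact CNN shape are my assumptions; the paper's loss and featurisation may differ):

```python
from itertools import combinations
import torch
import torch.nn as nn

def make_pairs(seqs, affinities):
    """Return (stronger_seq, weaker_seq) pairs from ~100 labelled points."""
    pairs = []
    for (si, ai), (sj, aj) in combinations(zip(seqs, affinities), 2):
        if ai == aj:
            continue
        pairs.append((si, sj) if ai > aj else (sj, si))
    return pairs  # 100 points -> up to (100 choose 2) = 4,950 pairs

class ScoreHead(nn.Module):
    """Small CNN over per-residue language-model embeddings, giving a scalar score."""
    def __init__(self, emb_dim=1280):
        super().__init__()
        self.conv = nn.Conv1d(emb_dim, 64, kernel_size=5, padding=2)
        self.out = nn.Linear(64, 1)

    def forward(self, emb):                       # emb: (B, L, emb_dim) from a pLM
        h = torch.relu(self.conv(emb.transpose(1, 2))).mean(dim=2)
        return self.out(h).squeeze(-1)

ranking_loss = nn.MarginRankingLoss(margin=0.1)
# per batch: loss = ranking_loss(score_stronger, score_weaker, torch.ones(batch_size))
```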
Benchmarked on three internal campaigns: IL-6, EGFR and an undisclosed target.
New (old :) ) therapeutic antibody database, several times larger than what is available from other sources.
Includes over 2,900 investigational antibody candidates and more than 450 approved or late-stage molecules.
It tracks molecular format, target antigen, development status, clinical history, and company data, along with antibody isotype, conjugation status, and mechanism of action.
Analysis highlights a rise in bispecifics, ADCs, and immunoconjugates, with most clinical-stage antibodies targeting cancer and originating from China or the U.S.
The data are collected from public sources beyond INN lists, including company websites, press releases, clinical trial registries, regulatory agencies, and literature reports.
Architecturally, it is a mix of language models, diffusion and structure prediction methods.
Training is done by denoising diffusion: first the structure is perturbed and the model learns to recover it, then the same is done for sequences.
After these two steps, the model is distilled into a consistency model. This results in a model that can predict the final coordinates/sequence in a single step rather than through iterative denoising.
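A heavily simplified sketch of the distillation idea (a one-step student matching a multi-step teacher; the real consistency-distillation objective, with self-consistency across adjacent noise levels and an EMA target, is more involved, and all names here are placeholders):

```python
import torch
import torch.nn.functional as F

def distillation_step(student, teacher_sample_fn, x_noisy, t, optimizer):
    """One training step: the student learns to jump straight to the teacher's
    fully denoised output instead of running the iterative sampler."""
    with torch.no_grad():
        target = teacher_sample_fn(x_noisy, t)   # iterative denoising down to t=0
    pred = student(x_noisy, t)                   # single forward pass
    loss = F.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```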
The method achieves accuracy comparable to many existing methods, such as DiffAb and dyMEAN.
On docking, the best performance is on the order of 4 Å iRMSD when using an AlphaFold3 antibody model, so some challenges still remain.
Nice developability dataset with associated computational modeling.
A total of 334 antibodies were initially characterized, with a subset of 43 antibodies selected for in vivo pharmacokinetic (PK) assessment. These data points included high-throughput developability assays and various physicochemical measurements.
A multivariate Partial Least Squares (PLS) regression model was developed, combining multiple in vitro measures (nonspecific interactions, self-association and FcRn binding) to predict in vivo clearance, significantly improving correlation with PK over individual assays.
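A sketch of the multivariate PLS setup (assay names, preprocessing and the placeholder data below are illustrative, not the paper's actual panel):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

# X: one row per antibody, columns = in vitro readouts
# (e.g. nonspecific binding, self-association, FcRn column retention)
rng = np.random.default_rng(0)
X = rng.random((43, 3))            # placeholder for the 43-antibody PK subset
y = rng.random(43)                 # placeholder (log) clearance values

pls = PLSRegression(n_components=2)
r2 = cross_val_score(pls, X, y, cv=5, scoring="r2")
print(f"cross-validated R^2: {r2.mean():.2f}")
```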