Computational Antibody Papers

    • The authors run structural simulations to build a classifier of Asn/Asp degradation.
    • They use the Adimab dataset of 131 therapeutic antibodies for which degradation rates were studied.
    • They look at three descriptors: (D1) backbone dihedral conformation of the n + 1 residue, (D2) side-chain dihedral conformation of the Asn/Asp residue, and (D3) fraction of time the Asn/Asp residue remains solvent accessible.
    • The combined model achieves an accuracy of ~0.85.
    • The backbone (D1) model achieves the best accuracy, suggesting that backbone conformation may be the most important descriptor.
    • Citing the Adimab study: for instance, there were 27 deamidation sites with the hotspot NG sequence in the complementarity-determining region (CDR), of which only 14 underwent deamidation. A similar trend was observed for isomerization (16 of 44 DG sites isomerized).
    • They study Asn/Asp degradation (deamidation and isomerization) by examining proton affinity.
    • Backbone secondary structure, side-chain rotamer conformation and solvent accessibility were found to be key molecular indicators of Asp isomerization and Asn deamidation.
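The three descriptors (D1 to D3) can be sketched as per-frame fractions computed over a simulation trajectory of one Asn/Asp site. This is a minimal sketch under stated assumptions: the angle windows, the SASA cutoff and the two-of-three voting rule are hypothetical placeholders, not the authors' fitted criteria or their combined model, and extracting dihedrals and SASA from a trajectory is left to an external tool.

```python
from dataclasses import dataclass
from typing import List, Tuple

Window = Tuple[float, float]  # inclusive (low, high) range in degrees; must not wrap across +/-180


@dataclass
class Frame:
    """One simulation snapshot of a single Asn/Asp site (angles in degrees)."""
    phi_next: float  # backbone phi of residue n + 1
    psi_next: float  # backbone psi of residue n + 1
    chi1: float      # side-chain chi1 of the Asn/Asp residue
    rel_sasa: float  # relative solvent accessibility of the Asn/Asp residue (0 to 1)


def _inside(angle: float, window: Window) -> bool:
    low, high = window
    return low <= angle <= high


def descriptors(frames: List[Frame], phi_win: Window, psi_win: Window,
                chi1_win: Window, sasa_cutoff: float) -> Tuple[float, float, float]:
    """Return (D1, D2, D3) as fractions of frames meeting each (hypothetical) criterion."""
    n = len(frames)
    # D1: the n + 1 backbone sits inside the chosen (phi, psi) window
    d1 = sum(_inside(f.phi_next, phi_win) and _inside(f.psi_next, psi_win) for f in frames) / n
    # D2: the Asn/Asp side chain sits inside the chosen chi1 rotamer window
    d2 = sum(_inside(f.chi1, chi1_win) for f in frames) / n
    # D3: the Asn/Asp residue is solvent accessible
    d3 = sum(f.rel_sasa >= sasa_cutoff for f in frames) / n
    return d1, d2, d3


def is_hotspot(d: Tuple[float, float, float],
               cutoffs: Tuple[float, float, float] = (0.5, 0.5, 0.5)) -> bool:
    """Hypothetical combined rule: flag the site if at least two descriptors pass."""
    return sum(x >= c for x, c in zip(d, cutoffs)) >= 2
```

In practice the cutoffs would be learned from the labeled Adimab sites rather than fixed by hand.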
    • They show that structurally clustering antibodies on all six CDRs bins anti-coronavirus antibodies by the antigen domain they bind, including grouping together antibodies from different lineages (clonotypes). The method enables deeper characterization of convergent epitope responses and allows more targeted efforts to determine the novel structures that would contribute most to filling gaps in structural coverage. The structural clustering algorithm they introduce is called SPACE.
    • Serum baiting: an extracellular coronavirus antigen is used to pan donated blood serum directly for antibodies that bind it.
    • They modeled and structurally clustered thousands of antibody Fv sequences from CoV-AbDab and show that 92% of multiple-occupancy structural clusters group together antibodies that bind a consistent coronavirus antigen/domain; the antibodies within these clusters frequently span different clonal lineages.
    • They used homology modeling with ABodyBuilder.
    • The 2,063 full variable domain (Fv) sequences in CoV-AbDab were submitted to the ABodyBuilder antibody modeling tool. To ensure high model quality, only the 1,500 models for which ABodyBuilder used FREAD to model all six CDR loops were carried forward for structural clustering.
    • SPACE: antibodies are first split into groups by the lengths of their six CDRs. The similarity score is the length-weighted sum of the individual Cα CDR RMSDs.
    • Within each length combination, they pick the first antibody in the list as a cluster representative. Each remaining antibody whose score against it is < 0.75 Å is added to that cluster; the others are left for the next iteration. This makes it a greedy clustering algorithm.
    • Their lenient VH-clonotyping protocol groups Fvs with matching IGHV genes, CDRH3s of the same length, and ≥ 80% CDRH3 sequence identity. Their lenient Fv-clonotyping protocol additionally requires cluster members to have a matching IG[K/L]V gene, CDRL3s of the same length, and ≥ 80% CDRL3 sequence identity.
    • To measure whether antibodies bind the same region, they call a structural cluster ‘domain-consistent’ when its members bind a consistent antigen domain.
    • A total of 184/200 (92%) of their multiple-occupancy structural clusters were domain-consistent, indicating that structurally clustering with another member of CoV-AbDab is likely to be highly predictive of function.
    • Of these 184 domain-consistent clusters, 88 (47.8%) contained at least one pair of antibodies from different lenient Fv clonotypes, and 73 (39.7%) contained at least two lenient VH-only clonotypes.
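The SPACE procedure described above (group by CDR lengths, then greedily assign members to the first remaining representative) can be sketched as follows. This is a minimal sketch, not the authors' code: `rmsd_fn` is a hypothetical callback that would return per-CDR Cα RMSDs after superposition, and normalizing the length-weighted sum by total CDR length (so the 0.75 Å cutoff stays on an RMSD scale) is an assumption.

```python
from typing import Callable, Dict, List, Tuple

CDRS = ("H1", "H2", "H3", "L1", "L2", "L3")
Lengths = Dict[str, int]   # CDR name -> loop length
Rmsds = Dict[str, float]   # CDR name -> C-alpha RMSD in angstroms


def weighted_score(rmsds: Rmsds, lengths: Lengths) -> float:
    """Length-weighted combination of per-CDR RMSDs, normalized by total CDR length (assumption)."""
    total = sum(lengths[c] for c in CDRS)
    return sum(lengths[c] * rmsds[c] for c in CDRS) / total


def greedy_cluster(ids: List[str], lengths: Dict[str, Lengths],
                   rmsd_fn: Callable[[str, str], Rmsds],
                   cutoff: float = 0.75) -> List[List[str]]:
    # Step 1: only antibodies with identical six-CDR length combinations are compared
    groups: Dict[Tuple[int, ...], List[str]] = {}
    for ab in ids:
        key = tuple(lengths[ab][c] for c in CDRS)
        groups.setdefault(key, []).append(ab)

    clusters: List[List[str]] = []
    for members in groups.values():
        remaining = list(members)
        while remaining:
            rep = remaining[0]  # first antibody left becomes the representative
            cluster, leftover = [rep], []
            for other in remaining[1:]:
                score = weighted_score(rmsd_fn(rep, other), lengths[rep])
                (cluster if score < cutoff else leftover).append(other)
            clusters.append(cluster)
            remaining = leftover  # unassigned antibodies go to the next iteration
    return clusters
```

Because each representative is simply the first unassigned antibody, the result depends on input order, which is the usual trade-off of greedy clustering for speed.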
  • 2024-05-29

    Sequence-Based Nanobody-Antigen Binding Prediction

    • binding prediction
    • nanobodies
    • They introduce a binary classifier that predicts whether a nanobody binds a given antigen.
    • They use data from sdAb-DB: 47 antigens and 365 nanobodies.
    • They extend the set of positive pairs by assuming that a nanobody also binds antigens with high sequence similarity to its known target.
    • They create the negative set by shuffling nanobody-antigen pairings, keeping a shuffled pair only when the nanobody and antigen fall below a similarity threshold relative to the known complexes.
    • They use a gapped k-mer scheme as their embedding of choice.
    • They benchmark several embedding schemes against several classifiers (e.g., random forest (RF), SVM).
    • The best combination, gapped k-mer embedding with RF, achieves ~90% accuracy.
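One common form of gapped k-mer encoding is the composition of amino-acid pairs separated by 0 to g intervening residues (a CKSAAP-style encoding). The sketch below uses that form as an assumption. The authors' exact k and gap settings are not stated above, and `max_gap=2` is a hypothetical default. Nanobody and antigen vectors are concatenated into one feature vector per pair.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 residue pairs
PAIR_INDEX = {p: i for i, p in enumerate(PAIRS)}


def gapped_pair_composition(seq: str, max_gap: int = 2) -> List[float]:
    """Frequencies of residue pairs (i, i + gap + 1) for gap = 0..max_gap, concatenated."""
    features: List[float] = []
    for gap in range(max_gap + 1):
        counts = [0.0] * len(PAIRS)
        n_windows = len(seq) - gap - 1  # number of pairs with this spacing
        for i in range(max(n_windows, 0)):
            idx = PAIR_INDEX.get(seq[i] + seq[i + gap + 1])
            if idx is not None:  # skip pairs containing non-standard residues
                counts[idx] += 1
        if n_windows > 0:
            counts = [c / n_windows for c in counts]
        features.extend(counts)
    return features


def pair_features(nanobody: str, antigen: str, max_gap: int = 2) -> List[float]:
    """Feature vector for one nanobody-antigen pair: both encodings side by side."""
    return gapped_pair_composition(nanobody, max_gap) + gapped_pair_composition(antigen, max_gap)
```

Vectors like these, labeled with the positive and shuffled negative pairs, could then be passed to an off-the-shelf random forest such as scikit-learn's `RandomForestClassifier`.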
    • They benchmark the ability of AlphaFold2 (AF2) to improve docking of antibody-antigen complexes.
    • They check whether feeding docked antibody-antigen poses to AF2 improves the quality of the initial pose, and whether AF2 can be used for better rescoring.
    • No MSA is used; AF2 is given only the sequence and the docked pose as a template.
    • Side chains are stripped from the template because they were found to impose too many constraints, leaving AF2 to put them back in place.
    • Four docking tools were used: ProPOSE, ZDOCK, PIPER and ClusPro (for processing PIPER results).
    • Docking is run in bound (231 complexes) and unbound (25 complexes) modes. In the bound set, the pulled-apart complexes had their side chains repacked with SCWRL.
    • They find that AF2 retains ~50% of decoy contacts and moves the interface by ~1.24 Å, indicating that it does modify the input structure.
    • For rescoring they use an AF2 composite score built from pLDDT and pTM score. Both scores are converted to z-scores within each antibody-antigen complex so that poses can be compared.
    • Rescoring with the AF2 composite score helps in both bound and unbound docking, though much more in the bound case, and its benefit shrinks as the quality of the input models decreases.
    • ClusPro shows the smallest improvement from rescoring.
    • So AF2 can improve rescoring of docked poses by combining pLDDT and pTM score, but the input models need to be of good quality to benefit fully.
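The per-complex normalization behind the composite score can be sketched as follows. Both confidence metrics are z-scored across the poses of one complex and then combined. This is a sketch under stated assumptions: the dictionary keys are hypothetical, and the equal-weight sum of the two z-scores is an assumption about how "composite" is formed.

```python
from statistics import mean, pstdev
from typing import Dict, List


def zscores(values: List[float]) -> List[float]:
    """Standardize values within one complex; a constant list maps to zeros."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def rescore(poses: List[Dict]) -> List[str]:
    """Rank the AF2-refined poses of ONE antibody-antigen complex, best first.

    Each pose is a dict with hypothetical keys 'id', 'plddt' and 'ptm'.
    """
    z_plddt = zscores([p["plddt"] for p in poses])
    z_ptm = zscores([p["ptm"] for p in poses])
    # Assumption: equal-weight sum of the two per-complex z-scores
    composite = {p["id"]: a + b for p, a, b in zip(poses, z_plddt, z_ptm)}
    return sorted(composite, key=composite.get, reverse=True)
```

Normalizing within each complex puts pLDDT (0 to 100) and pTM (0 to 1) on the same scale before they are added.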
    • Online predictor of antibody affinity.
    • Available at https://www.digitalgeneai.tech/solution/affinity
    • They report a Pearson correlation of 0.65.
    • For training they use sdAb-DB (i.e., nanobodies) and a dataset from the Global Antibody Affinity Prediction Competition; the test set is the Pierce lab antibody benchmark. They appear to restrict their data to single-chain antibodies, although the server advertises light-chain support.
    • The model embeds antibodies with AbLang and antigens with TAPE, and predicts affinity from these embeddings with ConvNeXt.
    • According to Fig. 4, they achieve similar Pearson correlations on the training and test sets (~0.6), better than the other methods they benchmark against (e.g., CSM-AB, ZRANK, PRODIGY).
    • Intriguingly, when antibody features are removed the correlation stays at ~0.5, whereas it drops to ~0.2 when antigen features are removed.
  • 2024-05-15

    AbDiffuser

    • diffusion
    • antibody design
    • They introduce a diffusion model for antibodies and successfully test the designs experimentally.
    • They contrast backbone-first design with sequence-structure co-design. Designing the backbone first and then fitting a sequence to it risks (1) there being no sequence that fits the backbone well and (2) missing sequence-backbone dependencies that could otherwise be learned end to end.
    • They use a fixed-length representation (2×149 residues) based on the Aho numbering scheme. This choice is important, and it is possible because antibodies share a fairly conserved frame of reference.
    • To impose physical constraints, they define an idealized backbone reference onto which structures are projected, as well as a coarse-grained side-chain representation that follows a similar principle.
    • Because the representation has a fixed length, they can use priors on position-specific amino-acid frequencies.
    • They train the network to reproduce the amino-acid frequencies of paired sequences from OAS.
    • For structure generation, they compare against IgFold and report very similar performance.
    • They also train the network to reproduce the distribution of trastuzumab-binding sequences (dataset from Mason et al. 2021): the generator is trained on binders, and a classifier is trained on binders vs. non-binders. According to this classifier, AbDiffuser designs are more likely to be binders than designs from other methods (MEAN, RefineGNN).
    • They selected 16 designs for experimental validation. In vitro, 37% of the constructs bound HER2, with one slightly improved over trastuzumab. The improved design is only 4 substitutions away from trastuzumab, so it does not stray far from the parent sequence.
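Because every sequence is laid out on the same fixed Aho grid (149 positions per chain, with gaps for unused positions), a position-specific frequency prior can be estimated directly from aligned training sequences. This is a minimal sketch of one simple way to do that, assuming '-' marks an empty Aho position and an additive pseudocount. It is not AbDiffuser's actual prior or noise process. The toy test uses length 3 instead of 149.

```python
from typing import Dict, List

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # '-' marks an empty Aho position (gap)


def positional_frequencies(aligned: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Per-position symbol frequencies over fixed-length (e.g. Aho-numbered) sequences."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must share the same fixed-length numbering")
    profile: List[Dict[str, float]] = []
    for pos in range(length):
        counts = {a: pseudocount for a in ALPHABET}
        for seq in aligned:
            counts[seq[pos]] += 1  # raises KeyError for symbols outside ALPHABET
        total = sum(counts.values())
        profile.append({a: c / total for a, c in counts.items()})
    return profile
```

The pseudocount keeps every symbol possible at every position, which matters for a prior that will later be sampled from.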
    • Software for predicting T-cell epitopes. The tool most relevant to antibodies is NetMHCIIpan: it predicts binding of 15-mer peptides to each human MHC class II molecule and labels each peptide a strong or weak binder based on its score relative to naturally occurring peptides.
    • There are MHC class I (MHC-I) and MHC class II (MHC-II) molecules. MHC-I predominantly presents peptides derived from intracellular proteins, whereas MHC-II predominantly presents peptides from extracellular proteins.
    • Training data are binding-affinity measurements or eluted-ligand data.
    • Both single-allele and multi-allelic binding data can be used.
    • The combined dataset used for training NetMHCpan-4.1 consists of 13 245 212 data points covering 250 distinct MHC class I molecules, and the combined dataset used for training NetMHCIIpan-4.0 consists of 4 086 230 data points covering 116 distinct MHC class II molecules.
    • The core improvement is the integration of NNAlign_MA into NetMHCpan and NetMHCIIpan.
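Screening an antibody chain with such a tool amounts to cutting it into overlapping 15-mers and binning each peptide by its predicted %Rank. The sketch below shows only that pre- and post-processing. The %Rank values must come from the predictor itself; no real output format is parsed here. The 1% and 5% cutoffs are assumed defaults and should be checked against the tool's documentation.

```python
from typing import Iterator, Tuple


def peptide_windows(seq: str, length: int = 15) -> Iterator[Tuple[int, str]]:
    """Yield (0-based start, peptide) for every overlapping window of `length` residues."""
    for start in range(len(seq) - length + 1):
        yield start, seq[start:start + length]


def classify_rank(rank_percent: float, strong: float = 1.0, weak: float = 5.0) -> str:
    """Bin a predictor's %Rank (lower = stronger binder); thresholds are assumed defaults."""
    if rank_percent < strong:
        return "strong"
    if rank_percent < weak:
        return "weak"
    return "non-binder"
```

A chain of length L yields L - 14 windows, each scored against every MHC class II allele of interest.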
    • Antibody-agnostic epitope prediction. It outperforms other methods by a large margin, is free to use, and provides the training loop.
    • Approximately 90% of B cell epitopes are conformational epitopes.
    • EpiCluster encodes the antigen structure with an EGNN (E(n)-equivariant graph neural network) and combines this embedding with an ESM embedding. Epitopes are clustered using k-nearest neighbors.
    • They define a contact as any pair of heavy atoms within 4 Å.
    • They use the Graphein library to encode the antigen: an edge is drawn between two residue nodes if they are within 5 Å of each other.
    • On a 2023 test set, EpiCluster achieves a PR-AUC of 0.72, with the closest contender, BePro, at 0.27.
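The two distance rules above (5 Å residue-graph edges and 4 Å heavy-atom contacts) can be sketched in plain Python. This is a stand-in, not Graphein's API. Which atoms Graphein compares for the 5 Å edge is not stated above, so applying "any heavy atom" to both rules is an assumption. Using the 4 Å rule to label epitope residues against the bound antibody is also an assumption.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]
Residues = Dict[str, List[Coord]]  # residue id -> heavy-atom coordinates (angstroms)


def residues_in_contact(atoms_a: List[Coord], atoms_b: List[Coord], cutoff: float) -> bool:
    """True if any heavy-atom pair lies within `cutoff` angstroms."""
    return any(math.dist(a, b) <= cutoff for a in atoms_a for b in atoms_b)


def residue_graph(antigen: Residues, cutoff: float = 5.0) -> Set[Tuple[str, str]]:
    """Undirected residue graph: an edge joins residues within `cutoff` (assumed: any heavy atom)."""
    return {(r1, r2) for r1, r2 in combinations(antigen, 2)
            if residues_in_contact(antigen[r1], antigen[r2], cutoff)}


def epitope_labels(antigen: Residues, antibody_atoms: List[Coord],
                   cutoff: float = 4.0) -> Dict[str, bool]:
    """Label an antigen residue as epitope if any of its heavy atoms is within `cutoff` of the antibody."""
    return {res: residues_in_contact(atoms, antibody_atoms, cutoff) for res, atoms in antigen.items()}
```

The all-pairs loops are quadratic, which is fine for a sketch; a real pipeline would use a spatial index such as a k-d tree.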
    • Using language models to predict polyreactivity.
    • Polyspecificity and polyreactivity are related, but the former is thought to be driven by factors such as overlapping epitopes, whereas polyreactivity is driven by excess charge or hydrophobicity.
    • The baculovirus particle (BVP) assay is often used to test polyreactivity: mAbs are added at high concentrations to BVP-coated plates.
    • They generated a polyreactivity dataset (~300 antibodies) that is heterogeneous in molecule type (antibodies and nanobodies), specificity (including monospecific molecules) and format.
    • They tested different concentrations (from 6.67 nM to 667 nM) and well coatings (percentage of BVP), aiming to reduce noise from experimental conditions.
    • They tested two prediction modes: language models and structural descriptors. The language models were ProtT5, ESM-2 and AntiBERTy; the structural descriptors were calculated from AlphaFold2-Multimer models. Predictions from the language models outperformed those based on AF2-Multimer descriptors.
    • They introduced single and double mutations chosen from the most likely variants proposed by an ensemble of language models (ESM models). Most of the mutations preserved binding, and many actually improved it.
    • They performed evolution with the ESM-1b language model and the ESM-1v ensemble of five language models (six language models in total).
    • In the first round of evolution, they used biolayer interferometry (BLI) to measure antigen binding of variants containing only a single-residue substitution from wild type.
    • In the second round, they measured variants containing combinations of substitutions, selecting substitutions that preserved or improved binding in the first round.
    • They performed these two rounds for all seven antibodies, measuring 8–14 variants per antibody in round one and 1–11 variants per antibody in round two.
    • Across all seven antibodies, 71–100% of the first-round Fab variants (containing a single-residue substitution) retained sub-micromolar binding to the antigen, and 14–71% of first-round variants improved binding affinity (defined as a 1.1-fold or higher improvement in Kd compared to wild type).
    • Thirty-six of the 76 language-model-recommended single-residue substitutions (and 18 of the 32 affinity-enhancing substitutions) occur in framework regions.
    • Fabs for 21 of the 31 language-model-recommended, affinity-enhancing variants they tested had a higher melting temperature (Tm) than wild type, and all variants maintained thermostability (Tm > 70 °C).
    • They also tested polyspecificity and saw no marked changes in the polyspecificity profile.
    • Five of the 32 affinity-enhancing substitutions (~16%) change the wild-type residue to a rare or uncommon residue.
    • Approach based on general protein language models consistently outperformed all baseline methods, including the antibody-specific ones (!).
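Selecting substitutions from an ensemble of six language models can be sketched as a vote. Each model "votes" for a substitution it scores above the wild-type residue at that position, and substitutions that reach a minimum number of votes are recommended. The voting rule and the `min_votes=2` default are assumptions made for illustration. The per-position probability tables are hypothetical inputs that would come from masked-language-model scoring.

```python
from typing import Dict, List, Tuple

ModelProbs = Dict[int, Dict[str, float]]  # position (0-based) -> {amino acid: probability}


def recommend_substitutions(wild_type: str, models: List[ModelProbs],
                            min_votes: int = 2) -> List[Tuple[int, str, str, int]]:
    """Return sorted (position, wt_aa, mut_aa, votes) for substitutions favored by >= min_votes models."""
    votes: Dict[Tuple[int, str], int] = {}
    for probs in models:
        for pos, dist in probs.items():
            wt_aa = wild_type[pos]
            wt_p = dist.get(wt_aa, 0.0)
            for aa, p in dist.items():
                # a model votes for any substitution it scores above the wild-type residue
                if aa != wt_aa and p > wt_p:
                    votes[(pos, aa)] = votes.get((pos, aa), 0) + 1
    hits = [(pos, wild_type[pos], aa, n) for (pos, aa), n in votes.items() if n >= min_votes]
    return sorted(hits)
```

Raising `min_votes` trades the number of candidates for agreement across models. Round-two combinations would then be drawn only from first-round hits that preserved or improved binding.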