An update to SPACE1 (SPACE2) that employs ABodyBuilder2, giving the structural clustering method better coverage.
They used binders against coronavirus, Ebola and lysozyme, among others.
Structures are modeled with ABodyBuilder2 and grouped by CDR lengths; within each group, frameworks are aligned on their Cα atoms and RMSDs are calculated over the CDR loops. Finally, a clustering algorithm is applied.
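A minimal sketch of the framework-aligned CDR RMSD described above, assuming matched Cα coordinate arrays have already been extracted for the framework and CDR residues of two structures with identical CDR lengths. The function names are hypothetical, and this is a generic Kabsch superposition rather than SPACE2's own code:

```python
import numpy as np


def kabsch_superpose(mobile, target):
    """Rotation r and translation t that best map `mobile` onto `target`
    (both N x 3 arrays of matched C-alpha coordinates) in the least-squares sense."""
    mob_centre = mobile.mean(axis=0)
    tgt_centre = target.mean(axis=0)
    # Cross-covariance of the centred coordinate sets
    h = (mobile - mob_centre).T @ (target - tgt_centre)
    u, _, vt = np.linalg.svd(h)
    # Guard against a reflection so that r is a proper rotation
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = tgt_centre - mob_centre @ r.T
    return r, t


def cdr_rmsd(fw_a, fw_b, cdr_a, cdr_b):
    """Superpose structure A onto B using framework C-alpha atoms only,
    then return the RMSD over the matched CDR C-alpha atoms."""
    r, t = kabsch_superpose(fw_a, fw_b)
    cdr_a_moved = cdr_a @ r.T + t
    return float(np.sqrt(((cdr_a_moved - cdr_b) ** 2).sum(axis=1).mean()))
```

Aligning on the framework rather than on the loops themselves means the resulting RMSD reflects both loop shape and loop placement relative to the conserved scaffold.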
The clustering algorithms benchmarked were agglomerative clustering, DBSCAN, OPTICS-xi, OPTICS-DBSCAN, K-means, Butina clustering and greedy clustering.
Two further variants were developed: SPACE2-HC, which uses heavy chains only, and SPACE2-Paratope, for paratyping.
Two accuracy metrics were used: the fraction of epitope-consistent clusters (number of epitope-consistent multiple-occupancy clusters / number of multiple-occupancy clusters) and the fraction of clustered antibodies in epitope-consistent clusters (number of antibodies in epitope-consistent multiple-occupancy clusters / number of antibodies in multiple-occupancy clusters).
Two coverage metrics were used: the number of multiple-occupancy clusters and the number of antibodies in multiple-occupancy clusters. To capture accuracy and coverage in a single measure, they calculated the number of antibodies in epitope-consistent multiple-occupancy clusters.
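The accuracy and coverage metrics above can be computed from a cluster assignment as in this sketch. Treating a cluster as epitope-consistent when all of its members share one epitope label is an assumption, and the function and key names are hypothetical:

```python
def space_metrics(clusters, epitope_of):
    """clusters: mapping cluster_id -> list of antibody ids.
    epitope_of: mapping antibody id -> epitope/domain label.
    Multiple-occupancy = more than one member; epitope-consistent = all
    members share a single label (assumed definition)."""
    multi = [members for members in clusters.values() if len(members) > 1]
    consistent = [m for m in multi if len({epitope_of[a] for a in m}) == 1]
    n_multi = len(multi)
    n_ab_multi = sum(len(m) for m in multi)
    n_ab_consistent = sum(len(m) for m in consistent)
    return {
        # Accuracy metrics
        "frac_consistent_clusters": len(consistent) / n_multi if n_multi else 0.0,
        "frac_antibodies_in_consistent": n_ab_consistent / n_ab_multi if n_ab_multi else 0.0,
        # Coverage metrics
        "n_multi_clusters": n_multi,
        "n_antibodies_in_multi": n_ab_multi,
        # Combined accuracy-and-coverage measure
        "n_antibodies_in_consistent_multi": n_ab_consistent,
    }
```

Singleton clusters are excluded from every metric, which is why coverage has to be reported alongside accuracy.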
They selected agglomerative clustering as the best option: although it was not more accurate than OPTICS-xi, it produced larger clusters.
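For intuition, a plain complete-linkage agglomerative clustering over a pairwise CDR RMSD matrix could look like the sketch below. The linkage type and the `threshold` value are assumptions rather than SPACE2's settings; in practice a library implementation such as scikit-learn's `AgglomerativeClustering` would be used:

```python
def agglomerative_clusters(dist, threshold):
    """Complete-linkage agglomerative clustering on a symmetric distance
    matrix (list of lists). Repeatedly merges the two closest clusters and
    stops once the closest pair is farther apart than `threshold`, so every
    pair of members within a final cluster is at most `threshold` apart."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Complete linkage: distance between the farthest pair of members
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters
```

Unlike the greedy approach, the result does not depend on which structure happens to come first in the list.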
SPACE2 using all six CDR loops outperformed both SPACE2-HC and SPACE2-Paratope.
SPACE2 improves coverage over SPACE1, thanks to the ABodyBuilder2 modeling protocol.
SPACE2 increases coverage relative to clonotyping alone, but clonotyping remains much more accurate.
The ClusPro server with the AbEMap module for epitope mapping. It employs homology modeling when the antibody structure is unavailable, and predicts epitopes by ranking the antigen residues most frequently in contact with the antibody across its docking poses.
For epitope prediction, the 1,000 docked structures are used to calculate how often each antigen surface atom occurs in the antibody–antigen interface. To map an epitope, AbEMap defines the atomic epitope likelihood score as the Boltzmann-weighted atomic interface occurrence frequency, averaged over the ensemble of antibody structures.
If the antibody structure is not known, it is built by homology modeling and completed with MODELLER.
An antigen atom is counted as a contact if it lies within 5 Å of the antibody.
The epitope frequency/energy scores are calculated for each atom.
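A sketch of the atomic likelihood computation, assuming the antigen coordinates are fixed across poses, contacts use the 5 Å cutoff, and each pose k is weighted by exp(-E_k / T). The exact form of AbEMap's Boltzmann weighting and the `temperature` parameter are assumptions, and the function names are hypothetical:

```python
import numpy as np


def atomic_epitope_likelihood(pose_ab_coords, antigen_coords, pose_energies,
                              cutoff=5.0, temperature=1.0):
    """Boltzmann-weighted fraction of docked poses in which each antigen atom
    lies within `cutoff` angstroms of any antibody atom.

    pose_ab_coords: list of (M_k x 3) arrays, antibody atoms in pose k
    antigen_coords: (N x 3) array, antigen atoms (same frame in every pose)
    pose_energies: length-K sequence of docking energies (lower = better)
    """
    energies = np.asarray(pose_energies, dtype=float)
    # Shift by the minimum energy for numerical stability before exponentiating
    weights = np.exp(-(energies - energies.min()) / temperature)
    contact = np.zeros((len(pose_ab_coords), len(antigen_coords)))
    for k, ab in enumerate(pose_ab_coords):
        # All antigen-antibody atom distances for this pose, shape (N, M_k)
        d = np.linalg.norm(antigen_coords[:, None, :] - ab[None, :, :], axis=-1)
        contact[k] = (d <= cutoff).any(axis=1)
    return weights @ contact / weights.sum()


def ensemble_likelihood(per_model_likelihoods):
    """Average atomic likelihoods over an ensemble of antibody models."""
    return np.mean(per_model_likelihoods, axis=0)
```

Low-energy poses dominate the weighted frequency, so an atom contacted only in high-energy poses contributes little.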
To assess the precision of epitope prediction, they convert atomic likelihoods into residue likelihoods by summing the atomic contributions of each residue. Because atomic values are summed, larger residues with more surface-accessible atoms receive higher scores; the residue likelihoods are not adjusted for size, so users may need to correct for this bias.
AbEMap achieves an F1 of ~0.2 for the top 10 residues ranked by score.
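A sketch of the residue-level evaluation: atomic scores are summed per residue (optionally divided by the atom count, one hypothetical way to handle the size bias), and the top 10 residues are compared with the true epitope via F1. Names and inputs are illustrative:

```python
from collections import defaultdict


def residue_scores(atom_scores, atom_residue, normalize=False):
    """Sum atomic likelihoods per residue. With normalize=True, divide by the
    number of scored atoms in the residue (a hypothetical size correction)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for score, res in zip(atom_scores, atom_residue):
        totals[res] += score
        counts[res] += 1
    if normalize:
        return {r: totals[r] / counts[r] for r in totals}
    return dict(totals)


def top_k_f1(scores, true_epitope, k=10):
    """F1 between the k highest-scoring residues and the true epitope set."""
    predicted = set(sorted(scores, key=scores.get, reverse=True)[:k])
    true = set(true_epitope)
    tp = len(predicted & true)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(true)
    return 2 * precision * recall / (precision + recall)
```

With k fixed at 10, recall is capped when the true epitope has many more than 10 residues, which is one reason top-k F1 values tend to look modest.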
A treatise on antibody diversity, arguing that the repertoire cannot possibly be 'that big'; rather, there is some as-yet-unknown commonality across independent repertoires.
The human body contains ~10^11 B cells.
B cells are produced at a rate of ~10^9 per day, but the majority are removed (e.g. due to self-reactivity).
The diversity of the naive B-cell repertoire is estimated at 10^15 possible antibodies.
The number of pathogenic species thought to be infectious to humans has been estimated at ~1,400.
It would not be feasible for an organism to search through 10^15 possible antibodies when mounting an immune response.
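A back-of-the-envelope check of this argument, taking the figures above at face value (10^15 possible antibodies, ~10^9 new B cells per day, ~10^11 B cells present at once) and generously assuming that every new B cell carries a distinct receptor and none is lost to selection:

$$
\frac{10^{15}\ \text{antibodies}}{10^{9}\ \text{B cells/day}} = 10^{6}\ \text{days} \approx 2{,}700\ \text{years},
\qquad
\frac{10^{11}\ \text{B cells}}{10^{15}\ \text{antibodies}} = 10^{-4}.
$$

Even exhaustive sampling without losses would take thousands of years, and at any moment the body holds at most ~0.01% of the possible sequences.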
The author suggests that the antibody repertoire is highly redundant.
The author suggests that N individuals carry different but substantially overlapping fractions of the repertoire, M_1 … M_N.
The author suggests identifying the convergent motifs responsible for responses.
They benchmarked a range of experimental and computational measures to see which ones correlated with therapeutics progressing through clinical trials.
Table 2: a cheat sheet of the experimental methods and what each measures.
Some of the experimental results are highly correlated with one another. For example, the retention time on the FcRn column was highly correlated with the affinity-capture self-interaction nanoparticle spectroscopy (AC-SINS), polyspecificity reagent (PSR), clone self-interaction by bio-layer interferometry (CSI) and cross-interaction chromatography (CIC) assays, which constituted one of the polyspecificity clusters in the authors' prior work.
For each experimental assay, they update the 90% intervals relative to their previous experimental recommendations.
Table 4 shows the descriptors that need to be calculated for developability assessments.
Some of the in silico metrics also correlate with one another, forming conceptual groups (e.g. charge calculations).
They note a slight trend for mAbs that progress in trials to have fewer violations of the experimental descriptor thresholds than those that regressed.
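A sketch of how assay thresholds and violation counts could be derived, assuming each assay's threshold is the 90th percentile of a reference panel and that higher values are always worse. For some assays the opposite holds, and the authors' exact intervals are not reproduced here; the function names are hypothetical:

```python
import statistics


def flag_thresholds(reference):
    """Per-assay threshold at the 90th percentile of a reference panel.
    reference: dict assay -> list of values (e.g. from approved antibodies).
    Assumes higher values are worse for every assay (hypothetical)."""
    return {
        # quantiles(n=10) returns the 9 deciles; the last one is the 90th percentile
        assay: statistics.quantiles(values, n=10, method="inclusive")[-1]
        for assay, values in reference.items()
    }


def count_violations(candidate, thresholds):
    """Number of assays in which a candidate exceeds its threshold."""
    return sum(candidate[a] > t for a, t in thresholds.items() if a in candidate)
```

Comparing the distribution of violation counts between progressed and regressed molecules is then a simple group-wise summary.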
The authors run structural simulations to build a classifier of Asp/Asn degradation.
They use the Adimab dataset of 131 therapeutics for which degradation rates were measured.
They look at three descriptors: (D1) the backbone dihedral conformation of the n + 1 residue, (D2) the side-chain dihedral conformation of the Asn/Asp residue, and (D3) the fraction of time the Asn/Asp residue remains solvent accessible.
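A sketch of how per-frame simulation data could be reduced to the three descriptors for one Asn/Asp site. The dihedral windows and the relative-SASA cutoff are hypothetical placeholders, not the authors' values, and angular periodicity is ignored for brevity:

```python
import numpy as np


def occupancy(values, low, high):
    """Fraction of frames in which a per-frame value lies within [low, high]."""
    values = np.asarray(values, dtype=float)
    return float(((values >= low) & (values <= high)).mean())


def asx_descriptors(phi_next, psi_next, chi1, rel_sasa,
                    phi_window=(-180.0, -45.0), psi_window=(90.0, 180.0),
                    chi1_window=(-90.0, -30.0), sasa_cutoff=0.2):
    """Trajectory-averaged descriptors for one Asn/Asp site.
    phi_next/psi_next: backbone dihedrals (degrees) of residue n + 1 per frame
    chi1: side-chain dihedral (degrees) of the Asn/Asp residue per frame
    rel_sasa: relative solvent accessibility of the Asn/Asp residue per frame
    All windows and the cutoff are hypothetical placeholders."""
    phi_next = np.asarray(phi_next, dtype=float)
    psi_next = np.asarray(psi_next, dtype=float)
    in_backbone = ((phi_next >= phi_window[0]) & (phi_next <= phi_window[1])
                   & (psi_next >= psi_window[0]) & (psi_next <= psi_window[1]))
    return {
        "D1_backbone_n+1": float(in_backbone.mean()),
        "D2_sidechain_chi1": occupancy(chi1, *chi1_window),
        "D3_solvent_exposed": float((np.asarray(rel_sasa, dtype=float) > sasa_cutoff).mean()),
    }
```

Each site then becomes a three-number feature vector that any standard classifier can consume.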
The combined model achieves an accuracy of ~0.85.
The best accuracy is achieved on the backbone (D1) model, indicating that this might be the most important descriptor.
Citing the Adimab study: for instance, there were 27 deamidation sites with the NG hotspot sequence in the complementarity-determining regions (CDRs), of which only 14 underwent deamidation. A similar trend was observed for isomerization (16 of 44 DG sites isomerized).
They study Asn/Asp degradation (isomerization and deamidation) by looking at proton affinity.
Backbone secondary structure, side-chain rotamer conformation and solvent accessibility were found to be key molecular indicators of Asp isomerization and Asn deamidation.
They show that structurally clustering the six CDRs allows anti-COVID antibodies to be binned by the antigen domain they bind, including grouping together antibodies from different lineages (clonotypes). The method enables deeper characterization of convergent epitope responses, and more targeted determination of the novel structures that would contribute most to filling gaps in structural coverage. The structural clustering algorithm introduced is SPACE.
Serum baiting: an extracellular coronavirus antigen is used to pan donated blood serum directly for complementary antibodies.
They modeled and structurally clustered thousands of antibody Fv sequences in CoV-AbDab and showed that 92% of multiple-occupancy structural clusters bin together antibodies that bind consistent coronavirus antigens/domains; the antibodies within these structural clusters frequently transcend clonal lineages.
They employed homology modeling using ABodyBuilder.
The 2,063 full variable domain (Fv) sequences in CoV-AbDab were submitted to the ABodyBuilder antibody modeling tool. To ensure high model quality, only the 1,500 models for which ABodyBuilder used FREAD to homology model all six CDR loops were carried forward for structural clustering.
SPACE: the antibodies are split into groups by the lengths of their six CDRs. The score between two structures is calculated as the length-weighted sum of the individual CDR Cα RMSDs.
Within each length combination, they take the first structure in the list as a cluster centre. Any other structure scoring below 0.75 Å against it is added to that cluster; the rest are left for the next iteration. In this form it is a greedy algorithm.
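The greedy procedure can be sketched as follows, assuming a `score(a, b)` callable that returns the length-weighted CDR Cα RMSD in Å. Normalizing the weighted sum by total CDR length (so that the 0.75 Å cutoff stays in Å) is an assumption, and the names are hypothetical:

```python
def length_weighted_score(rmsds, lengths):
    """Combine per-CDR RMSDs into one score, weighting each CDR by its length.
    Dividing by the total length is an assumption, not confirmed for SPACE."""
    return sum(r * n for r, n in zip(rmsds, lengths)) / sum(lengths)


def greedy_cluster(ids, score, cutoff=0.75):
    """Greedy clustering within one CDR-length group.
    ids: structures sharing the same six CDR lengths (hashable ids).
    score(a, b): length-weighted CDR C-alpha RMSD between two structures."""
    remaining = list(ids)
    clusters = []
    while remaining:
        centre = remaining[0]  # the first structure left in the list seeds a cluster
        members = [centre] + [s for s in remaining[1:] if score(centre, s) < cutoff]
        clusters.append(members)
        taken = set(members)
        # Structures outside the cutoff are left for the next iteration
        remaining = [s for s in remaining if s not in taken]
    return clusters
```

Because each cluster is seeded by whichever structure comes first, the outcome depends on list order, one of the weaknesses the later agglomerative version avoids.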
Their lenient VH-clonotyping protocol groups Fvs with matching IGHV genes, the same length CDRH3, and ≥ 80% CDRH3 sequence identity. Their lenient Fv-clonotyping protocol additionally requires cluster members to have a matching IG[K/L]V gene, the same length CDRL3, and ≥ 80% CDRL3 sequence identity.
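The lenient clonotype criteria translate into a pairwise check like the one below. This sketch assumes gene names are already reduced to the gene level (no allele suffix) and only tests pairs; grouping Fvs into clonotypes would additionally require a clustering step. The class and function names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Fv:
    ighv: str   # heavy-chain V gene, e.g. "IGHV3-23"
    cdrh3: str
    iglv: str   # light-chain V gene (IGKV or IGLV)
    cdrl3: str


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def same_vh_clonotype(x, y, min_id=0.8):
    """Matching IGHV gene, same-length CDRH3 and >= 80% CDRH3 identity."""
    return (x.ighv == y.ighv and len(x.cdrh3) == len(y.cdrh3)
            and identity(x.cdrh3, y.cdrh3) >= min_id)


def same_fv_clonotype(x, y, min_id=0.8):
    """VH criteria plus matching light V gene and CDRL3 criteria."""
    return (same_vh_clonotype(x, y, min_id) and x.iglv == y.iglv
            and len(x.cdrl3) == len(y.cdrl3)
            and identity(x.cdrl3, y.cdrl3) >= min_id)
```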
To measure whether antibodies bind the same region, they used the notion of a 'domain-consistent' cluster, i.e. one whose members bind consistent antigens/domains.
A total of 184/200 (92%) of their multiple-occupancy structural clusters were domain-consistent, indicating that structural clustering with another member of CoV-AbDab is likely to be highly predictive of function.
Of these 184 domain-consistent clusters, 88 (47.8%) contained at least one pair of antibodies from different lenient Fv clonotypes, and 73 (39.7%) contained at least two lenient VH-only clonotypes.
They introduce a binary classifier of specificity for antibody–antigen pairs.
They use data from sdab-db: 47 antigens and 365 antibodies.
They extend the positive set by assuming that a nanobody also binds antigens with high sequence similarity to its known antigen.
They create a negative set by shuffling antibodies and antigens, keeping a shuffled pair only if it falls below a similarity threshold to the known complexes.
Their embedding of choice is the gapped k-mer scheme.
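'Gapped k-mer' has several variants in the literature; a simple one counts residue pairs separated by a fixed number of intervening positions. The sketch below is that generic variant, not necessarily the paper's exact scheme, and the function name is hypothetical. For a pair, the antibody and antigen vectors would be concatenated before being passed to the classifier:

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def gapped_pair_features(seq, max_gap=2):
    """Frequencies of residue pairs (a, b) separated by `gap` intervening
    positions, for gap = 0..max_gap. Returns a fixed-length vector of size
    20 * 20 * (max_gap + 1), so sequences of any length are comparable."""
    index = {pair: i for i, pair in enumerate(product(AMINO_ACIDS, repeat=2))}
    features = [0.0] * (len(index) * (max_gap + 1))
    for gap in range(max_gap + 1):
        offset = gap * len(index)
        n_pairs = max(len(seq) - gap - 1, 1)
        for i in range(len(seq) - gap - 1):
            pair = (seq[i], seq[i + gap + 1])
            if pair in index:  # skip non-standard residues
                features[offset + index[pair]] += 1.0 / n_pairs
    return features
```

The fixed output length is what makes such an embedding usable with off-the-shelf classifiers like random forests or SVMs.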
They benchmark several embedding schemes against several classifiers (e.g. random forest (RF), SVM).
The best combination, gapped k-mer embedding with RF, achieves ~90% accuracy.
They benchmark the ability of AlphaFold2 (AF2) to improve docking of antibody–antigen complexes.
They check whether giving a docked antibody–antigen pose to AF2 improves the quality of the initial docking pose, and whether AF2 can be used for better rescoring.
No MSA is used; instead, AF2 is given only the sequences and the docked pose as a template.
Side chains are stripped from the template, as they were found to impose too many constraints, and AF2 is tasked with putting them back in place.
Four docking algorithms were used: ProPOSE, ZDOCK, PIPER and ClusPro (which processes PIPER results).
Docking is run on bound (231 complexes) and unbound (25 complexes) structures. For the bound set, the complexes were pulled apart and their side chains repacked using SCWRL.
They find that AF2 retains ~50% of the decoy contacts and shifts the interface by ~1.24 Å, indicating that it does modify the input structure.
For rescoring, they use an AF2 composite score built from pLDDT and pTM score. Both scores are converted to z-scores normalized within each antibody–antigen complex, to allow comparisons.
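A sketch of per-complex z-scoring and rescoring. Summing the two z-scores with equal weight is an assumption, as the notes do not give the exact combination, and the function names are hypothetical:

```python
import statistics


def zscores(values):
    """Standardize values within one complex (population standard deviation)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [0.0 if sd == 0 else (v - mean) / sd for v in values]


def af2_composite(plddt, ptm):
    """Composite score for the poses of ONE antibody-antigen complex:
    each metric is z-scored within the complex, then the two are summed
    (equal weighting is an assumption)."""
    return [a + b for a, b in zip(zscores(plddt), zscores(ptm))]


def rerank(pose_ids, plddt, ptm):
    """Order pose ids from best to worst composite score."""
    scores = af2_composite(plddt, ptm)
    order = sorted(range(len(pose_ids)), key=lambda i: scores[i], reverse=True)
    return [pose_ids[i] for i in order]
```

Normalizing within each complex removes offsets that differ between targets, so rankings are driven by relative confidence among that complex's poses.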
Rescoring with the AF2 composite score helps in both bound and unbound docking, but much more in the bound case, and the gain deteriorates as model quality decreases.
ClusPro shows the smallest improvement from rescoring.
In short, AF2 can improve rescoring of docked poses by combining pLDDT and pTM score, but the input models need to be good to benefit fully.
Available at https://www.digitalgeneai.tech/solution/affinity
They report a Pearson correlation of 0.65.
As datasets they employ sdab-db (i.e. nanobodies) and a dataset from the Global Antibody Affinity Prediction Competition; the test set is the Pierce lab antibody benchmark. They appear to restrict their data to single-chain antibodies, although the server advertises light-chain use.
Their model embeds antibodies with AbLang and proteins with TAPE, and predicts affinity from these embeddings using ConvNeXt.
According to Fig 4, they achieve similar Pearson correlations on the test and training sets (~0.6), which is better than the other methods they benchmark against (e.g. CSM-AB, ZRANK, PRODIGY).
Intriguingly, when antibody features are removed the correlation remains at ~0.5, whereas it drops to ~0.2 when antigen features are removed.
They introduce a diffusion model for antibodies and successfully test its designs experimentally.
They contrast backbone-first design with sequence–structure co-design. If one designs the backbone first and then finds a sequence to fit it, one may run into two issues: 1) no available sequence may fit the backbone well, and 2) the sequence–backbone dependencies that could otherwise be learned end to end are missed.
They employ a fixed-length representation (2 × 149 residues) using the Aho numbering scheme; this is quite important, and it is possible because antibodies have a fairly conserved frame of reference.
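A sketch of the fixed-length layout, assuming the chains have already been numbered in the Aho scheme by an upstream tool (e.g. ANARCI) and supplied as position-to-residue mappings; the helper names are hypothetical:

```python
AHO_LENGTH = 149  # positions per chain in the Aho scheme


def to_fixed_length(numbered, length=AHO_LENGTH):
    """Place residues at their Aho positions (1-based) in a fixed-length
    string, using '-' for positions the chain does not occupy."""
    slots = ["-"] * length
    for position, residue in numbered.items():
        if not 1 <= position <= length:
            raise ValueError(f"Aho position {position} out of range")
        slots[position - 1] = residue
    return "".join(slots)


def paired_representation(heavy, light):
    """Concatenate heavy and light chains into the 2 x 149 layout."""
    return to_fixed_length(heavy) + to_fixed_length(light)
```

Because every antibody maps onto the same 298 slots, a given index always refers to a structurally equivalent position, which is what allows per-position statistics and a fixed-size network input.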
To impose physical constraints, they define an idealized backbone reference onto which structures are projected, as well as a coarse-grained side-chain representation that follows a similar principle.
Because the representation has a fixed length, they can use priors on per-position amino-acid frequencies.
They train the network to reproduce the amino-acid frequencies of the paired sequences in OAS.
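Once sequences share the 2 × 149 layout, per-position frequencies are straightforward to estimate. The sketch below computes such a profile with an optional pseudocount; the names are hypothetical, and how AbDiffuser folds these frequencies into its diffusion prior is not reproduced here:

```python
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus the gap symbol


def positional_frequencies(aligned, pseudocount=0.0):
    """Per-position symbol frequencies from fixed-length aligned sequences.
    Returns one dict per position mapping symbol -> frequency.
    Symbols outside ALPHABET are counted in the total but not reported."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must share the fixed length")
    total = len(aligned) + pseudocount * len(ALPHABET)
    profile = []
    for pos in range(length):
        counts = Counter(s[pos] for s in aligned)
        profile.append({a: (counts[a] + pseudocount) / total for a in ALPHABET})
    return profile
```

A small pseudocount keeps rare but plausible residues from receiving zero prior probability.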
For structure generation, they compare against IgFold and obtain very similar performance.
They train the network to reproduce the distribution of trastuzumab-binding sequences (dataset from Mason et al. 2021): the generator model is trained on the binders, and a classifier on binders/non-binders. According to this classifier, AbDiffuser designs are more likely to be binders than those of other methods (MEAN, RefineGNN).
They selected 16 designs for experimental validation. In vitro, 37% of constructs bound HER2, with one slightly improved over trastuzumab. The improved design was 4 substitutions away from trastuzumab, so not an obvious distance away.