Computational Antibody Papers

Key points
    • They introduce six molecular descriptors (MolDesk) that perform better than the Therapeutic Antibody Profiler (TAP) and MOE.
    • As highlighted by Waibl et al. (2022), the method used to calculate hydrophobicity and the specific scale employed can notably influence the assessment of antibody developability characteristics.
    • They use APBS (Adaptive Poisson-Boltzmann Solver) to calculate the electrostatic potential of the molecules.
    • They calculate the descriptor values over a conformational ensemble generated by molecular dynamics (MD).
    • Their calculation of hydrophobicity showed sensitivity to the hydrophobicity definition.
    • Their measure of CDR negative charge outperforms other methods on an experimental viscosity dataset.
    • They used a dataset of 64 antibodies with measured clearance and showed that hydrophobicity-based descriptors do not discriminate between fast and slow clearance, whereas CDR_APBS_pos does.
    • They tested the effect of the structural model used on the calculated properties and achieved broadly similar results.
    • After molecular dynamics simulations of the models, the differences between the models are blurred, showing the importance of conformational sampling.
    • Their final set of metrics: “These descriptors include CDR_APBS_neg, a significant factor influencing viscosity and colloidal stability (Figure 3), CDR_APBS_pos, a major driver of PK clearance and polyspecificity (Figure 4), CDR_HPATCH_WW, which plays a key role in SEC, HMW, and HIC (Figure 5), CDR_HPATCH_BM which correlated the best with HIC, as well as total CDR length and APBS charge asymmetry (APBS_CAP) as two additional descriptors to align with the type of descriptors considered in the TAP metrics”
    • They compared TAP and MolDesk on the basis of whether molecules progressed or regressed in testing; MolDesk performed better, raising fewer critical flags for the approved and progressing molecules (see the flag-counting sketch below).
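A minimal flag-counting sketch of the TAP/MolDesk-style comparison above, assuming a simple rule of checking each descriptor against a reference range derived from clinical-stage antibodies; the descriptor names follow the paper, but the numerical thresholds below are invented placeholders, not the published ranges.

```python
# Hypothetical reference ranges (placeholders, not the published values).
REFERENCE_RANGES = {
    "CDR_APBS_neg": (-50.0, 0.0),
    "CDR_APBS_pos": (0.0, 60.0),
    "CDR_HPATCH_WW": (0.0, 300.0),
    "total_CDR_length": (40, 60),
}

def count_flags(descriptors):
    """Count how many descriptors fall outside their reference range."""
    flags = 0
    for name, value in descriptors.items():
        lo, hi = REFERENCE_RANGES[name]
        if not lo <= value <= hi:
            flags += 1
    return flags

candidate = {"CDR_APBS_neg": -12.3, "CDR_APBS_pos": 75.0,
             "CDR_HPATCH_WW": 120.0, "total_CDR_length": 52}
print(count_flags(candidate))  # -> 1 (CDR_APBS_pos exceeds its placeholder range)
```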
    • They employed Rosetta to show that energy-based ranking of constructs is superior to homology-based approaches.
    • In some cases, the best experimentally validated humanized design originates from human framework genes with little similarity to the parental animal antibody, underscoring the potential advantage of energy-based humanization over conventional homology-based humanization.
    • They generate all V-J combinations that are free of liabilities such as N-glycosylation motifs and extra cysteines, producing a set of frameworks.
    • Mouse CDRs are grafted onto the artificial frameworks, then modeled and energy-minimized using Rosetta (a simplified sketch of the framework filtering and ranking follows).
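A rough sketch of the framework-generation step described above, under stated assumptions: the liability filter uses the standard N-X-S/T (X != P) N-glycosylation motif and a simple cysteine count, the gene fragments are truncated illustrative strings, and the energy function is a dummy stand-in for the Rosetta modeling and minimization the authors actually perform.

```python
import re

NGLYC = re.compile(r"N[^P][ST]")  # N-glycosylation sequon N-X-S/T, X != P

def has_liabilities(framework_seq, expected_cys=2):
    """Flag N-glycosylation motifs or extra cysteines in a framework sequence."""
    if NGLYC.search(framework_seq):
        return True
    return framework_seq.count("C") > expected_cys

def mock_energy(seq):
    """Placeholder for a Rosetta-style energy after CDR grafting and minimization."""
    return -len(seq) * 0.1  # dummy value for illustration only

# Truncated illustrative fragments, not full germline gene sequences.
v_genes = {"IGHV3-23": "EVQLLESGGGLVQPGGSLRLSCAAS", "IGHV1-69": "QVQLVQSGAEVKKPGSSVKVSCKAS"}
j_genes = {"IGHJ4": "WGQGTLVTVSS"}

candidates = []
for v_name, v_seq in v_genes.items():
    for j_name, j_seq in j_genes.items():
        framework = v_seq + j_seq
        if has_liabilities(framework):
            continue
        candidates.append((mock_energy(framework), f"{v_name}/{j_name}"))

# Rank by energy (lower is better) rather than by homology to the mouse antibody.
for energy, name in sorted(candidates):
    print(name, round(energy, 2))
```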
    • They create a tool for back-translation and provide single nucleotide frequencies (SNFs) for antibody variable regions from a large-scale NGS study.
    • They use the Soto et al. dataset of ~326 million antibody sequences.
    • They create a position- and gene-specific scoring matrix (PGSSM) built from sequences matched by gene and CDR-H3 length.
    • They can back-translate amino acid sequences to obtain plausible codons.
    • Their PGSSMVJ score is the average of the single nucleotide frequencies (see the back-translation sketch after this paper's points).
    • Their naturalness score distinguishes between human and non-human sequences.
    • They benchmarked the back-translation on 181k sequences from GenBank.
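A toy sketch of the back-translation and PGSSM scoring ideas, assuming a simplified PGSSM that stores per-position nucleotide frequencies; the codon table is a small subset of the genetic code and the frequencies are made up, so this only illustrates the general scheme, not the authors' implementation.

```python
CODONS = {  # small illustrative subset of the genetic code
    "M": ["ATG"],
    "K": ["AAA", "AAG"],
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
}

def back_translate(aa_seq, pgssm):
    """Pick, per amino acid, the synonymous codon maximizing summed positional frequencies."""
    nt_seq, pos = [], 0
    for aa in aa_seq:
        best = max(
            CODONS[aa],
            key=lambda codon: sum(pgssm[pos + i].get(nt, 0.0) for i, nt in enumerate(codon)),
        )
        nt_seq.append(best)
        pos += 3
    return "".join(nt_seq)

def pgssm_score(nt_seq, pgssm):
    """Average single-nucleotide frequency along the sequence."""
    return sum(pgssm[i].get(nt, 0.0) for i, nt in enumerate(nt_seq)) / len(nt_seq)

# Toy PGSSM for 9 nucleotide positions (frequencies are invented for illustration).
pgssm = [{"A": 0.7, "T": 0.1, "G": 0.1, "C": 0.1} for _ in range(9)]
nt = back_translate("MKS", pgssm)
print(nt, round(pgssm_score(nt, pgssm), 3))
```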
    • A geometric transformer that receives a single structure as input and annotates residues with the likelihood that they are part of a binding site.
    • The geometric transformer uses only atom names as input features (no masses, charges, etc.).
    • Similar to a convolution, their geometric attention mechanism first focuses on the 8 nearest neighbors (~3.2 Å) and then expands up to 64 nearest neighbors (~8.2 Å); see the kNN-attention sketch after this paper's points.
    • They use ~300,000 chains from the PDB for training (!), obtained by extracting all biological assemblies at 30% sequence identity.
    • They defined the most common atom names across all molecule types, yielding 79 elements; interactions between these elements can be expressed as a 79x79 matrix.
    • The interaction cutoff is taken as 5 Å.
    • PeSTo outperforms ScanNet by a large margin (ROC AUC 0.93 vs 0.87).
    • In some cases, processing MD trajectories of unbound proteins with PeSTo identifies certain interfaces better than when PeSTo is run on the starting static structure.
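A simplified numpy sketch of nearest-neighbor geometric attention with a growing receptive field (8 up to 64 neighbors), as described above; the plain dot-product attention and random toy inputs are my own simplifications and do not reproduce the PeSTo architecture.

```python
import numpy as np

def knn_attention(features, coords, k):
    """One round of attention in which each atom attends only to its k nearest neighbors."""
    n = features.shape[0]
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    out = np.zeros_like(features)
    for i in range(n):
        nbrs = np.argsort(dists[i])[:k]          # k nearest atoms (including self)
        scores = features[nbrs] @ features[i]    # dot-product attention scores
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[i] = weights @ features[nbrs]        # weighted sum of neighbor features
    return out

rng = np.random.default_rng(0)
coords = rng.normal(size=(100, 3))    # toy atom coordinates
feats = rng.normal(size=(100, 16))    # toy per-atom embeddings (e.g. derived from atom names)
for k in (8, 16, 32, 64):             # receptive field grows layer by layer
    feats = knn_attention(feats, coords, k)
print(feats.shape)
```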
    • They demonstrate that germline usage and distance from the germline are correlated with ADA.
    • They use OAS as a reference for natural NGS sequences, but exclude studies focusing on single isotypes such as IgM or IgG.
    • They collected ADA data points for therapeutics from IMGT, relating these to FDA labels in the first instance and to a heterogeneous literature search second.
    • If multiple ADA values were available, they took the maximum value.
    • There are 14 amino acid differences between the IGHV4-34*10 and IGHV4-34*11 alleles, a 14.4% sequence difference.
    • Healthy donors have similar gene usage to diseased donors.
    • They show that V gene usage correlates with ADA, as does the number of mutations from germline (a toy germline-distance sketch follows).
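A toy sketch of the "distance from germline" idea, assuming it is measured as the mismatch count to the closest human V gene; the germline fragments are truncated illustrative sequences and the comparison is a naive prefix match rather than a proper alignment, so this is not the authors' procedure.

```python
# Truncated illustrative fragments, not full germline sequences.
GERMLINES = {
    "IGHV3-23": "EVQLLESGGGLVQPGGSLRLSCAAS",
    "IGHV1-69": "QVQLVQSGAEVKKPGSSVKVSCKAS",
}

def closest_germline(v_region):
    """Return (gene, mismatch count) for the nearest germline over the aligned prefix."""
    best = None
    for gene, germ in GERMLINES.items():
        n = min(len(v_region), len(germ))
        mismatches = sum(a != b for a, b in zip(v_region[:n], germ[:n]))
        if best is None or mismatches < best[1]:
            best = (gene, mismatches)
    return best

print(closest_germline("EVQLVESGGGLVQPGGSLRLSCAAS"))  # -> ('IGHV3-23', 1)
```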
    • ProteinMPNN is a framework that receives a backbone and generates the most probable sequence to fit it. It is squarely aimed at protein design cases where you have a binding interface or a structure that the sequence "needs to fit".
    • Using interatomic distances is better than using dihedral angles: sequence recovery increased from 41.2% (baseline model) to 49.0% (experiment 1, Table 1 of the paper); interatomic distances evidently provide a better inductive bias for capturing interactions between residues than dihedral angles or N-Cα-C frame orientations.
    • Training models on backbones to which Gaussian noise (std = 0.02 Å) had been added improved sequence recovery on confident AlphaFold structure models (average pLDDT > 80.0) from UniRef50, while sequence recovery on unperturbed PDB structures decreased (see the noising sketch below).
    • They employed a Rosetta-made scaffold that was supposed to house a peptide recognizing a target protein. The Rosetta designs failed, but sequences generated for the scaffold with ProteinMPNN bound even better than the original bare peptide.
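A minimal sketch of the backbone-noising trick mentioned above; the array shapes and use of numpy are my own choices for illustration, not the ProteinMPNN training code.

```python
import numpy as np

def noise_backbone(coords, std=0.02, seed=None):
    """Return backbone coordinates perturbed by isotropic Gaussian noise (in Å)."""
    rng = np.random.default_rng(seed)
    return coords + rng.normal(scale=std, size=coords.shape)

backbone = np.zeros((128, 4, 3))                 # toy backbone: 128 residues x (N, Ca, C, O) x xyz
noisy = noise_backbone(backbone, std=0.02, seed=0)
print(float(np.abs(noisy - backbone).mean()))    # mean absolute perturbation, in Å
```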
    • They created a siamese EGNN in which one branch receives the wild type and the other the mutant, with the difference of their outputs being the ddG prediction (see the sketch after this paper's points).
    • They used the AB-Bind dataset, which consists of 645 mutants from 29 complexes.
    • They created a set of non-redundant antibody-antigen binders with 1,475 complexes, clustering antigens at 70% sequence identity.
    • They mutated one complex per cluster and ran FoldX, producing 942,723 ddG data points.
    • On the AB-Bind dataset they achieve a Pearson correlation of 0.8; however, when they impose stringent CDR cutoffs the correlation drops dramatically, indicating overtraining.
    • When they train on the synthetic FoldX dataset, the model stops being sensitive to overtraining.
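A minimal sketch of the siamese scoring idea, with the shared branch reduced to a single linear scorer; the real model is an EGNN operating on structures, so this only illustrates how taking the difference between the wild-type and mutant branches yields an antisymmetric ddG prediction.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(32,))   # shared weights used by both branches

def score(features):
    """Shared scoring branch applied to one complex's feature vector."""
    return float(W @ features)

def predict_ddg(wt_features, mut_features):
    """ddG = score(mutant) - score(wild type)."""
    return score(mut_features) - score(wt_features)

wt = rng.normal(size=32)
mut = rng.normal(size=32)
print(predict_ddg(wt, mut), predict_ddg(mut, wt))  # antisymmetric: one is minus the other
```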
    • Using AF2 they developed a pipeline to fold and dock proteins simultaneously. The pipeline shows good performance in distinguishing interacting from non-interacting proteins.
    • Acceptable models are those with DockQ > 0.23; the success rate is defined as the percentage of acceptable poses.
    • The best version of their model achieves a 39.4% success rate.
    • AlphaFold2 outperforms other docking methods.
    • Using the number of Cβ atoms in contact (within 8 Å) or the pLDDT of the interface yields a ROC AUC of around 0.9 for distinguishing interacting from non-interacting proteins.
    • To model the interaction, they join the two chains with a 200-residue chain break in the residue index (see the sketch after this paper's points).
    • They note that it is very important to create the right MSAs for AF2.
    • As negative cases (non-interacting proteins) they employ data from the Negatome database.
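A small sketch of two ingredients mentioned above, under the assumption that the 200-residue chain break is implemented as a gap in the residue index between the concatenated chains, and that the interface metric counts inter-chain Cβ pairs within 8 Å; the coordinates below are random toy data.

```python
import numpy as np

def merged_residue_index(len_a, len_b, gap=200):
    """Residue index for chain A followed by chain B, with a `gap` offset marking the break."""
    idx_a = np.arange(len_a)
    idx_b = np.arange(len_b) + len_a + gap
    return np.concatenate([idx_a, idx_b])

def cb_contacts(cb_a, cb_b, cutoff=8.0):
    """Number of inter-chain Cβ-Cβ pairs closer than `cutoff` Å."""
    d = np.linalg.norm(cb_a[:, None, :] - cb_b[None, :, :], axis=-1)
    return int((d < cutoff).sum())

print(merged_residue_index(3, 3))  # -> [0 1 2 203 204 205]
rng = np.random.default_rng(1)
print(cb_contacts(rng.normal(size=(50, 3)) * 5, rng.normal(size=(60, 3)) * 5))
```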
    • They draw from the MaSIF method in defining a triangular surface mesh. Each vertex is encoded with physicochemical information, and each patch of a defined radius is then encoded numerically.
    • They train overlapping patches to have similar embeddings, since overlapping patches are assumed to have overlapping functions as well.
    • They employed contrastive learning, labeling patch pairs as positive if their center vertices were within 1.5 Å and negative if the centers were 5 Å apart (see the labeling sketch after this paper's points).
    • Their learned similarity distances cluster by curvature, hydropathy and charge.
    • They compared Surface ID to structure-based similarity measurement approaches, with Surface ID performing slightly better.
    • They clustered antibody and epitope patches simultaneously, recovering distinct binding modes for HIV-1 gp120, two for influenza HA, and one for SARS-CoV-2 RBD. The anti-HA clusters shared the same epitope but had different paratopes, showing that the algorithm distinguishes binding at that level.
    • They propose an antibody design scheme: search for epitopes similar to the query with Surface ID and use the antibodies that bind them as putative binders to the query.
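A minimal sketch of the contrastive pair-labeling rule described above, assuming positives are patch pairs whose center vertices lie within 1.5 Å and negatives are pairs whose centers are at least 5 Å apart; intermediate separations are simply skipped in this toy version.

```python
import numpy as np

def pair_label(center_a, center_b, pos_cutoff=1.5, neg_cutoff=5.0):
    """Return 1 for a positive pair, 0 for a negative pair, None if in between."""
    d = float(np.linalg.norm(center_a - center_b))
    if d <= pos_cutoff:
        return 1
    if d >= neg_cutoff:
        return 0
    return None  # ambiguous separations are not used for training in this sketch

print(pair_label(np.zeros(3), np.array([1.0, 0.0, 0.0])))  # -> 1 (overlapping patches)
print(pair_label(np.zeros(3), np.array([6.0, 0.0, 0.0])))  # -> 0 (distant patches)
```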
    • They use a siamese network and SAbDab to predict antibody-antigen binding in a binary fashion.
    • They clustered the antigens at 90% sequence identity and assumed that similar antibodies within the same antigen group bind in the same manner, resulting in 3,892 antigen pairs.
    • They also created a dataset of COVID-specific antibodies with 9,309 positive samples and 1,710 negatives.
    • They used the CKSAAP encoding (composition of k-spaced amino acid pairs; sketched after this paper's points), comparing it against others such as one-hot, PSSM, and their own trained word2vec.
    • They benchmark the different encodings and models, showing that CKSAAP + CNN comes out on top.
    • Their siamese CNN with CKSAAP achieves a staggering 0.85 PR AUC.
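A self-contained sketch of the CKSAAP encoding (composition of k-spaced amino acid pairs) referenced above; the maximum spacing K = 3 and the normalization are my own choices for illustration, not necessarily the paper's settings.

```python
from itertools import product

AA = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AA, repeat=2)]   # 400 ordered amino acid pairs

def cksaap(seq, max_k=3):
    """For each spacing k, count ordered residue pairs separated by k positions, normalized."""
    features = []
    for k in range(max_k + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_windows = len(seq) - k - 1
        for i in range(n_windows):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:
                counts[pair] += 1
        features.extend(counts[p] / max(n_windows, 1) for p in PAIRS)
    return features

vec = cksaap("EVQLVESGGGLVQPGGSLRLSCAAS")
print(len(vec))   # -> 1600 features for k = 0..3
```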