Computational Antibody Papers

    • CALM, a "sequence-native" foundation model that matches antibody and antigen primary sequences directly, without requiring structural inference.
    • CALM employs modality-specific encoders (AntiBERTy for antibodies, ESM-2 for antigens) to align cognate pairs in a shared embedding space using cosine similarity.
    • The authors evaluate performance by the model's ability to pick the correct partner from a candidate pool in both directions (ab->ag and ag->ab).
    • CALM uses optional structural masks to restrict inputs to paratope and epitope residues, which significantly reduces sequence noise and improves accuracy (though this clearly requires a structure).
    • CALM achieves Top-1 accuracy of 2% in strict out-of-distribution tests, representing a 3x to 46x improvement over random baselines despite a low-data regime.
    • They lay out an autoregressive decoder for de novo design, though this generative component was not trained or tested in this study.
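The bidirectional retrieval evaluation above can be sketched in a few lines of numpy. This is a minimal illustration assuming precomputed, fixed-size embeddings (the real model uses AntiBERTy/ESM-2 encoders with trained projections); `cosine_sim_matrix` and `top1_accuracy` are hypothetical helper names, not CALM's API.

```python
import numpy as np

def cosine_sim_matrix(A, B):
    # Row-normalise both embedding matrices, then take pairwise dot products
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T

def top1_accuracy(sim):
    # Cognate pairs sit on the diagonal: a hit means a row's argmax is diagonal
    return float(np.mean(sim.argmax(axis=1) == np.arange(sim.shape[0])))

rng = np.random.default_rng(0)
ab = rng.normal(size=(50, 128))             # stand-in antibody embeddings
ag = ab + 0.1 * rng.normal(size=(50, 128))  # noisy cognate antigen embeddings

sim = cosine_sim_matrix(ab, ag)
acc_ab_to_ag = top1_accuracy(sim)    # ab -> ag retrieval
acc_ag_to_ab = top1_accuracy(sim.T)  # ag -> ab retrieval
```

Scoring both directions from the same similarity matrix (rows vs. columns) is what makes the evaluation symmetric.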
    • The authors evaluated AlphaFold3, Boltz-2, and Chai-1 on their ability to distinguish cognate (correct) nanobody-antigen pairs from incorrect, non-binding pairings.
    • They used 106 experimental complexes and generated a combinatorial matrix of 11,132 shuffled non-cognate pairings to serve as ground-truth "incorrect" decoys.
    • Internal confidence scores (specifically ipTM) were very weakly predictive of true binding. In terms of Average Precision (PR-AUC), AF3 performed best, followed by Chai-1 and then Boltz-2.
    • Increased sampling improves structural geometry but does not help models "select" the correct binder. Most quality gains occur within 10–25 samples; deeper sampling primarily increases the number of plausible-looking false positives.
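The PR-AUC comparison above reduces to ranking all cognate-plus-decoy pairs by a confidence score (e.g. ipTM) and computing average precision. A minimal sketch, with an invented score distribution and a hypothetical `average_precision` helper (real evaluations typically use `sklearn.metrics.average_precision_score`):

```python
import numpy as np

def average_precision(scores, labels):
    """PR-AUC as average precision: mean of precision at each true positive,
    with candidates ranked by descending score (e.g. ipTM)."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    labels = np.asarray(labels, dtype=float)[order]
    tp = np.cumsum(labels)
    precision = tp / np.arange(1, len(labels) + 1)
    return float((precision * labels).sum() / labels.sum())

# Toy setup mirroring the benchmark shape: a few cognate pairs buried in
# many shuffled decoys, with only weakly separated confidence scores.
rng = np.random.default_rng(1)
scores = np.concatenate([rng.uniform(0.4, 0.9, 106),    # cognate pairs
                         rng.uniform(0.3, 0.8, 1060)])  # non-cognate decoys
labels = np.concatenate([np.ones(106), np.zeros(1060)])

ap = average_precision(scores, labels)
random_baseline = 106 / (106 + 1060)  # prevalence = AP of a random ranker
```

Comparing AP against the prevalence baseline is what reveals "very weakly predictive": the scores beat random, but not by much.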
    • Novel training scheme for antibody language models, modeling phylogenetic relationships rather than pure mutational MLM - called DASM.
    • Unlike AbLang2’s standard masked language modeling, DASM uses a mutation-selection framework that factors out nucleotide-level biases (like the codon table and SHM rates) to isolate purely functional selection effects.
    • The model was trained on approximately 2 million parent-child sequence pairs derived from reconstructed B cell phylogenies, using datasets such as JaffePaired, Tang, and Vanwinkle.
    • Model is a compact 4-million-parameter Transformer-encoder featuring 5 layers, 8 attention heads, and a custom "wiggle" activation function to stabilize output selection factors.
    • DASM was validated on the FLAb collection (Koenig and Shanehsazzadeh datasets) and MAGMA-seq high-throughput binding assays for influenza and SARS-CoV-2 antibodies. It was better than AbLang2, ProGen2, and ESM-2.
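The core mutation-selection idea above can be reduced to a one-line ratio: divide the observed parent-to-child substitution probability by a neutral expectation derived from the codon table and SHM rates. A hedged sketch with invented numbers and a hypothetical `selection_factor` helper (DASM's actual parameterisation is a trained network, not this formula):

```python
import math

def selection_factor(p_observed, p_neutral, eps=1e-12):
    """Ratio of the observed parent->child substitution probability to a
    neutral expectation (codon table + SHM hot/cold spots). Values > 1
    suggest functional selection for the change, < 1 selection against it."""
    return p_observed / max(p_neutral, eps)

# Hypothetical CDR site: the substitution appears 4x more often than
# neutral SHM alone would predict -> positive selection signal
f = selection_factor(p_observed=0.12, p_neutral=0.03)
log_f = math.log(f)  # log scale is convenient for comparing across sites
```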
    • Authors propose a new method to train a nanobody structure predictor by using ‘blueprints’.
    • They developed a classifier (NbFrame) to identify whether the HCDR3 loop adopts a kinked (framework-contacting) or extended (solvent-exposed) conformation. This allows the model to use sequence-encoded priors and explicit constraints during the folding process.
    • The model itself is very lightweight and runs significantly faster than "heavy" models like AF or Boltz. NbForge achieves sub-second inference speeds, predicting structures in less than a second on both CPU and GPU. In comparison, models like AlphaFold3 or Boltz1 typically require tens of seconds to minutes per structure, and MSA generation is a different story altogether.
    • The model matches the HCDR3 prediction quality of heavy models while being much more efficient. While AF3 and Boltz1 are more accurate at modeling the rigid framework, NbForge achieves parity in the hypervariable HCDR3 region—the part most critical for binding. Its speed and high recovery of disulphide bonds make it ideal for triaging millions of candidates in large-scale discovery campaigns.
    • Characterization of binding hot spots on 50 high-resolution antibody-antigen complexes from the ABAG-Docking benchmark.
    • FTMap Algorithm: FTMap identifies these spots by docking 16 small organic probes using a Fast Fourier Transform (FFT) approach. It clusters the best poses and identifies ‘consensus sites’ where multiple probe types overlap, indicating regions that contribute disproportionately to binding energy.
    • Aromatic residues on the paratope drive hot spot formation, particularly Trp, Tyr, and His, along with Phe. Trp and Tyr are especially critical on both sides of the interface due to their combined hydrophobic and polar (amphiphilic) character.
    • Hot spots are more concentrated on the paratope than the epitope, supporting the idea that antibodies primarily drive these interactions.
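The consensus-site notion above (regions where poses of several *different* probe types overlap) can be illustrated with a toy distance check. This is a deliberate simplification of FTMap's actual FFT docking and clustering; `consensus_sites`, the probe names, and the 4 Å / 3-type cutoffs are all illustrative assumptions.

```python
import numpy as np

def consensus_sites(centers, probe_types, radius=4.0, min_types=3):
    """Toy consensus-site detection: a pose belongs to a consensus site if
    poses of at least `min_types` different probe types fall within
    `radius` angstroms of it."""
    centers = np.asarray(centers, dtype=float)
    sites = []
    for i, c in enumerate(centers):
        nearby = np.linalg.norm(centers - c, axis=1) <= radius
        types_here = {probe_types[j] for j in np.flatnonzero(nearby)}
        if len(types_here) >= min_types:
            sites.append(i)
    return sites

# Three different probes cluster near the origin (a hot spot); one
# isolated ethanol pose far away should not form a consensus site.
centers = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (20, 20, 20)]
types = ["ethanol", "benzene", "acetamide", "ethanol"]
hot = consensus_sites(centers, types)
```

Requiring *distinct* probe types, rather than just many poses, is what distinguishes an energetically important site from a pocket that merely fits one chemistry.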
    • Results of the Ginkgo developability competition.
    • Benchmarked 113 teams on predicting five key developability traits: hydrophobicity, thermostability, self-association, expression titer, and polyreactivity.
    • Models were trained on the GDPa1 dataset (246 antibodies) and blindly tested on GDPa3 (80 diverse antibodies from OAS).
    • While cross-validation (CV) results were promising, performance plummeted on the test set, e.g., self-association dropped from a 0.653 CV Spearman's rho to 0.356.
    • Hydrophobicity was the most predictable (rho = 0.708), while expression titer was the most challenging (rho = 0.310).
    • Winning models varied by assay; for example, team AbDevelop won for self-interaction, while microcrisprtm2 led in thermostability.
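All the numbers above are Spearman's rho, which is just Pearson correlation on rank-transformed data, so it rewards getting the *ordering* of antibodies right rather than the raw assay values. A minimal sketch ignoring tie handling (real evaluations use average ranks, e.g. `scipy.stats.spearmanr`); the example values are invented:

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman correlation = Pearson correlation of rank-transformed data
    (tie handling omitted for brevity)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))

measured  = [1.0, 2.5, 3.1, 4.8, 7.2]  # toy assay values
predicted = [0.9, 2.0, 3.5, 5.0, 6.0]  # monotone in the measurements
rho = spearman_rho(measured, predicted)
```

Because only ranks matter, a model can score rho = 1.0 here despite being numerically off, which is appropriate for triaging candidates.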
    • An update to the “Baselining the buzz” paper.
    • Previously the authors got a huge dataset of 500,000 anti-HER2 trastuzumab CDR-H3 high-binders/weak-binders/non-binders.
    • Here, the authors tested 140 designs of the trastuzumab CDR-H3, from AbLang, ProteinMPNN, ESM-2, and the good old BLOSUM. These were filtered, among others, using the CNN predictor.
    • BLOSUM does best as judged by SPR of the designs, but the authors note that its designs have the biggest overlap with the training set.
    • In contrast, more complex methods like AbLang, ESM-2, and ProteinMPNN were found to explore different, more diverse areas of the sequence space. This means they generated sequences that were more "distal" (further away) from the original DMS-informed data.
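The overlap-versus-distality trade-off above can be quantified by each design's distance to its nearest training sequence. A toy sketch using Hamming distance on equal-length CDR-H3s (the paper may use a different metric; the sequences and helper names here are illustrative):

```python
def hamming(a, b):
    """Positionwise mismatches between two equal-length CDR-H3 sequences."""
    assert len(a) == len(b), "toy metric assumes equal lengths"
    return sum(x != y for x, y in zip(a, b))

def min_distance_to_training(design, training_set):
    """How 'distal' a design is: distance to its nearest training sequence.
    0 means the design overlaps the training set outright."""
    return min(hamming(design, t) for t in training_set)

# Hypothetical DMS-derived training CDR-H3s (trastuzumab-like toys)
training = ["WGGDGFYAMDY", "WGGDGFYAMDV", "WGGEGFYAMDY"]
overlap_design = min_distance_to_training("WGGDGFYAMDY", training)  # in set
distal_design = min_distance_to_training("WAADGFYAMDY", training)   # further out
```

By this measure, BLOSUM-style designs would cluster at small distances while the language-model designs spread to larger ones.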
    • ‘Evolution’ of AlphaFold from Isomorphic labs.
    • IsoDDE achieves 39% accuracy in the high-fidelity regime (DockQ > 0.8), which corresponds to near-experimental precision with an interface RMSD (iRMSD) typically below 1.0 Å. That’s a 2.3x improvement over AF3.
    • Using a single model seed, IsoDDE successfully predicts 63% of interfaces at DockQ > 0.23 (corresponding to an iRMSD of roughly 4.0 Å or less), which is a 1.4x improvement over AF3's single-seed performance.
    • IsoDDE accurately models the backbone of the highly variable CDR-H3 loop for 70% of antibodies (<2 Å) in the test set, outperforming AF3’s success rate of 58% by 1.2x.
    • When scaled to 1,000 seeds, IsoDDE reaches an 82% success rate for correct interfaces and 59% for high-accuracy predictions. So to get the best results one cannot exactly run it on a laptop.
    • It is a technical report; the architecture is not discussed.
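The DockQ cutoffs quoted above (0.23 for "correct", 0.8 for "high-accuracy") are the standard quality bands, and the 1,000-seed number is a best-of-N criterion. A small sketch of both conventions (`dockq_tier` and `best_of_n_success` are illustrative helper names, not tooling from the report):

```python
def dockq_tier(dockq):
    """Standard DockQ quality bands used in docking benchmarks."""
    if dockq < 0.23:
        return "incorrect"
    if dockq < 0.49:
        return "acceptable"
    if dockq < 0.80:
        return "medium"
    return "high"

def best_of_n_success(dockq_per_seed, threshold=0.23):
    """Best-of-N seed sampling: a target counts as solved if any single
    seed's prediction clears the DockQ cutoff."""
    return max(dockq_per_seed) >= threshold
```

Best-of-N is why seed count matters: each extra seed is another draw at clearing the threshold, which is cheap per target but expensive at 1,000 seeds across a benchmark.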
    • New open source reproduction of AlphaFold3 that either matches or surpasses it.
    • IntelliFold-2-Pro achieves a success rate of 58.2% (DockQ > 0.23, i.e., roughly 4 Å iRMSD) on antibody-antigen interactions, outperforming AlphaFold 3's 47.9%.
    • For small molecule co-folding, IntelliFold-2-Pro reaches 67.7%, surpassing AlphaFold 3’s 64.9%.
    • On protein monomers, IntelliFold-2 shows only marginal gains in accuracy (LDDT of 0.89 vs AF3's 0.88).
    • A strategy for layer-wise selective fine-tuning of general protein language models.
    • Instead of full fine-tuning, they found that adapting only the first 50-75% of layers via LoRA provides optimal performance while saving computational costs.
    • For example, they perform sequence-specific "test-time" training where they optimize the model using a Masked Language Modeling (MLM) objective on the target sequence itself before predicting its properties. This approach led to an 18.4% accuracy boost in predicting the notoriously difficult CDR-H3 antibody loop.
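The layer-selection rule above is easy to sketch: given an encoder depth, pick the first 50-75% of layer indices for LoRA adapters and freeze the rest. The helper name, the 0.625 default (an arbitrary midpoint of the reported sweet spot), and the 33-layer example depth are all assumptions for illustration:

```python
def layers_to_adapt(n_layers, fraction=0.625):
    """Indices of encoder layers to equip with LoRA adapters; all later
    layers stay frozen. `fraction` should sit in the reported 0.50-0.75
    sweet spot (0.625 is an arbitrary midpoint, not a value from the paper)."""
    k = round(n_layers * fraction)
    return list(range(k))

# For a 33-layer ESM-2-style encoder, adapt roughly the first 21 layers
adapt = layers_to_adapt(33)
```

In a framework like PEFT, this list would then drive which transformer blocks receive adapter modules while the upper layers keep their pretrained weights untouched.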