A Unified Framework for Unsupervised AIRR Analytics
immuneML introduces the first standardized environment to discover patterns, cluster sequences, and run robust stability validations on partially or imperfectly labeled adaptive immune receptor data.
The platform systematically evaluates and compares generative machine learning models (such as LSTM and VAE) to determine how effectively they can engineer novel, antigen-specific immune sequences versus simply memorizing training data.
It rigorously assesses how well different data representations, including advanced protein language models, capture true biological properties like epitope specificity and MHC restrictions, a utility proven on 48,000 experimental TCRβ sequences.
It provides vital exploratory and dimensionality reduction tools to identify sequencing batch effects and data biases before running supervised diagnostics, demonstrated using a real-world single-cell dataset from 143 inflammatory bowel disease patients.
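The novelty-versus-memorization question raised above for generative models can be made concrete with a simple copy check. The sketch below is not immuneML's API; the function names, the edit-distance threshold, and the example CDR3 strings are assumptions chosen for illustration. It reports the share of generated sequences that are exact copies of training data, near copies, or novel.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def memorization_report(generated: List[str], training: List[str],
                        near_threshold: int = 1) -> Dict[str, float]:
    """Classify each generated sequence as exact copy, near copy, or novel.

    near_threshold is a hypothetical cutoff: a sequence within this many
    edits of any training sequence is counted as a near copy.
    """
    train_set = set(training)
    exact = near = 0
    for seq in generated:
        if seq in train_set:
            exact += 1
        elif min(edit_distance(seq, t) for t in training) <= near_threshold:
            near += 1
    n = len(generated)
    return {
        "exact_copy_rate": exact / n,
        "near_copy_rate": near / n,
        "novel_rate": (n - exact - near) / n,
    }


# Illustrative (made-up) CDR3-like strings, not data from the paper.
training = ["CASSLGQAYEQYF", "CASSPGTGGYEQYF"]
generated = ["CASSLGQAYEQYF", "CASSLGQAYEQFF", "CAWSVDRGNTIYF", "CSARDRTGNGYTF"]
report = memorization_report(generated, training)
print(report)
```

The pairwise comparison is quadratic in the number of sequences, so a real repertoire-scale check would likely need indexing or sampling.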
Novel protein design model, revisiting the SE(3) architecture.
Genie 3 is an all-atom, SE(3)-equivariant structure diffusion model that treats proteins as branched polymers to capture sidechain details, utilizing a Latent Transformer with bidirectional layer updates and an Invariant Point Attention structural decoder.
The authors did not test the model on therapeutic formats such as antibodies or nanobodies; instead, they focused entirely on generating generic de novo protein binders, unconditional monomers, and functional motif scaffolds.
The method was computationally benchmarked using self-consistency pipelines (ProteinMPNN/ESMFold), MotifBench for functional sites, and a strict AF2M+ binder interface metric, alongside real-world experimental validation that yielded a 12.5% hit rate against the Nipah virus Glycoprotein G.
Method to select mutants computationally for lab testing.
"Stochastic beam search," a sequence-centric method that evaluates masked language models (MLMs) via pseudo-log-likelihood, rather than using costly mutation-centric approaches.
This technique is computationally efficient and produces higher-quality sequences by better balancing likelihood and diversity.
The method was extensively validated through both in silico evaluations across various models and direct head-to-head in vitro antibody campaigns.
In wet-lab testing, the optimized models effectively screened for synthesizability and binding, with supervised guidance achieving a 100% success rate in the experiments.
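Pseudo-log-likelihood, the scoring quantity named above, is commonly defined as the sum over positions of the log-probability a masked language model assigns to the true residue when that position is masked. The sketch below shows only that scoring step, not the paper's stochastic beam search. The `MaskedModel` callable, the `MASK` token, and `uniform_model` are hypothetical stand-ins for a real MLM.

```python
import math
from typing import Callable, Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "?"  # hypothetical mask token for this toy example

# A masked model takes (tokens with one position masked, masked index)
# and returns a probability for each amino acid at that position.
MaskedModel = Callable[[List[str], int], Dict[str, float]]


def pseudo_log_likelihood(seq: str, model: MaskedModel) -> float:
    """Sum of log P(x_i | x_without_i), masking one position at a time."""
    tokens = list(seq)
    total = 0.0
    for i, true_residue in enumerate(tokens):
        masked = tokens.copy()
        masked[i] = MASK
        probs = model(masked, i)
        total += math.log(probs[true_residue])
    return total


def uniform_model(masked_tokens: List[str], position: int) -> Dict[str, float]:
    """Toy stand-in for an MLM: ignores context, predicts uniformly."""
    p = 1.0 / len(AMINO_ACIDS)
    return {aa: p for aa in AMINO_ACIDS}


pll = pseudo_log_likelihood("CARDY", uniform_model)
print(pll)  # equals 5 * log(1/20) for the uniform toy model
```

Scoring this way costs one forward pass per position, so a sequence of length L needs L model calls. With a real MLM, `uniform_model` would be replaced by a wrapper around the model's masked-token prediction.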
Implementation adjustments to Boltz-2 that make it run much faster.
One of the biggest hurdles in running MSA-dependent methods is the MSA server, which is computationally expensive and difficult to set up.
MSAs carry a lot of predictive value, so skipping this step is unwise.
Integrates MMseqs2-GPU directly into the Boltz-2 pipeline, removing the primary CPU bottleneck and enabling high-throughput, local structure prediction.
This implementation streamlines the MSA process, making predictions an order of magnitude faster.
PromptMOL: a PyMOL plugin that enables direct interaction with molecular structures using natural language commands.
While PyMOL is a gold standard in structural biology, its interface can be complex; PromptMOL removes the barrier to entry by replacing convoluted script syntax with simple, descriptive prompts.
By hooking directly into LLMs (via local models like LM Studio, or cloud providers like OpenAI and Anthropic), the plugin intelligently handles selections, coloring, structural analysis, and rendering tasks on the fly.
Case study and framework (BOAT) for tying together available computational annotators to optimize cross-reactivity for a VHH.
It replaces inefficient, sequential screening pipelines with a multi-objective Bayesian optimization loop. It uses a Gaussian process surrogate model coupled with a genetic algorithm to navigate complex sequence spaces and identify Pareto-optimal candidates.
The framework is model-agnostic; users must provide and validate the in silico "oracles" (predictive models) relevant to their specific optimization goals. Objectives are defined by selecting and potentially weighting these interchangeable scoring functions.
The authors rigorously benchmarked BOAT against standard genetic algorithms and generative baselines (like LaMBO-2). Testing relied on computational benchmarks, including comparing results against exhaustive "ground truth" Pareto fronts in limited search spaces.
The study did not perform wet-lab validation. Because the framework relies entirely on in silico oracles as proxies, the final experimental success of the optimized candidates is ultimately tied to the predictive quality of the models the user selects.
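To illustrate what "Pareto-optimal candidates" means in the multi-objective setting described above, here is a minimal sketch of non-dominated filtering. It is not the framework's implementation: the candidate names and the two oracle scores are invented, and both objectives are assumed to be maximized.

```python
from typing import Dict, List, Tuple

Scores = Tuple[float, ...]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(candidates: Dict[str, Scores]) -> List[str]:
    """Return names of candidates that no other candidate dominates."""
    front = []
    for name, scores in candidates.items():
        dominated = any(
            dominates(other, scores)
            for other_name, other in candidates.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical oracle scores: (binding to target A, binding to target B).
candidates = {
    "vhh_a": (0.90, 0.20),
    "vhh_b": (0.50, 0.50),
    "vhh_c": (0.40, 0.40),  # dominated by vhh_b on both objectives
    "vhh_d": (0.10, 0.95),
}
front = pareto_front(candidates)
print(front)
```

An exhaustive check like this is only practical for small candidate sets, which is consistent with the digest's note that "ground truth" fronts were computed in limited search spaces. A surrogate-guided loop proposes which candidates to score next instead of enumerating all of them.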
Case study on generating novel HER2 binders from the Herceptin template, with five computational properties (HER2 specificity, FvNetCharge, FvCSP, HISum, and MHC II minPR) encoded as constraints.
The authors train a conditional CDRH3 GPT (based on a mini GPT-2 architecture) using large-scale sequences sourced from the OAS database.
Sequences are computationally annotated with property labels and refined via reinforcement learning (RL) to satisfy multi-property constraints.
Target-specific binding predictors (oracles) are used to guide the RL process to generate CDRH3 sequences that exhibit HER2-targeting capabilities similar to Herceptin.
Wet-lab validation confirms HER2-binding affinity and tumoricidal efficacy; while physical developability assays were not performed in the lab, these traits were primary objectives of the computational design stage.
Protein design model applied to antibodies and lab-tested.
Protenix-v2 is an integrated biomolecular modeling system that enables high-accuracy structure prediction, zero-shot generative binder design, and improved ligand-related plausibility.
The system incorporates refined architecture and training optimizations, while strictly excluding all wwPDB entries released on or after September 30, 2021, to prevent data leakage.
Performance was assessed using DockQ success rates on antibody-antigen interface benchmarks, BLI-confirmed hit rates across diverse soluble and membrane-protein targets, and PoseBusters-style chemical validity metrics.