AbDiver – A tool for exploring the natural antibody landscape to support therapeutic design


Executive summary: AbDiver is an online tool that helps researchers make the most of the vast NGS data on antibody mutations by allowing them to spot parallels between natural and therapeutic antibodies. AbDiver accelerates the decision-making process during the lead optimization stage for a more efficient therapeutic pipeline.

Note: This article covers content published in: Jakub Młokosiewicz, Piotr Deszyński, Wiktoria Wilman, Igor Jaszczyszyn, Rajkumar Ganesan, Aleksandr Kovaltsuk, Jinwoo Leem, Jacob Galson, Konrad Krawczyk. “AbDiver – A tool to explore the natural antibody landscape to aid therapeutic design.” Bioinformatics, btac151, https://doi.org/10.1093/bioinformatics/btac151

Project overview

How can researchers design a new antibody-based drug faster? One approach is to take advantage of the natural sequence diversity of these molecules. Our understanding of antibody diversity, and of how to use it in antibody engineering, has grown significantly thanks to the deposition of hundreds of millions of human antibody sequences in next-generation sequencing (NGS) repositories.

Researchers can contrast a query antibody sequence with the naturally observed diversity in similar antibody sequences stored in NGS repositories to find a mutational roadmap for designing biotherapeutics.

However, the sheer scale of the antibody NGS datasets renders such searches computationally challenging.

To facilitate access to antibody NGS data, a group of researchers, including our team members, developed AbDiver, a free online tool that allows researchers to compare their query sequences to those observed in the natural repertoires.

Use cases

AbDiver addresses three antibody-specific use cases:

  1. Comparing the query antibody to positional variability statistics (precomputed from multiple independent studies),
  2. Retrieving close full variable sequence matches to the query antibody,
  3. Retrieving its CDR3 or clonotype matches.

When applied to a set of 742 therapeutic antibodies, AbDiver retrieved relevant results for the majority of sequences.

Data sources in AbDiver

As the underlying data for AbDiver, we used publicly curated, unpaired BCR NGS datasets from the Observed Antibody Space (OAS) (Kovaltsuk et al., 2018).

As of May 2021, the set consisted of 81 studies with 906,933,358 unique BCR sequences numbered according to the IMGT scheme (105,730,531 light chains and 801,202,827 heavy chains). We plan to update AbDiver as more datasets become available.

To benchmark our solution, we used a set of 742 therapeutic antibodies, which extended a set from our previous study (Krawczyk et al., 2021).

Key features of AbDiver

V-region profiling service

The AbDiver V-region natural profiling service annotates the variable region of the query antibody sequence with the naturally observed amino acid frequency statistics for each position.

The tool calculates frequency statistics from all antibodies that share the same combination of V gene and J gene. Positional amino acid frequencies from a study are included only if that study has at least 100 observations at the given position. For each position, the tool reports the study-specific Shannon entropy and the ranks of the amino acids by frequency.
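
The statistics described above can be illustrated with a minimal Python sketch. It is not AbDiver's implementation: the function names `position_profile` and `annotate`, the toy data, the use of base-2 logarithms for the entropy, and the alphabetical tie-break in the ranking are all assumptions made for illustration. Only the threshold of 100 observations comes from the text.

```python
import math
from collections import Counter

MIN_OBSERVATIONS = 100  # inclusion threshold quoted in the text


def position_profile(residues, min_observations=MIN_OBSERVATIONS):
    """Summarise the residues seen at one numbered position in one study.

    Returns None when the position has too few observations; otherwise a dict
    with relative frequencies, Shannon entropy (bits, an assumed unit) and
    frequency ranks (1 = most common, ties broken alphabetically).
    """
    total = len(residues)
    if total < min_observations:
        return None
    counts = Counter(residues)
    freqs = {aa: n / total for aa, n in counts.items()}
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    ordered = sorted(freqs, key=lambda aa: (-freqs[aa], aa))
    ranks = {aa: rank for rank, aa in enumerate(ordered, start=1)}
    return {"frequencies": freqs, "entropy": entropy, "ranks": ranks}


def annotate(query_residue, profile):
    """Return (frequency, rank) of the query residue; unseen residues give (0.0, None)."""
    return (profile["frequencies"].get(query_residue, 0.0),
            profile["ranks"].get(query_residue))


# Toy data: 100 observations at one position from one hypothetical study.
observed = ["S"] * 50 + ["T"] * 25 + ["A"] * 25
profile = position_profile(observed)
print(profile["entropy"])      # Shannon entropy in bits
print(annotate("T", profile))  # frequency and rank of the query residue
```

A low entropy indicates a conserved position, so a query residue with a low frequency at such a position stands out as an unusual choice relative to the natural repertoire.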

Sequence retrieval service

We created k-mer (k=5) indexes over the CDRs, with separate indexes for full variable-region sequences and for CDR3s. The tool identifies variable-region matches by requiring CDR1 and CDR2 of the same length as in the query, with a discrepancy of one residue allowed for CDR3.
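
To make the idea of a k-mer index concrete, here is a minimal Python sketch of an inverted index over CDR3 sequences. It shows the general technique only and should not be read as AbDiver's actual data structures. The function names, the toy sequences, and the ranking by number of shared k-mers are hypothetical. The length filter also assumes that the one-residue discrepancy for CDR3 refers to a length difference of at most one, which is one possible reading of the rule above.

```python
from collections import defaultdict

K = 5  # k-mer length quoted in the text


def kmers(seq, k=K):
    """Return the set of overlapping k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(cdr3s, k=K):
    """Map each k-mer to the ids (list positions) of the stored CDR3s containing it."""
    index = defaultdict(set)
    for seq_id, cdr3 in enumerate(cdr3s):
        for kmer in kmers(cdr3, k):
            index[kmer].add(seq_id)
    return index


def candidates(query, index, k=K):
    """Return (seq_id, shared k-mer count) pairs, best candidates first."""
    hits = defaultdict(int)
    for kmer in kmers(query, k):
        for seq_id in index.get(kmer, ()):
            hits[seq_id] += 1
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))


def lengths_compatible(query_cdrs, hit_cdrs):
    """Assumed filter: equal CDR1 and CDR2 lengths, CDR3 lengths differing by at most one."""
    (q1, q2, q3), (h1, h2, h3) = query_cdrs, hit_cdrs
    return len(q1) == len(h1) and len(q2) == len(h2) and abs(len(q3) - len(h3)) <= 1


# Toy CDR3 sequences, invented for illustration.
stored = ["ARDYYGSSYFDY", "ARDYYGSGYFDY", "AKGGWELLRFDP"]
index = build_index(stored)
print(candidates("ARDYYGSSYFDY", index))
```

The index turns a scan over every stored sequence into a handful of dictionary lookups: only sequences sharing at least one 5-mer with the query are considered, and a more expensive comparison can then be restricted to those candidates.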


AbDiver was created to help researchers navigate natural antibody diversity and draw parallels between natural and therapeutic antibodies for engineering purposes. Such comparisons can help remove post-translational modification (PTM) risks while maintaining favorable biophysical properties. AbDiver can also uncover sequences with potentially better product profiles than the lead therapeutic.

We hope that AbDiver supports researchers in designing and engineering therapeutics based on antibodies.

AbDiver is available free of charge at http://naturalantibody.com/abdiver.

To learn more, check out AbStudio - a solution that allows teams to create, collate, and discover antibody-specific datasets to accelerate research decision-making.


  • Christley, S. et al. (2020) The ADC API: A Web API for the Programmatic Query of the AIRR Data Commons. Front. Big Data.
  • Gutiérrez-González, M. et al. (2021) Human antibody immune responses are personalized by selective removal of MHC-II peptide epitopes. bioRxiv.
  • Jones, T. et al. (2021) ClonoMatch: a tool for identifying homologous immunoglobulin and T-cell receptor sequences in large databases. Bioinformatics.
  • Kovaltsuk, A. et al. (2018) Observed Antibody Space: A Resource for Data Mining Next-Generation Sequencing of Antibody Repertoires. J. Immunol.
  • Krawczyk, K. et al. (2021) Data mining patented antibody sequences. MAbs.
  • Krawczyk, K. et al. (2019) Looking for therapeutic antibodies in next-generation sequencing repositories. MAbs.
  • Marks, C. et al. (2021) Humanization of antibodies using a machine learning approach on large-scale repertoire data. Bioinformatics.
  • Martin, A.C.R. (2014) Viewing multiple sequence alignments with the JavaScript Sequence Alignment Viewer (JSAV). F1000Research.
  • Mason, D.M. et al. (2021) Optimization of therapeutic antibodies by predicting antigen specificity from antibody sequence via deep learning. Nat. Biomed. Eng.
  • Petersen, B.M. et al. (2021) Regulatory Approved Monoclonal Antibodies Contain Framework Mutations Predicted From Human Antibody Repertoires. Front. Immunol.
  • Smakaj, E. et al. (2020) Benchmarking immunoinformatic tools for the analysis of antibody repertoire sequences. Bioinformatics.
  • Venkataramani, S. et al. (2020) In Pursuit of Stability Enhancement of a Prostate Cancer Targeting Antibody Derived from a Transgenic Animal Platform. Sci. Rep.
  • Zhang, W. et al. (2020) PIRD: Pan Immune Repertoire Database. Bioinformatics.