Antibody Database

Benefit from decades of research packed into Antibody Database: a curated, comprehensive database that opens the door to data accumulated over decades of research into therapeutic antibodies. Accelerate your research with easy access to antibody data that is no longer fragmented or non-standardized. Run antibody-specific searches and draw novel conclusions about the biology of antibodies thanks to our data integration.

Data sources


We collect antibodies associated with patent documents from major sources (Krawczyk et al. 2021). We identify antibody sequences using sequence features, the text of the patent document, and a classification step that assesses whether the document contains antibodies. Each antibody sequence from a patent is also linked to the text metadata of the document it came from, which enables text-based searches for sequences associated with specific biological entities (e.g., particular targets). This database covers ~3,500,000 antibody sequences from USPTO, WIPO, DDBJ, and EBI (~280,000 unique sequences).



The number of FDA-approved antibodies recently passed 100, with hundreds more in clinical trials. We collect the sequences of these antibodies that have been assigned International Nonproprietary Names (INNs) and link them to rich metadata such as target information. We currently hold data on more than 826 therapeutic antibodies.



The Protein Data Bank (PDB) is the primary public source of three-dimensional structural data on biomolecules. Antibody sequences in the PDB are identified by antibody sequence features and by text mining of the metadata fields associated with individual chains and with the PDB entry as a whole. We currently identify more than 6,500 structural depositions containing antibodies.



Biological sequences are often not deposited in standardized repositories such as GenBank. Instead, they may appear only in scientific publications and their supplementary material. There is currently no reliable automated method for identifying such sequences, so this category comprises antibody sequences added to our database through manual curation of scientific publications. We currently have more than 5,000 of them. These sequences are linked to the metadata of their source publications and to their targets to facilitate text-based retrieval.



Next-generation sequencing (NGS) now allows us to explore the vast sequence diversity of antibodies. We identify BioProjects that specifically include NGS of antibodies. Querying and analyzing these datasets yields far richer information on antibody sequence variability than the limited set of available germline sequences. Our version of the NGS database covers more than 200 BioProjects with a combined total of 25 billion raw reads.



NCBI GenBank is one of the primary structured repositories for biological sequences associated with scientific outputs. Antibodies in this database are identified by antibody- and nanobody-like sequence features combined with text mining of the deposition metadata. Each GenBank antibody sequence is linked to its source deposition record, which includes the source organism, the deposition description, and the source publication. This enables text-based retrieval of sequences associated with specific biological entities (e.g., particular targets). This database covers ~175,000 unique variable region sequences from ~200,000 accessions.