Extensive collection supporting computational research in molecular modeling and protein recognition, with a focus on patented antibody sequences and their targets.
Introduction
Patents Database (PatentsDB) is a comprehensive repository of antibody-related sequences extracted from four distinct patent sequence sources: DDBJ, WIPO, USPTO, and PSIPS. These heterogeneous data formats are integrated with family metadata from the European Patent Office (EPO), resulting in a collection of 4,675,969 total sequences, of which 456,570 are unique, originating from 25,019 patent families.
To enhance the utility of the database, imported sequences are preprocessed and annotated to identify which of them are antibody sequences. Nucleotide sequences are translated into amino acid sequences using IgBlast, and all amino acid sequences are numbered according to the IMGT scheme using ANARCI. Additionally, antibody target recognition and UniProt mapping are performed, establishing links between patents and UniProt entries. This process has identified connections to 6,375 unique UniProt targets across 817 distinct organisms.
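As a rough illustration of the translation step only, the sketch below converts a nucleotide string to amino acids with the standard genetic code. It is not IgBlast and does not reproduce what IgBlast does (germline alignment, reading-frame detection, and so on); the function name `translate`, the `frame` parameter, and the example sequence are assumptions made for this example.

```python
# Illustrative only: plain codon-table translation, NOT the IgBlast-based
# step used to build PatentsDB. All names here are hypothetical.

BASES = "TCAG"
# Standard genetic code, ordered by first, second, third base in BASES order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nucleotides, frame=0):
    """Translate a DNA/RNA string in the given reading frame (0, 1, or 2).

    Codons containing anything other than A, C, G, T/U become 'X';
    stop codons become '*'. Trailing bases that do not fill a codon are ignored.
    """
    seq = nucleotides.upper().replace("U", "T")
    protein = []
    for pos in range(frame, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))
    return "".join(protein)


example = "GAGGTGCAGCTGGTGGAG"  # made-up fragment, not taken from the database
print(translate(example))  # EVQLVE
```

In a real pipeline the reading frame is not known in advance, which is one reason an antibody-aware tool such as IgBlast is used instead of a naive translation like this one.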
The database is structured around two main entities: families and sequences, facilitating efficient data navigation and retrieval. This organization supports researchers and professionals in exploring the landscape of patented antibodies, aiding in the understanding of engineering efforts and promoting advancements in therapeutic antibody development.
456,570 unique sequences
25,019 unique families with numbered sequences
6,375 unique UniProt target IDs
PatentsDB is freely available to non-commercial organisations for non-commercial research. Commercial inquiries are welcome; please contact us.
A Google Colab notebook with examples of how to use the Antibody Patents database is available here.
Directory structure
PatentsDB is accessible via Google Drive with the following directory structure, which lets you download a single dataset in a specific format or all datasets at once.
Data format
The PatentsDB dataset is available in Parquet/Delta (delta.io) format, which streamlines integration into data processing pipelines (for example with Apache Spark, pandas, or DuckDB). It consists of two types of entities: families and sequences.
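A minimal sketch of how the two entity tables might be loaded and combined with pandas. The directory layout, the `family_id` join key, and every column name below are assumptions made for illustration; the actual schema should be checked against the downloaded files. The in-memory frames stand in for real data so that the join logic can be run without downloading anything.

```python
from pathlib import Path

import pandas as pd


def load_entity(root, entity):
    """Read one entity table stored as Parquet under <root>/<entity>.

    Hypothetical layout. Requires a Parquet engine such as pyarrow. For a
    Delta table, a Delta-aware reader may be preferable to reading the
    Parquet files directly.
    """
    return pd.read_parquet(Path(root) / entity)


def sequences_with_family(sequences, families, key="family_id"):
    """Attach family metadata to each sequence row via a left join.

    `key` is an assumed shared column; validate= raises if a family
    appears more than once in the families table.
    """
    return sequences.merge(families, on=key, how="left", validate="many_to_one")


# Tiny made-up stand-ins for the real tables (not actual PatentsDB records).
families = pd.DataFrame(
    {"family_id": [1, 2], "title": ["Anti-X antibodies", "Anti-Y antibodies"]}
)
sequences = pd.DataFrame(
    {
        "sequence_id": ["s1", "s2", "s3"],
        "family_id": [1, 1, 2],
        "sequence": ["EVQLVESGG", "DIQMTQSPS", "QVQLQESGP"],
    }
)

joined = sequences_with_family(sequences, families)
print(joined[["sequence_id", "family_id", "title"]])
```

With real data, the two stand-in frames would be replaced by calls such as `load_entity(root, "sequences")` and `load_entity(root, "families")`, where `root` and the entity folder names are again placeholders.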
Citing this work
We make the Patented Antibody Database available as a companion to our paper.