Monoclonal antibodies play an increasingly important role in both human and animal health. To optimize their clinical performance, scientists need to characterize antibodies computationally. A mundane but fundamental part of that work is comparing one antibody to another.
An essential prerequisite for antibody analytics is the use of standardized numbering methods. They help to identify complementarity-determining regions (CDRs), frameworks, and residues from both light and heavy chains that may affect the binding affinity and/or specificity of the antibody-antigen interaction.
What exactly is antibody numbering and how does it work? Keep on reading this article to find out.
Suppose that you have two antibodies in front of you and want to check to what extent they're similar.
The first approach is to compare the sequences of the four chains. But here, we immediately run into the problem of comparing sequences of unequal length. Another issue is that two sequences that initially look different become quite similar if we shift them relative to each other.
It's like comparing a movie to the same movie with the first scene removed. If we compare the original movie and the edited version frame-by-frame, the two will look nothing like each other. Still, we'd intuitively feel that the two things are mostly the same.
To see that a simple shift isn't always enough, consider comparing a movie with its extended version where several scenes are added in various places. Again, even though the material is largely the same, we won't be able to get a decent frame-by-frame agreement - even if we allow for an arbitrary shift.
The general conclusion is that one should be careful when comparing sequences.
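The movie analogy can be reproduced on a short peptide. The sketch below is only an illustration: the sequence is a made-up fragment, and `positional_identity` and `best_shift_identity` are hypothetical helper names, not part of any library.

```python
def positional_identity(a: str, b: str) -> float:
    """Fraction of matching residues, compared position by position
    over the length of the shorter sequence."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def best_shift_identity(a: str, b: str, max_shift: int = 5) -> float:
    """Best positional identity over all shifts of up to max_shift residues,
    applied to either sequence."""
    best = 0.0
    for s in range(max_shift + 1):
        best = max(best,
                   positional_identity(a[s:], b),
                   positional_identity(a, b[s:]))
    return best


original = "EVQLVESGGGLVQPGG"                   # made-up example fragment
trimmed = original[3:]                          # "first scene removed"
extended = original[:8] + "AA" + original[8:]   # "extra scene" in the middle

print(positional_identity(original, trimmed))    # low: frames are out of step
print(best_shift_identity(original, trimmed))    # 1.0 once the shift is allowed
print(best_shift_identity(original, extended))   # stays below 1.0 for any shift
```

A shift fully recovers the trimmed sequence, but no single shift can recover the one with an insertion in the middle, which is why alignment with gaps is needed.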
If our goal is to capture similarities between sequences in a meaningful way, we need to consider some simple transformations that allow these sequences to become aligned.
We talk about mutations, deletions, and insertions when comparing genetic or protein sequences. Many algorithms perform pairwise or multiple sequence alignment that reflects the evolutionary differences between such sequences.
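To make the idea of alignment concrete, here is a minimal sketch of global pairwise alignment in the style of Needleman and Wunsch. The scoring values (match, mismatch, gap) are arbitrary assumptions chosen for illustration; real tools typically use substitution matrices and more refined gap penalties.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment by dynamic programming; returns both sequences
    padded with '-' where a gap was introduced."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # match or mutation
                              score[i - 1][j] + gap,        # gap in b
                              score[i][j - 1] + gap)        # gap in a

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


aligned_a, aligned_b = global_align("EVQLVESGG", "EVQLESGG")
print(aligned_a)  # EVQLVESGG
print(aligned_b)  # EVQL-ESGG
```

The single gap marks the deleted residue, after which the two sequences agree position by position.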
In antibodies, there is no 'evolutionary' history per se since antibodies differentiate in a matter of days in response to a pathogen. For this reason, we employ numbering schemes to compare antibodies.
All the problems described above appear when we try to compare antibody sequences: these sequences might be of unequal length, or they might differ by insertions or deletions.
The solution is known as antibody numbering.
The idea is to look at a large number of antibodies naturally appearing in living organisms and try to construct a consistent scaffold to which most known antibodies would fit reasonably well.
By scaffold, we mean a list of regions that are easy to identify (e.g., because they're almost constant) and some information about how the gaps should be filled (e.g., the number or type of amino acids in between).
Once the scaffold is ready, we can come up with a numbering system where the easy-to-identify regions are always assigned the same positions while the remaining amino acids are given some intermediate labels.
The hope is that comparing two sequences with an appropriately chosen numbering scheme is as simple as checking amino acids position-by-position. In fact, one could even hope that a given position corresponds to a specific location in the 3D structure of the antibody.
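As a rough sketch of what position-by-position comparison looks like once sequences are numbered, the example below stores each chain as a mapping from a position label (a number plus an optional insertion letter) to a residue. The labels and residues are toy values invented for illustration; they do not follow any particular published scheme, and `compare_numbered` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

Position = Tuple[int, str]        # (number, insertion code), e.g. (3, "A")
Numbered = Dict[Position, str]    # position label -> amino acid


def compare_numbered(a: Numbered, b: Numbered) -> Tuple[int, int, List[Position], List[Position]]:
    """Compare two numbered chains on the positions they share.

    Returns (matches, shared positions, positions only in a, positions only in b).
    """
    shared = sorted(set(a) & set(b))
    matches = sum(a[p] == b[p] for p in shared)
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    return matches, len(shared), only_a, only_b


# Toy data: ab2 carries one inserted residue, labelled 3A, and one mutation at 5.
ab1 = {(1, ""): "E", (2, ""): "V", (3, ""): "Q", (4, ""): "L", (5, ""): "V"}
ab2 = {(1, ""): "E", (2, ""): "V", (3, ""): "Q", (3, "A"): "G",
       (4, ""): "L", (5, ""): "Q"}

matches, shared, only_a, only_b = compare_numbered(ab1, ab2)
print(matches, shared, only_a, only_b)  # 4 5 [] [(3, 'A')]
```

Because the insertion gets its own label, it no longer pushes the downstream residues out of step, and the comparison reduces to a dictionary lookup.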
The first scheme was proposed by Wu and Kabat in 1970 and relied only on the amino acid sequence. It was later improved by Chothia (1987), who introduced structure-based factors into the numbering scheme. More recent proposals include the IMGT and Aho numbering schemes, which allow for greater flexibility when dealing with insertions and deletions (Honegger and Plückthun 2001; Lefranc et al. 1999).
Comparing antibodies in a meaningful way is an important application of numbering schemes. However, originally they were introduced as a tool to study the variability of antibodies in a rigorous manner.
Note: Variability tells us whether a given region or position is always occupied by the same sequence or whether we might find a diversity of sequences there.
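One common way to quantify this is the variability measure usually attributed to Wu and Kabat: the number of different amino acids seen at a position divided by the frequency of the most common one. The sketch below applies it to a toy set of sequences that are assumed to be already numbered and of equal length; the sequences are invented and the function name is hypothetical.

```python
from collections import Counter
from typing import List


def wu_kabat_variability(column: str) -> float:
    """Number of distinct residues at a position divided by the
    frequency of the most common residue at that position."""
    counts = Counter(column)
    most_common_freq = counts.most_common(1)[0][1] / len(column)
    return len(counts) / most_common_freq


def variability_profile(sequences: List[str]) -> List[float]:
    """Variability at every position of equally long, pre-aligned sequences."""
    columns = ["".join(residues) for residues in zip(*sequences)]
    return [wu_kabat_variability(col) for col in columns]


# Toy data, assumed to be aligned already (no gaps).
toy = ["QVQL", "QVKL", "EVQL", "QVTL"]
profile = variability_profile(toy)
print(profile)
```

A fully conserved position scores 1.0, while positions occupied by many different residues score higher. In the toy data, the second and fourth positions are conserved and the third is the most variable.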
Similarly, numbering schemes can be used to consistently divide peptide chains into regions with distinct properties and functions. For instance, complementarity determining regions (CDRs) could be defined solely based on a chosen numbering scheme.
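As an illustration, the IMGT scheme places CDR1 at positions 27 to 38, CDR2 at 56 to 65, and CDR3 at 105 to 117 of the variable domain, with the framework regions in between. The sketch below uses those boundaries to split a numbered chain into regions. The input format, a list of ((number, insertion code), residue) pairs with "-" for empty positions, is an assumption made for this example, and the fragment itself is toy data.

```python
from typing import Dict, List, Tuple

# IMGT region boundaries for a variable domain (inclusive position ranges).
IMGT_REGIONS = [
    ("FR1", 1, 26), ("CDR1", 27, 38), ("FR2", 39, 55), ("CDR2", 56, 65),
    ("FR3", 66, 104), ("CDR3", 105, 117), ("FR4", 118, 128),
]


def imgt_region(number: int) -> str:
    """Name of the IMGT region that contains the given position number."""
    for name, start, end in IMGT_REGIONS:
        if start <= number <= end:
            return name
    raise ValueError(f"position {number} is outside the IMGT variable domain")


def split_regions(numbered: List[Tuple[Tuple[int, str], str]]) -> Dict[str, str]:
    """Group residues by region, skipping empty ('-') positions."""
    regions = {name: "" for name, _, _ in IMGT_REGIONS}
    for (number, _insertion), residue in numbered:
        if residue == "-":
            continue
        regions[imgt_region(number)] += residue
    return regions


# Toy fragment around the FR1/CDR1/FR2 boundaries (not a real antibody).
fragment = [((25, ""), "A"), ((26, ""), "S"), ((27, ""), "G"), ((28, ""), "F"),
            ((29, ""), "-"), ((38, ""), "Y"), ((39, ""), "M")]
regions = split_regions(fragment)
print(regions["FR1"], regions["CDR1"], regions["FR2"])  # AS GFY M
```

Other schemes and CDR definitions draw the boundaries elsewhere, so swapping the table of ranges is enough to change the definition.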
Such a division into regions turns out to be particularly relevant if we want to transfer a specific functionality identified in a non-human antibody to an antibody closer to human ones. This process is known as humanization.
It's important to stress that there exists no objectively correct way of assigning integers to amino acids in a peptide chain. A numbering scheme is always based on a specific dataset of antibodies. One that works well for one dataset might not be suitable for another dataset.
Given the large quantity of new NGS data generated currently, we need to be ready to improve and adjust our numbering schemes as we go along. We must also accept that the performance of numbering schemes depends on the specific application, and one should not try to identify the one scheme that outperforms all the others.
Finally, remember that numbering schemes are not fundamental scientific notions. They are tools designed to help us achieve specific goals.
We should use them as long as they serve us well, but there's no harm in considering other approaches that don't force us to uniquely label every position in a peptide chain.
For instance, while it's certainly useful to be able to split an antibody into regions, placing the exact boundaries might be of secondary importance. Consider Vernier Zone residues or paratope identification at this point.
In humanization, one needs to modify regions close to the CDRs but not the CDRs themselves. That's why defining the boundary plays a crucial role here. Furthermore, not all antigen-binding residues will be found in the CDRs, at least not under every definition of the CDRs.
An effective amino acid numbering system allows us to assign the same number to residues at structurally aligned positions in antibodies from different species. Although several databases are available online, it's good to compare the different numbering systems as inaccuracies are still possible.
So, where can you get your numbering annotations from? One option is ANARCI, a widely used open-source, Python-based tool for numbering amino acid sequences, which builds on HMMER, a popular hidden Markov model (HMM) program.
Keep an eye on our Resources section to get more introductory materials on antibodies and computational approaches to antibody engineering.
To learn more, check out AbStudio - a solution that allows teams to create, collate, and discover antibody-specific datasets to accelerate research decision-making.
References:
Chothia C, Lesk AM. Canonical structures for the hypervariable regions of immunoglobulins. J Mol Biol. (1987) 196.4: 901–917. https://doi.org/10.1016/0022-2836(87)90412-8
Honegger A, Plückthun A. Yet Another Numbering Scheme for Immunoglobulin Variable Domains: An Automatic Modeling and Analysis Tool. J Mol Biol. (2001) 309.3: 657–670. https://doi.org/10.1006/jmbi.2001.4662
Lefranc MP et al. IMGT, the international ImMunoGeneTics database. Nucleic Acids Res. (1999) 27.1: 209–212. https://doi.org/10.1093/nar/27.1.209
Wu TT, Kabat EA. An analysis of the sequences of the variable regions of Bence Jones proteins and myeloma light chains and their implications for antibody complementarity. J Exp Med. (1970) 132.2: 211–250. https://doi.org/10.1084/jem.132.2.211