Protein identification terminology
From SPCTools
The term observed peptides below refers to the set of peptides in a particular PeptideAtlas build. These peptides were selected by applying a PSM (peptide spectrum match) FDR threshold to each experiment separately. (In older builds, peptides were selected by applying a single probability cutoff to all PSMs in the Atlas.)
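The per-experiment selection can be sketched as follows. This is a minimal illustration, not the PeptideAtlas pipeline. It assumes the FDR is estimated by target-decoy counting (decoy PSMs divided by target PSMs above a score cutoff), that a higher score means a better match, and that `PSM`, `peptides_passing_fdr`, `observed_peptides` and the 1% default are hypothetical names and values chosen for illustration. The key point is that each experiment gets its own cutoff before the peptide sets are pooled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSM:
    peptide: str
    score: float      # assumption: higher score = better match
    is_decoy: bool    # True if the match is to a decoy sequence


def peptides_passing_fdr(psms, max_fdr=0.01):
    """Target peptides from ONE experiment whose PSMs pass a target-decoy FDR cutoff."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = decoys = 0
    accepted = 0  # number of top-ranked PSMs kept
    for i, psm in enumerate(ranked, start=1):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Keep the deepest cutoff whose estimated FDR (decoys / targets) is acceptable.
        if targets and decoys / targets <= max_fdr:
            accepted = i
    return {p.peptide for p in ranked[:accepted] if not p.is_decoy}


def observed_peptides(experiments, max_fdr=0.01):
    """Apply the threshold to each experiment separately, then pool the results."""
    observed = set()
    for psms in experiments.values():
        observed |= peptides_passing_fdr(psms, max_fdr)
    return observed
```

A single global cutoff, as in older builds, would instead rank all PSMs from all experiments together before thresholding.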
Protein Presence Levels
Taken together, the set of proteins with a Presence Level label for any Atlas has the property that no two members share exactly the same set of observed peptides.
Label | Technical definition | Practical definition |
Canonical | From each ProteinProphet protein_group, the protein with the highest probability is selected to be canonical. Then, any other protein in that group which shares fewer than 80% of its peptides with any other canonical from that group is also labeled canonical. During this selection process, each set of indistinguishable proteins is considered to be a single entity, and the one from that set with the most preferred identifier (for human and mouse, Swiss-Prot primary splice variant) is the one labeled canonical. | The set of canonical proteins is a minimal, non-redundant list of proteins derived from the set of identified peptides for an Atlas. The number of canonicals is what we use as the protein count for the Atlas build. |
Possibly Distinguished | From each ProteinProphet protein_group, any protein that is not canonical and not subsumed is labeled possibly_distinguished. As above, from among any set of indistinguishable proteins, only one (the one with the most preferred identifier) is labeled possibly_distinguished. | The set of canonical proteins plus possibly_distinguished proteins is a more inclusive, but still non-redundant, list of proteins derived from the set of identified peptides for an Atlas. The canonical list does not necessarily explain all observed peptides, but the combined canonical plus possibly_distinguished list does. |
Subsumed | Any protein labeled subsumed by ProteinProphet. As above, from among any set of indistinguishable proteins, only one (the one with the most preferred identifier) is labeled subsumed. | A protein whose observed peptides are a subset of the observed peptides of a canonical protein is considered subsumed. For any pair of canonical/subsumed proteins, it is possible that both have been observed, but it is more conservative to claim that only the canonical has been observed. Subsumed proteins are not necessary to explain all observed peptides. |
NTT-Subsumed | Any protein that is possibly_distinguished by the above definition, but whose peptides differ from those of a canonical only in the number of tryptic termini. For any pair of canonical/ntt-subsumed proteins, it is much more likely that the canonical has been observed, because in the canonical's sequence those peptides have the greater number of tryptic termini, as expected from a tryptic digest. | Proteins that are ntt-subsumed contain exactly the same set of observed peptides as a canonical protein, but at least one of those peptides has fewer tryptic termini in the ntt-subsumed protein. |
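The labeling rules in the table above can be sketched for a single protein_group. This is a hedged reading of the definitions, not PeptideAtlas code. Several assumptions are built in:

- Candidates are considered in descending probability order.
- "Fewer than 80% with any other canonical" is read as "fewer than 80% with each canonical chosen so far".
- ProteinProphet's subsumed flag is taken as input, and subsumed proteins are never promoted to canonical.
- Each indistinguishable set has already been reduced to one representative.
- Tryptic termini are counted with a simplified K/R rule that ignores the proline exception.

`Protein`, `tryptic_termini` and `label_group` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Protein:
    accession: str
    probability: float      # ProteinProphet protein probability
    peptides: frozenset     # observed peptide sequences mapping to this protein
    subsumed: bool = False  # ProteinProphet's own "subsumed" call, taken as input
    sequence: str = ""      # full protein sequence, used only for the NTT check


def tryptic_termini(peptide, sequence):
    """Best number of tryptic ends (0-2) over all occurrences of peptide in sequence.

    Simplified rule: an end is tryptic if it is a protein terminus or the cleavage
    site follows K or R. The "not before P" rule is ignored.
    """
    best, start = 0, sequence.find(peptide)
    while start >= 0:
        end = start + len(peptide)
        n_ok = start == 0 or sequence[start - 1] in "KR"
        c_ok = end == len(sequence) or peptide[-1] in "KR"
        best = max(best, int(n_ok) + int(c_ok))
        start = sequence.find(peptide, start + 1)
    return best


def label_group(group, overlap_cutoff=0.8):
    """Assign Presence Level labels within one ProteinProphet protein_group."""
    if not group:
        return {}
    ranked = sorted(group, key=lambda p: p.probability, reverse=True)
    canonicals = [ranked[0]]  # highest-probability protein is always canonical
    for prot in ranked[1:]:
        if prot.subsumed:  # assumption: subsumed proteins never become canonical
            continue
        # Canonical if it shares fewer than 80% of its peptides with each canonical so far.
        n = max(len(prot.peptides), 1)
        if all(len(prot.peptides & c.peptides) / n < overlap_cutoff for c in canonicals):
            canonicals.append(prot)
    canonical_ids = {c.accession for c in canonicals}

    labels = {}
    for prot in ranked:
        if prot.accession in canonical_ids:
            labels[prot.accession] = "canonical"
        elif prot.subsumed:
            labels[prot.accession] = "subsumed"
        elif any(prot.peptides == c.peptides and any(
                tryptic_termini(pep, prot.sequence) < tryptic_termini(pep, c.sequence)
                for pep in prot.peptides) for c in canonicals):
            labels[prot.accession] = "ntt-subsumed"
        else:
            labels[prot.accession] = "possibly_distinguished"
    return labels
```

Because canonicals are accumulated greedily, the result depends on the order in which candidates are visited. That is why the descending-probability order is stated as an assumption rather than left implicit.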
Proteins in the Atlas reference set that are redundant with proteins carrying one of the above labels are given one of the labels below.
Protein Redundancy Labels
Label | Technical definition | Practical definition |
Indistinguishable | Indistinguishable, according to ProteinProphet, from a protein with a Presence Level label. | Exactly the same set of peptides has been observed for this protein as for a protein with a Presence Level label. |
Identical | Identical in sequence to a protein with any other label. | Identical in sequence to a protein with any other label. |
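The two redundancy labels can be sketched as below, under explicit simplifying assumptions:

- ProteinProphet's indistinguishability call is approximated by equality of observed-peptide sets.
- When both labels could apply, "identical" takes precedence.
- Reference proteins are visited in accession order. This is a stand-in for the "most preferred identifier" rule, so that the first member of a group of sequence twins is labeled indistinguishable and the rest are labeled identical.

`redundancy_labels` and its input layout are hypothetical.

```python
def redundancy_labels(labeled, reference):
    """Assign Protein Redundancy Labels to reference-set proteins (illustrative sketch).

    labeled:   accession -> (sequence, observed peptides) for proteins that carry a
               Presence Level label.
    reference: accession -> (sequence, observed peptides) for all other proteins.
    """
    seen_seqs = {seq for seq, _ in labeled.values()}
    labeled_peps = {frozenset(peps) for _, peps in labeled.values()}
    labels = {}
    for acc in sorted(reference):  # assumption: accession order stands in for preference
        seq, peps = reference[acc]
        if seq in seen_seqs:
            # Same sequence as any already-labeled protein.
            labels[acc] = "identical"
        elif frozenset(peps) in labeled_peps:
            # Same observed peptides as a Presence Level protein.
            labels[acc] = "indistinguishable"
            seen_seqs.add(seq)  # later sequence twins of this protein become "identical"
    return labels
```

Proteins that match neither test are simply left out of the result, since they are not redundant with any labeled protein.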