Protein identification terminology

From SPCTools

Each PeptideAtlas build is associated with a reference database, usually a combination of several protein sequence databases for the species (Swiss-Prot, IPI, Ensembl, ...) plus a database of contaminants. Any protein in the reference database that contains at least one observed peptide is considered a member of the Atlas. Because the source databases overlap and many entries share peptides, the full list of proteins in an Atlas is highly redundant. We therefore label each Atlas protein using the terminology below.
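
As a rough illustration of the membership rule, the following Python sketch treats a protein as an Atlas member if any observed peptide occurs as a substring of its sequence. The function name, the toy identifiers and sequences, and the plain substring test are assumptions made for illustration; the actual PeptideAtlas mapping procedure is not described on this page.

```python
def atlas_members(reference_db, observed_peptides):
    """Return {protein_id: set of observed peptides found in that protein}.

    reference_db: dict mapping protein identifier -> amino-acid sequence.
    observed_peptides: iterable of peptide sequences.
    Proteins that contain no observed peptide are omitted.
    """
    members = {}
    for protein_id, sequence in reference_db.items():
        hits = {pep for pep in observed_peptides if pep in sequence}
        if hits:
            members[protein_id] = hits
    return members


# Toy data (hypothetical identifiers and sequences).
reference_db = {
    "P1": "MKTAYIAKQRQISFVK",
    "P1-2": "MKTAYIAKQR",          # shorter variant of P1
    "P2": "MSHFSRQLEERLGLIEVQ",
    "P3": "MGGGGGGGGK",            # contains no observed peptide
}
observed = ["TAYIAK", "QLEER"]
members = atlas_members(reference_db, observed)
```

In this toy case both P1 and P1-2 become Atlas members on the strength of the same single peptide, which is the kind of redundancy the labels below are meant to resolve.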

The term observed peptides in this context refers to the set of peptides in the PeptideAtlas build. These peptides are selected using a PSM (peptide-spectrum match) FDR threshold applied to each experiment separately. (In older builds, peptides were selected using a probability cutoff applied to all PSMs for the Atlas.)
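
One common way to turn PSM probabilities into an FDR threshold is to estimate the FDR of an accepted list as the mean of (1 - probability) over its PSMs. The sketch below applies such a threshold to each experiment separately. The estimator, the tuple layout, the function name, and the 1% default are all assumptions for illustration and are not necessarily what PeptideAtlas uses.

```python
from collections import defaultdict


def select_peptides(psms, fdr_threshold=0.01):
    """Select peptides per experiment at a model-based PSM FDR threshold.

    psms: iterable of (experiment_id, peptide, probability) tuples.
    Within each experiment, PSMs are ranked by descending probability and
    the longest prefix whose estimated FDR (mean of 1 - p) stays at or
    below fdr_threshold is accepted.
    Returns the set of peptides accepted in at least one experiment.
    """
    by_experiment = defaultdict(list)
    for experiment_id, peptide, probability in psms:
        by_experiment[experiment_id].append((probability, peptide))

    accepted = set()
    for experiment_psms in by_experiment.values():
        experiment_psms.sort(reverse=True)
        expected_false = 0.0
        n_accept = 0
        for rank, (probability, _) in enumerate(experiment_psms, start=1):
            expected_false += 1.0 - probability
            if expected_false / rank <= fdr_threshold:
                n_accept = rank
        accepted.update(pep for _, pep in experiment_psms[:n_accept])
    return accepted


# Toy PSMs (hypothetical values).
psms = [
    ("exp1", "AAAK", 0.999),
    ("exp1", "CCCK", 0.995),
    ("exp1", "DDDR", 0.60),
    ("exp2", "DDDR", 0.999),
    ("exp2", "EEEK", 0.50),
]
selected = select_peptides(psms)
```

Here DDDR fails the threshold in exp1 but passes in exp2, so it still enters the build; EEEK is rejected everywhere.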

A new implementation of the PeptideAtlas Browse Proteins tab was released in December 2009. This implementation allows you to select proteins based on the terminology below.

Protein Presence Levels

Taken together, the set of proteins with a Presence Level label for any Atlas has the property that no two members share exactly the same set of observed peptides.

Label | Technical definition | Practical definition
Canonical | From each ProteinProphet protein_group, the protein with the highest probability is selected to be canonical. Then, recursively, any other protein in that group which shares fewer than 80% of its peptides with any other canonical from that group is also labeled canonical. During this selection process, each set of indistinguishable proteins is considered to be a single entity, and the one from that set with the most preferred identifier (for human and mouse, Swiss-Prot primary splice variant) is the one labeled canonical. | The set of canonical proteins is a minimal, non-redundant list of proteins derived from the set of identified peptides for an Atlas. The number of canonicals is what we use as the protein count for the Atlas build.
Possibly Distinguished | From each ProteinProphet protein_group, any protein that is not canonical and not subsumed is labeled possibly_distinguished. As above, from among any set of indistinguishable proteins, only one (the one with the most preferred identifier) is labeled possibly_distinguished. | The set of canonical proteins plus possibly_distinguished proteins is a more inclusive, but also non-redundant, list of proteins derived from the set of identified peptides for an Atlas. The canonical list will not explain all observed peptides, but the combined canonical plus possibly_distinguished list will explain all observed peptides.
Subsumed | Any protein labeled subsumed by ProteinProphet. As above, from among any set of indistinguishable proteins, only one (the one with the most preferred identifier) is labeled subsumed. | A protein whose observed peptides are a subset of the observed peptides of a canonical protein is considered subsumed. For any pair of canonical/subsumed proteins, it is possible that both have been observed, but it is more conservative to claim that only the canonical has been observed. Subsumed proteins are not necessary to explain all observed peptides.
NTT-Subsumed | Any protein that is possibly_distinguished by the above definition, but whose peptides differ from a canonical only by the number of tryptic termini. | Proteins that are ntt-subsumed contain exactly the same set of observed peptides as a canonical protein, but at least one of those peptides has fewer tryptic termini in the ntt-subsumed protein. For any pair of possibly_distinguished/ntt-subsumed proteins, it is much more likely that the possibly_distinguished has been observed, because that will be the one with the greater number of tryptic termini among its peptides.
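
The canonical, possibly_distinguished, and subsumed rules can be sketched in Python as follows. This is only one reading of the definitions. It assumes that indistinguishable proteins have already been collapsed to one entry each, that candidates are visited in order of descending probability, and that "subsumed" can be approximated by the practical definition (peptide set is a subset of a canonical's) rather than by ProteinProphet's own label. The ntt-subsumed case is not handled. The function name and the input layout are hypothetical.

```python
def label_group(group, share_cutoff=0.8):
    """Assign Presence Level labels within one protein group.

    group: dict protein_id -> (probability, set of observed peptides);
           every peptide set must be non-empty.
    Returns dict protein_id -> label.
    """
    # Visit proteins from highest to lowest probability (assumed order).
    ranked = sorted(group, key=lambda pid: group[pid][0], reverse=True)

    # The top-ranked protein is canonical; a later protein is canonical
    # only if it shares < share_cutoff of its peptides with every
    # canonical chosen so far.
    canonicals = [ranked[0]]
    for pid in ranked[1:]:
        peptides = group[pid][1]
        max_shared = max(
            len(peptides & group[c][1]) / len(peptides) for c in canonicals
        )
        if max_shared < share_cutoff:
            canonicals.append(pid)

    labels = {}
    for pid in ranked:
        peptides = group[pid][1]
        if pid in canonicals:
            labels[pid] = "canonical"
        elif any(peptides <= group[c][1] for c in canonicals):
            # Practical definition: peptides are a subset of a canonical's.
            labels[pid] = "subsumed"
        else:
            labels[pid] = "possibly_distinguished"
    return labels


# Toy protein group (hypothetical proteins and peptides).
group = {
    "A": (1.00, {"p1", "p2", "p3", "p4", "p5"}),
    "B": (0.98, {"p4", "p5", "p6", "p7", "p8"}),  # shares 40% with A
    "C": (0.90, {"p1", "p2", "p3", "p4", "p9"}),  # shares 80% with A
    "D": (0.50, {"p1", "p2"}),                    # subset of A
}
labels = label_group(group)
```

In this toy group, peptide p9 is explained only by C, which is possibly_distinguished. This matches the statement that the canonical list alone does not explain every observed peptide, while canonical plus possibly_distinguished does.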

Protein Redundancy Labels

Proteins in the Atlas reference set that are redundant with proteins carrying one of the above labels are given one of the labels below.

Label | Technical definition | Practical definition
Indistinguishable | Indistinguishable from a protein with a Presence Level label, according to ProteinProphet. | Exactly the same peptides from this protein have been observed as have been observed for a protein with a Presence Level label.
Identical | Identical in sequence to a protein with any other label. | Identical in sequence to a protein with any other label.
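
A minimal Python sketch of how the two redundancy labels could be assigned is given below. It first collapses proteins with the same sequence ("identical"), then collapses the remaining proteins with the same observed peptide set ("indistinguishable"), leaving one representative per peptide set to receive a Presence Level label. The function name, the input layout, the toy identifiers, and the `preference` key function are hypothetical, and comparing peptide sets directly is an assumption standing in for ProteinProphet's own notion of indistinguishable.

```python
from collections import defaultdict


def redundancy_labels(sequences, peptides_by_protein, preference):
    """Split Atlas proteins into representatives and redundant entries.

    sequences: dict protein_id -> amino-acid sequence.
    peptides_by_protein: dict protein_id -> set of observed peptides.
    preference: key function; a lower key means a more preferred identifier.
    Returns (representatives, redundant), where redundant maps
    protein_id -> (label, protein_id it is redundant to).
    """
    redundant = {}

    # Step 1: proteins sharing a sequence are "identical" to the most
    # preferred protein with that sequence.
    by_sequence = defaultdict(list)
    for pid, seq in sequences.items():
        by_sequence[seq].append(pid)
    unique = []
    for pids in by_sequence.values():
        best, *rest = sorted(pids, key=preference)
        unique.append(best)
        for pid in rest:
            redundant[pid] = ("identical", best)

    # Step 2: remaining proteins sharing an observed peptide set are
    # "indistinguishable" from the most preferred one.
    by_peptides = defaultdict(list)
    for pid in unique:
        by_peptides[frozenset(peptides_by_protein[pid])].append(pid)
    representatives = []
    for pids in by_peptides.values():
        best, *rest = sorted(pids, key=preference)
        representatives.append(best)
        for pid in rest:
            redundant[pid] = ("indistinguishable", best)

    return representatives, redundant


# Toy data (hypothetical identifiers, sequences, and preference rule).
sequences = {
    "sp|P1": "MKTAYIAKQR",
    "ens|E1": "MKTAYIAKQR",
    "sp|P1-2": "MKTAYIAKQRQISFVK",
    "sp|P2": "MSHFSRQLEER",
}
peptides_by_protein = {
    "sp|P1": {"TAYIAK"},
    "ens|E1": {"TAYIAK"},
    "sp|P1-2": {"TAYIAK"},
    "sp|P2": {"QLEER"},
}


def preference(pid):
    # Prefer "sp|" identifiers, then identifiers without a variant suffix.
    return (not pid.startswith("sp|"), "-" in pid, pid)


representatives, redundant = redundancy_labels(
    sequences, peptides_by_protein, preference
)
```

After this step no two representatives share exactly the same set of observed peptides, which is the property stated for the set of proteins with a Presence Level label.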