System Requirements and Installation
Isotopically labeled crosslinking reagents (bis-NHS esters) and a protein complex of known crystal structure were used to test our new automated computer analysis tools for MALDI mass spec data derived from digested samples of the crosslinked protein complex. The software used for automated analysis contains three programs: iXLINK, a PERL program executed from the command line, doXLINK, a library of java programs executed by a PERL script which is also executed from the command line, and the java program XLinkViewer.
Method outline: Prior to computational analysis, two crosslinked peptide mixtures are independently fractionated by reverse phase chromatography and spotted onto two standard MALDI plates. Each mixture contains peptides modified with either the isotopically heavy or light version of the crosslinking reagent. Consequently, modified peptides should appear within a given spectrum as a doublet - a pair of peaks separated by the mass difference between the heavy and light forms of the label. One of the samples is reacted in buffer containing [16O] water. The other is reacted in buffer containing a mix of [16O] and [18O] water. Within the [16O] / [18O] mixture, monolink peaks will appear as quadruplets whereas crosslink peaks will appear as doublets. Further details of the experimental procedure are given in the original publication. [1]
iXLINK classifies peaks from a duplex LC-MALDI experiment into one of four categories: noise, high abundance, crosslink-derived and monolink-derived. iXLINK generates a database of crosslinker-modified species based on protein sequence(s), crosslinker and protease(s). Once classified, mass mapping is used to generate a preliminary sequence assignment. iXLINK also creates mass inclusion lists for subsequent MS/MS acquisition of potentially crosslinker-modified peptide species.
doXLINK scores how well MS/MS data, acquired using the mass inclusion lists created by iXLINK, match the peptides included in iXLINK's preliminary sequence assignment.
XLinkViewer is used to visualize iXLINK and doXLINK results, and allows the user to confirm or reject the doXLINK assignments.
Proteins: Multiple proteins suspected to be found in the sample and to be modified with crosslinking reagent.
Proteolytic Digestion: The number and type of proteases can be specified, e.g. Trypsin + Asp-N.
Missed Cleavages: The number of missed cleavages can be specified.
Modifications: Single/double methionine oxidation and cysteine alkylation, for example, are allowed.
Crosslinking Reagent: This software can be used with bis-NHS-esters only (homobifunctional amino-reactive cross-linkers that are sensitive to hydrolysis; trypsin will not cleave C-terminal to modified lysine residues). We have used DSS-d0/-d12, DSG-d0/-d6, BS2-d0/-d4, and BS3-d0/-d4.
LC-MS-Instrumentation: LC fractions of a protease digested crosslinked protein complex are deposited on a MALDI plate. We used a 192 Well Stainless Steel MALDI Sample Plate for sample introduction on the Applied Biosystems/MDS Sciex 4700 Proteomics Analyzer. The spotting pattern is shown in Figure 1, starting with spot positions A1, A3... A23, in the top row, proceeding with A2, A4..., A24, through spot position H24, at the bottom right. The file timeConversion.txt assigns sequential number indices to these MALDI plate spots. The indices reflect the order of the LC fractions, i.e. MALDI plate spots A1, A3...A23 are assigned to indices 0, 1... 11, and the next row A2, A4...A24 are assigned to indices 12, 13...23. Thus, the user can change these assignments as desired.
MALDI spectra: automatic acquisition with an Applied Biosystems/MDS Sciex 4700 Proteomics Analyzer using the 4700 Explorer software. All automatically acquired data is stored in the integrated Oracle database of the program. Mass spectra in tab-delimited text file format are generated using the Applied Biosystems/MDS Sciex Data Explorer Software and a macro in Visual Basic (MS, see below) and the Peaks-to-Mascot feature in the 4700 Explorer software (MS/MS).
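The default spot numbering described for timeConversion.txt above can be sketched in a few lines of Python. This is only an illustration of the numbering rule, not the way iXLINK reads timeConversion.txt; the function name `spot_to_index` is hypothetical, and the continuation of the pattern into rows B to H (B1 following A24) is an assumption based on the description "through spot position H24".

```python
def spot_to_index(spot: str) -> int:
    """Map a plate spot name such as 'A3' to its LC fraction index.

    Assumes the default pattern: within each plate row, odd columns
    (1, 3, ... 23) are spotted first, then even columns (2, 4, ... 24).
    """
    row = ord(spot[0].upper()) - ord("A")   # A..H -> 0..7
    col = int(spot[1:])                     # 1..24
    if not (0 <= row <= 7 and 1 <= col <= 24):
        raise ValueError(f"not a spot on a 192-well plate: {spot}")
    half = 0 if col % 2 == 1 else 12        # even columns form the second pass
    return row * 24 + half + (col - 1) // 2


print(spot_to_index("A1"), spot_to_index("A23"), spot_to_index("A2"))
```

If a different spotting order is used, the mapping in timeConversion.txt has to be changed accordingly; the formula above only reproduces the default order.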
The Applied Biosystems/MDS Sciex 4700 Explorer allows the user to specify the spots on the MALDI plate to be analyzed. For each spot, MS data is obtained after multiple laser shots, and the data is averaged to give a MALDI mass spectrum for each spot. Each spectrum is exported to a binary spectrum file, *.t2d, together with a *.cal calibration file. For example, A5_MS_11.t2d is the mass spectrum from spot A5, and A5_MS_11.cal is its respective calibration file. Using the Data Explorer software package provided with the Applied Biosystems/MDS Sciex 4700 MALDI mass spectrometer together with a modified macro originally provided to us by the manufacturer (click here for instructions about how to use and install this macro in Data Explorer), the *.t2d files are converted into text files, *.txt (for example, see A1_MS_5.txt). A portion of a typical *.txt file is shown below:
| Mass | Area | Resolution | S/NRatio |
| 565.054382324219 | 241.114852905273 | 8987.953125 | 27.1505069732666 |
| 581.069152832031 | 344.134460449219 | 8790.4013671875 | 43.0353393554688 |
| 587.022888183594 | 1002.90197753906 | 9292.6064453125 | 135.796173095703 |
| 609.019836425781 | 141.949584960938 | 8420.2470703125 | 16.5039558410645 |
| 648.776062011719 | 281.1484375 | 4033.81396484375 | 10.8354415893555 |
The mass column contains the monoisotopic masses, which have been calibrated by the Applied Biosystems/MDS Sciex 4700 software using the *.cal files. The area column gives the area of the peak in the mass spectrum after the areas of the peaks from the natural abundance heavy isotopic peaks have been added to the monoisotopic peak area (peak clustering carried out by the Applied Biosystems/MDS Sciex Data Explorer software). The last column gives the signal-to-noise ratio. The iXLINK software uses the mass, area, and S/NRatio data extracted from the *.txt files; the Resolution data is not used. iXLINK reads the *.txt file names and extracts the MALDI plate spot number. Thus, the filename should have the same format specified above.
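As an illustration of the file layout described above, the following Python sketch extracts the spot name from a file name and reads the three columns that iXLINK uses. It is not iXLINK code. It assumes that the file is tab-delimited with a header row whose column names are exactly those shown in the table above; the function names are hypothetical.

```python
import csv
import io
import re

# File names look like A5_MS_11.txt: spot name, "_MS_", a number, ".txt".
NAME_RE = re.compile(r"^([A-H]\d{1,2})_MS_(\d+)\.txt$")


def spot_from_filename(name: str) -> str:
    """Return the MALDI plate spot (e.g. 'A5') encoded in an MS file name."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected MS file name: {name}")
    return m.group(1)


def read_peaks(text: str):
    """Return (mass, area, signal-to-noise) tuples; Resolution is ignored."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        (float(r["Mass"]), float(r["Area"]), float(r["S/NRatio"]))
        for r in reader
    ]
```

A file whose name does not follow the pattern raises an error here, which mirrors the requirement that the file names keep the format specified above.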
iXLINK requires a work folder including four parameter files and two folders "16" and "18" with the MS data files.
Work Folders: create a new folder, e.g. iXLINK_01, and another two folders, one called 16 and one called 18.
Data Files: Copy all MS data files (*_MS_*.txt) from the LC-MS run using [16O] water into folder 16. Copy all MS data files (*_MS_*.txt) from the [16O]/[18O] water experiment into folder 18.
Parameter Files: Place the following files into folder iXLINK_01 (click on links 1-4 to view some example files):
aa_mass.txt contains the monoisotopic masses of the amino acid residues included in ProteinSequence.txt (see below).
timeConversion.txt is described above.
ProteinSequence.txt contains the amino acid sequences of all proteins in the sample to be analyzed. The file name may be user-specified.
(in the case that the proteins in a sample are not known prior to the iXLINK analysis, please see this note*)
A typical file is shown below:
>A:1
MELKNSISDYTEAEFVQLLKEIEKENVAATDDVLYVLLEHFVKITEHPDGLDLIYYPSDNRDDSPEGIVKEIKEWRAANGKPGFKQG
>B:432
MRGSHHHHHHGSGSKRNKPGKATGKGKPVNNKWLNNAGKDLGSPVPDRIANKLRDKEFKSFDDFRKKFWEEVSKDPELSKQFSRNNNDRMKVGKAPFTRTQDVSGKRRSFELHHEKPISQNGGVYDMDNISVVTPKRAIDIH
The "A" in line 1 is the user specified name of the first subunit. The "1" in line 1 designates that the first amino acid in the sequence is number 1 (note that the sequence of some proteins may start with a residue number other than 1). The second line gives the protein sequence using single letter amino acid designations. In the example above, a second protein subunit, subunit B, whose first residue is number 432, has also been included.
params.pl gives the user specified parameter values for running iXLINK. The PDF file iXLINK_params.pdf gives a pictorial description of the various user specified parameters within params.pl. (To view PDF files, download Adobe Acrobat Reader from http://www.adobe.com). In addition, most of these parameters are explained in the original publication. [1] Parameters not described in the original publication are given here (comment numbers below refer to the file iXLINK_params.pdf). Some of the parameters in params.pl have no comment information in iXLINK_params.pdf, and these should not be modified.
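Of the parameter files listed above, ProteinSequence.txt has the simplest layout to check programmatically before starting a run. The Python sketch below reads the ">Name:StartResidue" header lines and the sequence lines that follow them. It is an illustration only, not the parser used by iXLINK, and it assumes that each header and each sequence sits on its own line; `parse_protein_sequences` is a hypothetical name.

```python
def parse_protein_sequences(text: str) -> dict:
    """Return {subunit name: {"start": first residue number, "sequence": str}}."""
    subunits = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header such as ">B:432": subunit name and first residue number.
            name, start = line[1:].split(":")
            subunits[name] = {"start": int(start), "sequence": ""}
        elif name is not None:
            # Sequence lines are appended to the most recent header.
            subunits[name]["sequence"] += line
    return subunits


example = ">A:1\nMELKNSISDY\n>B:432\nMRGSHHHHHH\n"
parsed = parse_protein_sequences(example)
# Residue number of the i-th residue (0-based) is start + i.
print(parsed["B"]["start"] + parsed["B"]["sequence"].index("R"))
```

The truncated sequences in the example are placeholders for the full sequences in a real file.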
To edit params.pl, open the file with a text editor such as Notepad. Be sure to save it under the name params.pl as a text file.
#8 massCalibrationError: If there is a systematic shift in the mass of the peptides in the MALDI MS in one run versus another, the user can specify this mass shift. For example, if the user decides that all masses in the *.txt files in the 18 folder are off by +1 Da, massCalibrationError should be set to 1. iXLINK then subtracts 1 Da from all of the MALDI peaks in all *.txt files in the 18 folder. Negative values are also accepted.
#9 heavyLightRTShift: This is the systematic shift in LC retention time caused by starting the MALDI plate spotter at different times after sample injection onto the LC column.
#10 heavyLightRTError: Setting this parameter to 0 implies that each peptide elutes at the same retention time in the two separate LC runs. We used a value of 20 spots. In this case, if a doublet split by 12 Da (d0- and d12-DSS crosslinker attached) is observed in LC fraction 27 in the [16O]water sample data set, iXLINK will look for the corresponding peptide in spots 7-47 in LC run 2 ([16O]/[18O]water run).
#11 IdentificationMassError: This is the difference in mass between the observed mass for the query peptide and the mass of the peptide in the database of calculated peptide masses. This is used by iXLINK for mass mapping.
#12 numberOFEnzymes: This is the number of different proteases used plus 2. For example, if the protein sample is digested with both trypsin and AspN, this parameter is set to 4.
#13 minimumDigestMass and maximumDigestMass: After calculation of the masses for all monolink modified peptides, iXLINK keeps only those that fall in the range of minimumDigestMass to maximumDigestMass Da. We chose 4000 for maximumDigestMass because the MALDI instrument was set up to scan up to 4000 Da.
#14 minimumXlinkMass and maximumXlinkMass: Same as above except for calculated masses of crosslinked peptides.
#15 cysMod: The mass of the cysteine modification, i.e. 57.02 for modification with iodoacetamide.
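The retention-time window described under #10 heavyLightRTError can be illustrated with a short Python sketch. It reproduces the worked example (fraction 27 with an error of 20 spots gives spots 7-47). How iXLINK combines heavyLightRTShift with the window is not stated above, so adding the shift to the window center is an assumption here, as are the function name and the clamping to a 192-spot plate.

```python
def search_window(spot_index: int, rt_shift: int = 0, rt_error: int = 20,
                  n_spots: int = 192) -> tuple:
    """Return the (first, last) spot index searched in the second LC run."""
    center = spot_index + rt_shift          # assumed use of heavyLightRTShift
    low = max(0, center - rt_error)         # do not run off the start of the plate
    high = min(n_spots - 1, center + rt_error)
    return low, high


print(search_window(27))
```

With heavyLightRTError set to 0 the window collapses to a single spot, which corresponds to requiring identical retention times in both runs.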
The parameter values given in iXLINK_params.pdf are those used in our original study of the DNase/Im7 heterodimer treated with d0/d12-DSS crosslinker and digested with trypsin and endoproteinase Asp-N.
The directory structure should now look like this:
To run the PERL-script, open the command window on the PC and use the cd command to go to the iXLINK_01 folder. On the command line type:
> iXLINK.pl ProteinSequence.txt ../16 ../18
where ProteinSequence.txt is the name of the protein sequence text file. Note that "../16" and "../18" designate the folders containing the MS data files, located one directory level up relative to the current directory.
If the command line also contains a "1" at the end:
> iXLINK.pl ProteinSequence.txt ../16 ../18 1
iXLINK will also write an output file, e.g. ProteinSequence.txt.db.txt, the iXLINK database, that gives the masses of all calculated monolink and crosslink species along with the identity of the peptides (note: this might be a large file).
Some representative rows in this file are shown below:
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
| XLINK | A | B | n#M*ELK | nM#RGSHHHHHHGSGSK#RNKPGKATGKGK | 3616.79 | 0 | 4 | 431 | 458 | 0 | 446 |
| XLINK | A | B | n#M*ELK | nM#RGSHHHHHHGSGSKRNKPGK#ATGKGK | 3616.79 | 0 | 4 | 431 | 458 | 0 | 452 |
| XLINK | A | B | n#M#ELK | nM*RGSHHHHHHGSGSKRNKPGK#ATGKGK | 3616.79 | 0 | 4 | 431 | 458 | 0 | 452 |
| XLINK | A | B | n#M*ELK | nM#RGSHHHHHHGSGSKRNKPGKATGK#GK | 3616.79 | 0 | 4 | 431 | 458 | 0 | 456 |
| XLINK | A | B | n#M#ELK | nM*RGSHHHHHHGSGSKRNKPGKATGK#GK | 3616.79 | 0 | 4 | 431 | 458 | 0 | 456 |
| XLINK | A | B | n#M#ELK | nM*RGSHHHHHHGSGSK#RNKPGKATGKGK | 3616.79 | 0 | 4 | 431 | 458 | 0 | 446 |
| XLINK | A | B | n#M#ELK | n#M*RGSHHHHHHGSGSKRNKPGKATGKGK | 3616.79 | 0 | 4 | 431 | 458 | 0 | 431 |
| XLINK | A | B | n#M*ELK | nM#RGSHHHHHHGSGSKRNK#PGKATGKGK | 3616.79 | 0 | 4 | 431 | 458 | 0 | 449 |
| XLINK | A | B | n#M#ELK | nM*RGSHHHHHHGSGSKRNK#PGKATGKGK | 3616.79 | 0 | 4 | 431 | 458 | 0 | 449 |
| XLINK | B | A | n#M#RGSHHHHHHGSGSKRNKPGKATGKGK | n#M*ELK | 3616.79 | 431 | 458 | 0 | 4 | 431 | 0 |
| MONO | B | - | RNK#PGK | - | 855.51 | 447 | 452 | - | - | 449 | - |
| LOOP | B | B | ATGKGK#PVNNK#WLNNAGK | ATGKGK#PVNNK#WLNNAGK | 2035.11 | 453 | 470 | 453 | 470 | 458 | 463 |
| LOOP | B | B | ATGK#GK#PVNNKWLNNAGK | ATGK#GK#PVNNKWLNNAGK | 2035.11 | 453 | 470 | 453 | 470 | 456 | 458 |
| LOOP | B | B | ATGK#GKPVNNK#WLNNAGK | ATGK#GKPVNNK#WLNNAGK | 2035.11 | 453 | 470 | 453 | 470 | 456 | 463 |
Such a *.db.txt database file consists of 12 columns. Each row represents a single entry of the computed database of crosslinker-modified peptides. The 1st column indicates the type of modification, which is either crosslink (XLINK), monolink (MONO) or looplink (LOOP). The letters "A" or "B" in columns 2 and 3 give the subunit (as named in the file ProteinSequence.txt) from which each of the two peptides listed in columns 4 and 5 is derived; the peptides are given in single-letter code and are interconnected to form a crosslinked species. A "-" in columns 3 and 5 is only found in the case of monolinks, where no second peptide is part of the molecule. In the case of looplinks, the peptide sequence shown in column 5 is simply a copy of the entry in column 4. "n#" and "K#" in the sequence of such peptides in columns 4 and 5 indicate the potentially modified amino acid residues (here amino termini "n" and lysines "K") that are attached to one end of the crosslinking reagent. "n#" is therefore a crosslinker-modified N-terminus and "K#" is a crosslinker-modified lysine. # and * symbolize either single (#) or double (*) modification of selected residues (here: singly or doubly oxidized methionines, M# and M*, and C# for carbamidomethylated cysteine). Column 6 is the calculated monoisotopic mass of the molecular species. Columns 7 and 8 are the start and end residue numbers of peptide 1 (see column 4), and columns 9 and 10 are the same for peptide 2 (see column 5), all according to the start residue numbers specified in ProteinSequence.txt (note that a "0" in columns 7 and 9 indicates that the crosslinker-modified alpha-amino group of the N-terminus (here residue #1) is counted as residue 0 by iXLINK). Columns 11 and 12 are the positions of the modified amino acid residues of peptides 1 and 2 (see columns 4 and 5), relative to the respective residue number of the N-terminus.
For example, in row 1 in the table above, the molecular species is a crosslink formed by the two peptides MELK derived from subunit A and MRGSHHHHHHGSGSKRNKPGKATGKGK derived from subunit B. The first peptide starts with the N-terminus, counted as residue #0, and ends with lysine residue #4. The second peptide starts with the N-terminus, counted as residue #431, and ends with lysine residue #458. Peptide 1 is connected via its N-terminus, residue #0, to lysine residue #446 in peptide 2. The monoisotopic mass of this species is 3616.79 Da.
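The 12-column layout described above can be read into named fields with a short Python sketch. This is an illustration for inspecting a *.db.txt file, not part of iXLINK; it assumes that the columns are separated by whitespace and that "-" marks an empty field, and the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DbEntry:
    link_type: str            # column 1: XLINK, MONO or LOOP
    subunit1: str             # column 2
    subunit2: Optional[str]   # column 3 ("-" for monolinks)
    peptide1: str             # column 4
    peptide2: Optional[str]   # column 5 ("-" for monolinks)
    mass: float               # column 6: calculated monoisotopic mass
    start1: int               # columns 7-8: residue range of peptide 1
    end1: int
    start2: Optional[int]     # columns 9-10: residue range of peptide 2
    end2: Optional[int]
    site1: int                # columns 11-12: modified residue positions
    site2: Optional[int]


def _opt(value: str, cast=str):
    """Convert a field, mapping the placeholder '-' to None."""
    return None if value == "-" else cast(value)


def parse_db_row(line: str) -> DbEntry:
    f = line.split()
    if len(f) != 12:
        raise ValueError(f"expected 12 columns, got {len(f)}")
    return DbEntry(f[0], f[1], _opt(f[2]), f[3], _opt(f[4]), float(f[5]),
                   int(f[6]), int(f[7]), _opt(f[8], int), _opt(f[9], int),
                   int(f[10]), _opt(f[11], int))
```

Parsing the first row of the table above yields an entry whose `site1` is 0 and whose `site2` is 446, i.e. the N-terminus of peptide 1 linked to lysine 446 of peptide 2.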
If the user only has a single LC/MALDI run to be analyzed ([16O] water experiment), the command line entry is:
> iXLINK.pl ProteinSequence.txt ../16 -
or
> iXLINK.pl ProteinSequence.txt ../16 - 1
if a peptide database file should also be created.
In this case, iXLINK can only distinguish crosslinker-modified peptides from non-modified peptides, i.e. it cannot resolve modified peptides into monolink versus crosslink species. This is useful in cases where only the MS/MS data is used to distinguish crosslinks from monolinks.
This step requires the user to submit precursor masses of candidate crosslinker-modified peptides on the MALDI plate to MS/MS acquisition. A list of precursor masses of interest can be found in the iXLINK output file "16.lightFiles.EverythingPossible.InclusionList.txt". This file should be located in the folder /iXLINK_01. It contains pairs of masses (column 2) identified by iXLINK as precursor masses for candidate crosslinker-modified peptides together with their respective positions on the MALDI target plate (column 1) and their respective peak intensities (column 3). An extract of such an inclusion list file is shown below (click here to view a complete example file).
F4 2099.0071 2538.2166
F4 2111.0886 1652.9204
F4 2120.9661 242.8172
F4 2133.0583 167.1393
F4 2153.0437 13.8308
F4 2165.0166 14.4907
For example, in MALDI plate spot F4 there are multiple mass pairs found at 2099.0/2111.1 Da (rows 1+2), 2120.9/2133.0 Da (rows 3+4), and 2153.0/2165.0 Da (rows 5+6) with a common mass difference of ~12 Daltons, as expected for a 1:1 mixture of d0/d12-DSS modified peptides.
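The pairing of peaks by mass difference can be illustrated with the six F4 masses listed above. The Python sketch below is not the iXLINK classification algorithm, which also uses peak abundance and the [18O] data; it only shows the doublet criterion. The mass difference of 12.0753 Da is the calculated difference between twelve deuterium and twelve hydrogen atoms, while the tolerance of 0.15 Da and the function name are hypothetical choices made for this example.

```python
def find_doublets(masses, delta=12.0753, tol=0.15):
    """Return (light, heavy) pairs whose mass difference is delta +/- tol."""
    masses = sorted(masses)
    pairs = []
    for i, light in enumerate(masses):
        for heavy in masses[i + 1:]:
            diff = heavy - light
            if diff > delta + tol:
                break                      # sorted input: later peaks are even further away
            if abs(diff - delta) <= tol:
                pairs.append((light, heavy))
    return pairs


f4 = [2099.0071, 2111.0886, 2120.9661, 2133.0583, 2153.0437, 2165.0166]
for light, heavy in find_doublets(f4):
    print(light, heavy)
```

With these settings the three pairs named in the text are returned (rows 1+2, 3+4 and 5+6); the last pair is the least precise, with an observed difference of about 11.97 Da.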
This inclusion list is submitted to a series of MS/MS acquisitions, preferably an automated one. We used the Applied Biosystems/MDS Sciex 4700 Explorer software and created these series using the Spot Set Manager tool. To do this, precursor masses were copied from column 2 in the file "16.lightFiles.EverythingPossible.InclusionList.txt" and pasted into the respective column in the Spot Set Manager (we used MS Excel to copy/paste). For every precursor mass entry the respective spot position has to be selected manually from a pull-down menu. We hope that in future releases of the Applied Biosystems/MDS Sciex 4700 Explorer software there will be an easier way to do this (e.g. files with a similar format to "16.lightFiles.EverythingPossible.InclusionList.txt" could be uploaded directly into the Spot Set Manager, or spot positions on a MALDI plate could be pasted along with precursor masses in the Spot Set Manager tool).
Once acquired, all MS/MS spectra are submitted to doXLINK analysis (see below).
The program doXLINK consists of two Java Archive (JAR) files, Xlink.jar and XlinkUtil.jar, each of which bundles multiple files into a single archive. They are executed by the PERL-script doXlink.pl. doXLINK has been run on a PC running Microsoft Windows XP Version 2002. To run doXLINK, you will need to install a Perl interpreter and Java. See Installation Instructions above.
doXLINK requires two parameter files, findpair.param and xlink.param and all MS/MS spectra that need to be analyzed to be in one folder.
First create a folder named doXLINK_01 and place the parameter files findpair.param, xlink.param, and all MS/MS spectrum files there. Find example files for findpair.param and xlink.param in the folder /XLINK_Program_Files/parameters/. The content of these files needs to be changed according to the crosslinking experiment (remove respective comments or parameters). All MS/MS spectrum files are required to be in the form of single text files, one for each MS/MS run (see below). The file format of these MS/MS spectrum files required for doXLINK is as shown below for the file B4_MSMS_762.txt. It may be noted that this file format is commonly used to conduct Mascot searches. "B4" in the filename designates the MALDI plate spot name and "762" designates the integer portion of the precursor mass. The filenames need to be in the format shown since doXLINK will extract information from the filenames.
BEGIN IONS
TITLE=B4_MSMS_762
PEPMASS=762.4135
70.1159133911133 1244.08898925781
84.1473999023438 1373.8359375
87.137580871582 704.233154296875
234.198471069336 1885.90246582031
343.348876953125 771.76611328125
467.316528320313 391.350616455078
542.721008300781 778.282470703125
554.531372070313 2608.8828125
556.658081054688 945.875
576.517272949219 1593.81921386719
606.706481933594 4522.99462890625
608.71826171875 310.455993652344
671.855834960938 302.464660644531
698.934509277344 9999.92578125
701.32470703125 227.284851074219
702.177001953125 497.837890625
720.993591308594 526.599487304688
762.865478515625 6645.123046875
764.884338378906 800.627197265625
765.637817382813 695.7568359375
END IONS
In the above text file, the first column is the observed mass and the second column is the isotope cluster area. The second line of the file gives the MALDI plate spot name (B4) and the integer portion of the precursor mass.
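To check that a spectrum file and its name follow the required format, a reader along the following lines can be used. This Python sketch is an illustration, not doXLINK code; it assumes one header or peak per line as in the example above, and the function names are hypothetical.

```python
import re


def parse_msms_filename(name: str):
    """Return (spot, integer precursor mass) from e.g. 'B4_MSMS_762.txt'."""
    m = re.match(r"^([A-H]\d{1,2})_MSMS_(\d+)\.txt$", name)
    if m is None:
        raise ValueError(f"unexpected MS/MS file name: {name}")
    return m.group(1), int(m.group(2))


def parse_spectrum(text: str):
    """Return (title, pepmass, [(mass, area), ...]) for a single spectrum."""
    title, pepmass, peaks = None, None, []
    inside = False
    for line in text.splitlines():
        line = line.strip()
        if line == "BEGIN IONS":
            inside = True
        elif line == "END IONS":
            break
        elif not inside or not line:
            continue
        elif line.startswith("TITLE="):
            title = line[len("TITLE="):]
        elif line.startswith("PEPMASS="):
            pepmass = float(line[len("PEPMASS="):])
        elif "=" in line:
            continue                       # other header lines, e.g. CHARGE=1+
        else:
            mass, area = line.split()[:2]  # observed mass, isotope cluster area
            peaks.append((float(mass), float(area)))
    return title, pepmass, peaks
```

A simple consistency check is that the spot and the integer part of PEPMASS agree with the values returned by `parse_msms_filename` for the same file.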
The MS/MS data analysis with doXLINK requires three steps, which are explained in detail below: creating the MS/MS spectrum files, running SeparateSpec, and running doXLINK.
We created MS/MS spectrum files in two ways. The latter one is more convenient.
COM=Project: User Project 1\Jan, Spot Set: User Project 1\Jan\JS barcode 5
BEGIN IONS
PEPMASS=634.3389
CHARGE=1+
TITLE=Label: C11, Spot_Id: 218906, Peak_List_Id: 615685, MSMS Job_Run_Id: 23127, Comment:
72.031227 340.87158
(...)
591.61432 389.84393
END IONS
BEGIN IONS
PEPMASS=1743.9498
CHARGE=1+
TITLE=Label: A17, Spot_Id: 218864, Peak_List_Id: 615378, MSMS Job_Run_Id: 23127, Comment:
110.17418 2824.3665
(...)
1703.6367 344.05386
END IONS
(...)
In this case, the combined file of MS/MS spectra needs to be converted to individual MS/MS spectrum files (same format as specified above) prior to executing doXLINK. To convert such a text file (e.g. filename.txt) into the respective single-spectrum files (with filenames in the required format), the script Pkl2txt can be used. Open a command window at the location of the text file and enter:
> java Pkl2txt filename.txt
It is recommended to save a copy of all spectrum files in text format to a location different from the one used in the next step (SeparateSpec) because they will be modified and overwritten.
The next step of the analysis is to run the program SeparateSpec. This program assigns all MS/MS spectrum files contained in the folder doXLINK_01 pairwise to the respective inclusion lists ("prelim", "mono", "xlink", and/or "pairless", see iXLINK output files). The parameters given in findpair.param need to be chosen appropriately, and may need to be adjusted to a given data set until all or most of the MS/MS spectrum files are assigned, i.e. sorted into the respective folders. Open the folder doXLINK_01, then open a command window and type:
> java SeparateSpec ../iXLINK_01
(Note that "../iXLINK_01" specifies the location of respective iXLINK output files.)
For each peptide pair (where the precursor mass difference corresponds to the difference in mass between the light and heavy crosslinker reagent, i.e. 12 Da in the case of d0/d12-DSS), SeparateSpec chooses the appropriate MS/MS spectrum files corresponding to each peptide pair and looks for fragment ion masses that differ by the mass difference between light and heavy crosslinker reagent. SeparateSpec only outputs the MS/MS spectrum of the peptide modified with the light crosslinker reagent. If this spectrum contains a fragment ion peak that is also found in the MS/MS spectrum of the corresponding heavy crosslinker-modified peptide (but shifted by 12 Da in the case of d0/d12-DSS), it annotates this peak as "true". For the "true" annotation, SeparateSpec also requires that the intensities of the light and heavy ion fragment peaks differ by less than 20% (actually the intensities normalized to the base peak of each MS/MS spectrum are used). If SeparateSpec does not find the corresponding 12 Da-shifted heavy fragment ion mass, it annotates the peak as "false". Peaks found in the MS/MS spectrum of the heavy crosslinker-modified peptide for which there is no corresponding peak in the MS/MS spectrum of the light crosslinker-modified peptide are not saved. In this way, SeparateSpec creates new MS/MS spectrum files for the light crosslinker-modified peptide that also contain the true/false annotations described above. These spectrum files are outputted into 4 new folders called MONO, XLINK, PAIRLESS and PRELIM. The MONO folder contains MS/MS spectrum files of peptides with precursor masses that are found in the inclusion list 16.light.mono.InclusionList.txt. 
Likewise, the XLINK folder contains MS/MS spectrum files of peptides with precursor masses that are found in the inclusion list 16.light.xlink.InclusionList.txt, the PAIRLESS folder contains those found in the inclusion list 16.light.pairless.InclusionList.txt, and the PRELIM folder contains those found in the inclusion list 16.lightFiles.prelim.InclusionList.txt (see iXLINK output files). SeparateSpec creates only a single folder, PRELIM, when only data from the sample run in [16O]water is used (no data from the [18O]water sample). The MS/MS spectrum filenames are changed by SeparateSpec from *.txt to *.match.txt. A portion of one of these *.match.txt files is shown below:
(...)
1169.5 24.2 false
1226.5 15.7 false
1255.6 50.4 true
1485.8 49.5 true
(...)
The first column gives the fragment ion mass, the second is the peak intensity normalized to the base peak, and the third column gives the true/false annotation described above. The MS/MS spectrum files derived from heavy crosslinker-modified precursor ions are no longer present in /doXlink_01. Of course, all of the original MALDI MS and MS/MS spectrum files should be available on the computer controlling the MALDI mass spectrometer and can be backed up to a CD or other appropriate data storage device.
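The true/false annotation can be illustrated with a small Python sketch. It is not the SeparateSpec implementation and does not read findpair.param. Several details are assumptions made for this example: the fragment mass tolerance of 0.3 Da, the use of 12.0753 Da as the d0/d12 mass shift, and the reading of "differ by less than 20%" as a relative difference with respect to the larger of the two normalized intensities. The function names are hypothetical.

```python
def normalize(peaks):
    """Scale intensities so that the base peak is 100."""
    base = max(intensity for _, intensity in peaks)
    return [(mass, 100.0 * intensity / base) for mass, intensity in peaks]


def annotate_light(light, heavy, delta=12.0753, mass_tol=0.3, max_rel_diff=0.20):
    """Annotate each light-spectrum peak with True if a shifted heavy partner exists."""
    light_n, heavy_n = normalize(light), normalize(heavy)
    annotated = []
    for mass, inten in light_n:
        paired = any(
            abs((h_mass - mass) - delta) <= mass_tol                # shifted by the label mass
            and abs(h_int - inten) <= max_rel_diff * max(inten, h_int)  # similar intensity
            for h_mass, h_int in heavy_n
        )
        annotated.append((mass, round(inten, 1), paired))
    return annotated


light = [(500.0, 50.0), (700.0, 100.0), (900.0, 40.0)]
heavy = [(500.0, 48.0), (712.1, 95.0), (912.0, 10.0), (800.0, 100.0)]
for mass, inten, flag in annotate_light(light, heavy):
    print(mass, inten, str(flag).lower())
```

In this made-up example only the peak at 700.0 is annotated "true": the peak at 500.0 has no shifted partner, and the peak at 900.0 has a shifted partner whose intensity differs too much.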
This is the major step in the analysis of MS/MS spectra of crosslinked peptides. It is applied separately to every folder created by SeparateSpec (MONO, XLINK, PAIRLESS, and PRELIM, see above). It requires the parameter file xlink.param.
Note: If iXLINK was used only for data from the sample run in [16O]water, this step is only applicable to the folder PRELIM.
The program does the following:
Based on the folder into which a mass spectrum was sorted by SeparateSpec (MONO, XLINK, PAIRLESS, or PRELIM) and the precursor mass (contained in the MS/MS spectrum file name), peaks in every mass spectrum are matched against all computed fragments of all molecular species included in the respective peptide database created by iXLINK (16.18.monolinks.seqIDs.txt, 16.18.pairless.seqIDs.txt, 16.18.xlinks.seqIDs.txt or 16.prelim.seqIDs.txt, click here to see a list of all iXLINK output files). For every spectrum file, the molecular species from the computed database to be matched against in doXLINK are selected using a mass window specified by the parameter precursor_mass_tolerance in the file xlink.param. All fragment mass peaks (from the MS/MS spectrum files) are matched against computed peptide fragment masses within a mass window specified by the parameter fragment_mass_tolerance. Fragments containing the crosslinker are computed using the parameter xlink_group_mass_from_light (this mass is derived from the mass added to a peptide if modified with a light crosslinker molecule but hydrolyzed at one end in the case of monolinks, e.g. 256.08 for DSS). The parameters mono_reporter_ion_MH_from_light_precursor and cross_reporter_ion_MH_from_light_precursor specify expected reporter ion masses for either monolinks or crosslinks (we observed two potential reporter ions for DSS-modified peptides: 222.1 (crosslink) and 240.2 (monolink)). Details about the molecular structure of these reporter ions can be found in the original publication. [1]
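The fragment matching step can be illustrated for the simplest case, b- and y-ions that do not contain the crosslinker-modified residue. The Python sketch below is not doXLINK code: it uses standard monoisotopic residue masses, ignores all fragments that contain the modified lysine, and takes a tolerance of 0.5 Da as a hypothetical value for fragment_mass_tolerance. The peptide TQDVSGK#R and the observed masses are taken from the example output file shown later in this document; the function names are hypothetical.

```python
# Standard monoisotopic residue masses (Da) for the residues in this example.
RESIDUE_MASS = {"T": 101.04768, "Q": 128.05858, "D": 115.02694, "V": 99.06841,
                "S": 87.03203, "G": 57.02146, "K": 128.09496, "R": 156.10111}
PROTON = 1.00728
WATER = 18.01056


def unmodified_fragments(sequence, modified_index):
    """Singly charged b- and y-ions that exclude the residue at modified_index."""
    frags = []
    total = PROTON
    for i in range(modified_index):                 # b-ions: N-terminal prefixes
        total += RESIDUE_MASS[sequence[i]]
        frags.append(("b", sequence[:i + 1], total))
    total = PROTON + WATER
    for i in range(len(sequence) - 1, modified_index, -1):  # y-ions: C-terminal suffixes
        total += RESIDUE_MASS[sequence[i]]
        frags.append(("y", sequence[i:], total))
    return frags


def match_peaks(observed, fragments, fragment_mass_tolerance):
    """Return (observed mass, ion type, fragment sequence) for every match."""
    matches = []
    for mass in observed:
        for ion, seq, calc in fragments:
            if abs(mass - calc) <= fragment_mass_tolerance:
                matches.append((mass, ion, seq))
    return matches


frags = unmodified_fragments("TQDVSGKR", modified_index=6)   # K at index 6 is K#
observed = [70.1, 175.2, 230.3, 345.4, 441.6, 585.7]
for hit in match_peaks(observed, frags, fragment_mass_tolerance=0.5):
    print(hit)
```

Under these assumptions the peaks at 175.2, 230.3 and 345.4 are matched to (y)R, (b)TQ and (b)TQD, which agrees with the assignments listed for this peptide in the example doXLINK output.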
All matched peptide candidates are given a score based on the agreement of respective MS/MS spectra with all theoretical peptide fragments (see Table 1 below) calculated from the peptide databases *seqIDs.txt.
Table 1: MS/MS fragment ion assignments used by doXLINK
| Peptide Fragment | Monolink | Crosslink | Looplink |
| b-Ion | + | + | + |
| y-Ion | + | + | + |
| Immonium Ion | + | ^ | + |
| -NH3 [fragment -17 Da] | - | * | - |
| -H2O [fragment -18 Da] | - | * | - |
| -CO [fragment-28 Da] | - | * | - |
| -CO2 [fragment -44 Da] | - | * | - |
| Reporter Ion | +,@ | *,@ | *,@ |
(+): considered, -: not considered by doXLINK for respective link-type
(^): computed, not shown in assignment, but used for scoring
(*): computed, shown in assignment, but not used for scoring
(@): the monolink- as well as the crosslink (the same is used for looplink) reporter ion mass can be user-specified in the file xlink.param
The number of assigned mass fragments contributes to the matching score. doXLINK also gives a higher matching score when a fragment annotated as "true" (see above) is supposed to contain the crosslinker. In other words, if peptide cleavage of the molecular species in the computed database leads to a fragment that contains the crosslinker, a pair of fragment peaks (one in the MS/MS spectrum derived from the light crosslinker-modified protein and a corresponding mass shifted peak in the MS/MS spectrum derived from the heavy crosslinker-modified protein) should be observed. Finally, the matching score is higher if the monolink reporter ion is observed when matched to a molecular species in the computed database that contains a monolink. Since the reporter ion for crosslinked species is rarely observed (see main text), doXLINK does not consider this reporter ion in the matching score, but any observed crosslink reporter mass is noted in the doXLINK results.
The matching score given by doXLINK should not be compared to the probability score provided by, e.g., SEQUEST [2], ProbID [3] or Mascot. It is also recommended that the final assignment of peptide sequence not be based solely on the rank order of the doXLINK matching scores. Peptide sequence assignment is accomplished by using both the doXLINK matching score and manual inspection of the doXLINK result (using the XLinkViewer graphical interface described below).
To execute the doXLINK program, a command line needs to be opened in the folder /doXlink_01 and the command line entry is:
> doXLINK.pl XLINK ../iXLINK_01
where ../iXLINK_01 points to the location of all iXLINK output files.
In this example, doXLINK is applied to the folder XLINK created by SeparateSpec. The analogous command line statement is executed to run doXLINK on the other 3 folders created by SeparateSpec (MONO, PAIRLESS and PRELIM). All 4 doXLINK runs can be executed in parallel by opening 4 command windows. doXLINK creates a set of output files, each with the general name SpotPosition_MSMS_PrecursorMass.match.txt.out
An example of such a doXLINK output file is:
| pepSeq1 | pepSeq2 | calcPepMH | precursorMH | numMatchedIons | numTotalIons | numMatchPair | numTotalPair | reporterIonMass | matchedReporter | Score | numTotalPepsInDB | Location1 | Location2 |
| TQDVSGK#R | MONO | 1046.54 | 1045.54 | 6 | 16 | 1 | 1 | 0.00,0.00 | false | 25.77 | 3 | B537 | -- |
| AANGK#PGFK | MONO | 1045.57 | 1045.54 | 1 | 18 | 0 | 1 | 0.00,0.00 | false | 9.87 | 3 | A81 | -- |
| K#K | PGFK#QG | 1045.60 | 1045.54 | 0 | 23 | 0 | 1 | 0.00,0.00 | false | 5.30 | 3 | B497 | A85 |
======================================================
Detailed description of the top matching peptide(s)
70.1 11.8 false
(...)
175.2 7.3 false (y)R
230.3 3.2 false (b)TQ
345.4 5.0 false (b)TQD
441.6 2.5 false
585.7 2.3 false
602.8 1.4 false (y)SGK#R
676.1 1.2 false
701.9 11.0 false (y)VSGK#R
747.0 3.3 false
1046.2 100.0 true (y)TQDVSGK#R
1050.3 5.1 false
1051.3 9.0 false
======================================================
Detailed description of the top matching peptide(s)
70.1 11.8 false
72.1 8.3 false (b)A
(...)
747.0 3.3 false
1046.2 100.0 true
1050.3 5.1 false
1051.3 9.0 false
======================================================
Detailed description of the top matching peptide(s)
70.1 11.8 false
72.1 8.3 false
(...)
701.9 11.0 false K#-FK#QG-44,K#K-GFK#-44
747.0 3.3 false
1046.2 100.0 true
1050.3 5.1 false
1051.3 9.0 false
Each output file contains the highest scoring molecular species matches, up to a maximum of 10 matches. pepSeq1 and pepSeq2 give the peptide sequence for the two peptides crosslinked together (in the case of monolinks only pepSeq1 is given and "MONO" appears under the pepSeq2 column). The symbol "#" appearing in these sequences indicates that the preceding amino acid is covalently modified (either crosslinker reagent-modified internal residue or N-terminus as well as modified cysteine or methione residues, see main text). Peptides are listed vertically in rank order according to matching score, with the top scoring match listed first. The column calcPepMH lists the calculated mass for the MH+ ion derived from the database molecular species. The precursorMH column gives the observed MH+. The numMatchedIons lists the number of matched ion fragments and numTotalIons gives the total number of observed ions. numMatchPair is the number of fragment peaks annotated as "true" (see above) that match to calculated fragments that contain the crosslinker. numTotalPair lists the total number of peaks annotated as "true". reporterIonMass lists the observed crosslinker-derived reporter ion masses. mono_reporter_ion_MH_from_light_precursor and cross_reporter_ion_MH_from_light_precursor indicates whether reporter ions were observed, listing the mass of the reporter ion if it was observed or listing 0.00 if not observed. matchedReporter indicates "true" if the monolink reporter ion is observed for a calculated molecular species that contains a monolink (likewise for crosslinked peptides). Score gives the matching score. numTotalPepsInDB lists the total number of calculated molecular species in the iXLINK-generated database that have a mass that matches the observed precursor mass (within the tolerance limit set by the user). Location1 and Location2 give the location in the peptide sequence of the crosslinker reagent-modified amino acid. 
The location numbering is based on the residue numbers given in protein_sequence.txt (recall the first residue in a protein chain may not be numbered as "1" in this file, see above).
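For users who want to post-process these result rows outside of XLinkViewer, the column layout described above can be read with a few lines of Python. This is only a sketch and not part of the XLINK software: it assumes that the fields of a row are separated by whitespace, that a location token is a chain identifier followed by a residue number (as in "B537"), and that "--" marks an absent second location. The names COLUMNS, parse_location and parse_match_row are hypothetical.

```python
import re

# Column names in the order they appear in a *.match.txt.out summary row.
COLUMNS = [
    "pepSeq1", "pepSeq2", "calcPepMH", "precursorMH",
    "numMatchedIons", "numTotalIons", "numMatchPair", "numTotalPair",
    "reporterIonMass", "matchedReporter", "Score", "numTotalPepsInDB",
    "Location1", "Location2",
]
INT_COLUMNS = ("numMatchedIons", "numTotalIons", "numMatchPair",
               "numTotalPair", "numTotalPepsInDB")
FLOAT_COLUMNS = ("calcPepMH", "precursorMH", "Score")

# Assumption: a location is a chain identifier followed by a residue number.
_LOCATION = re.compile(r"^([A-Za-z]+)(-?\d+)$")


def parse_location(token):
    """Return (chain, residue_number), or None for the '--' placeholder."""
    if token == "--":
        return None
    match = _LOCATION.match(token)
    if match is None:
        raise ValueError("unexpected location token: %r" % token)
    return match.group(1), int(match.group(2))


def parse_match_row(line):
    """Convert one whitespace-separated result row into a dictionary."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError("expected %d fields, got %d" % (len(COLUMNS), len(fields)))
    row = dict(zip(COLUMNS, fields))
    for name in INT_COLUMNS:
        row[name] = int(row[name])
    for name in FLOAT_COLUMNS:
        row[name] = float(row[name])
    # Two comma-separated reporter ion masses; 0.00 means "not observed".
    row["reporterIonMass"] = tuple(float(x) for x in row["reporterIonMass"].split(","))
    row["matchedReporter"] = row["matchedReporter"] == "true"
    row["Location1"] = parse_location(row["Location1"])
    row["Location2"] = parse_location(row["Location2"])
    return row


crosslink = parse_match_row(
    "K#K PGFK#QG 1045.60 1045.54 0 23 0 1 0.00,0.00 false 5.30 3 B497 A85")
print(crosslink["pepSeq1"], crosslink["Location1"], crosslink["Location2"])
```

A monolink row can be recognized by "MONO" in pepSeq2; in that case Location2 is returned as None.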
Also given in the doXLINK output file is detailed information about the fragment ions observed for each of the scored matches. The first column is the observed fragment mass, followed by the normalized fragment intensity (normalized to the base peak), followed by the true/false annotation (see above) followed by the assigned sequence of the fragment. These are listed in rank order based on the matching score with the top scoring match listed first.
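The fragment lines can be read in the same way. The sketch below assumes that each line holds the observed mass, the normalized intensity and the true/false annotation separated by whitespace, optionally followed by the assigned fragment sequence. FragmentPeak and parse_fragment_line are hypothetical names, and the three sample lines are taken from the example output above.

```python
from typing import NamedTuple, Optional


class FragmentPeak(NamedTuple):
    mass: float                # observed fragment mass
    intensity: float           # intensity normalized to the base peak
    flag: bool                 # the true/false annotation of the peak
    assignment: Optional[str]  # assigned fragment sequence, if any


def parse_fragment_line(line: str) -> FragmentPeak:
    """Parse one line of the detailed fragment listing."""
    parts = line.split(None, 3)
    if len(parts) < 3 or parts[2] not in ("true", "false"):
        raise ValueError("not a fragment line: %r" % line)
    assignment = parts[3].strip() if len(parts) == 4 else None
    return FragmentPeak(float(parts[0]), float(parts[1]),
                        parts[2] == "true", assignment)


SAMPLE = """\
175.2 7.3 false (y)R
441.6 2.5 false
1046.2 100.0 true (y)TQDVSGK#R
"""

peaks = [parse_fragment_line(l) for l in SAMPLE.splitlines() if l.strip()]
flagged = [p for p in peaks if p.flag]                      # peaks annotated "true"
assigned = [p for p in peaks if p.assignment is not None]   # peaks with a sequence
print(len(peaks), len(flagged), len(assigned))
```

Counting the flagged and the assigned peaks in this way gives a quick cross-check against the numTotalPair and numMatchedIons columns of the corresponding result row.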
doXLINK creates a summary of all peptide assignments in two summary files named sumOUTlong.html and sumOUTlong.xls. They can be viewed with any web browser (e.g. Mozilla Firefox or MS Internet Explorer) and with MS Excel, respectively. The contents of these summary files are self-explanatory based on the information provided above.
Monolinks, crosslinks and looplinks tend to show different fragmentation behavior under the same collision conditions in the mass spectrometer (see main text). For crosslinks and looplinks, the number of observed peptide fragments and their intensities are typically low. Thus, even minor peaks that arise from other molecular species with a similar precursor mass can lower the doXLINK matching score. It is highly recommended that the matching score not be the sole basis for peptide sequence assignments. Rather, the user should manually inspect the matching data. This is facilitated by XLinkViewer, which gives a graphical display of the results generated by doXLINK. In most cases, we have found that manual inspection of the doXLINK-generated results leads to the same molecular species assignment as the one with the top doXLINK matching score.
A typical XLinkViewer display is shown in XLinkViewer.pdf. The PDF file is self-explanatory. For each molecular species, a pull-down menu is used to label each entry as either "correct", "incorrect", "ambiguous" or "unassigned". These labels are written to a file called Xlink.res. This file does not need to be inspected, but it should be kept so that the user-provided labels are restored each time XLinkViewer is started. Thus, the final user-specified assignments can be spread out over multiple viewing sessions.
To run XLinkViewer, double-click on the file XLinkViewer.jar (located in the folder /XLINK_Program_Files/) or on a shortcut to this file on your desktop. Then click File > Open and select the file sumOUTlong.html in the doXLINK output folder (e.g. /doXlink_01), where all *.match.txt.out files are located.
When a *.match.txt.out file is selected in the "Summary" window, the "out file" window displays the top ten scoring peptide candidates for the respective spectrum, with doXLINK's peptide assignments below. The window on the bottom right shows the peak list for the selected spectrum, including mass, intensity, and the suggested annotation based on the doXLINK analysis. Clicking on any candidate peptide in the "out file" window displays all matched peptide fragments with assignments in the spectrum window below. In the spectrum window, doXLINK fragment assignments are color-coded as follows:
Right-clicking on any selected peptide assignment in the "out file" window allows the user to decide if the assignment is either "correct", "ambiguous" or "incorrect", or to leave it "unassigned". Assignments are written automatically to a file named xlink.res, which is automatically created and then updated with every manual change.
*Spectrum: doXlink output file name: SpotPosition_MSMS_mass.match.txt.out
*Score: doXLINK score
*DeltScore: Delta Score, normalized difference of scores of this assignment and the next assignment. (Assignments of preceding molecular species are automatically sorted by score.)
*TotalPepsInDB: number of molecular species found in the peptide database created by iXLINK for masses in the specified mass window. (The iXLINK database can be written to a file; click here for details.)
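As an illustration of the DeltScore entry, the following sketch computes a normalized score difference between each candidate and the next-ranked one. The exact normalization used by doXLINK is not stated here; the sketch assumes division by the score of the current candidate, and it assumes that the last candidate (which has no successor) receives 0.0. The function name delta_scores is hypothetical.

```python
from typing import List


def delta_scores(scores: List[float]) -> List[float]:
    """Normalized difference between each score and the next-ranked score.

    Assumption: delta_i = (s_i - s_(i+1)) / s_i, with 0.0 for the last
    candidate and for non-positive scores. doXLINK may normalize differently.
    """
    ranked = sorted(scores, reverse=True)  # top-scoring candidate first
    deltas = []
    for i, score in enumerate(ranked):
        if i + 1 < len(ranked) and score > 0:
            deltas.append((score - ranked[i + 1]) / score)
        else:
            deltas.append(0.0)
    return deltas


# Scores of the three candidates in the example output file shown earlier.
for score, delta in zip([25.77, 9.87, 5.30], delta_scores([25.77, 9.87, 5.30])):
    print("%6.2f  %.3f" % (score, delta))
```

Under this assumption, a value close to 1 means that the candidate scores far above the next one, while a value close to 0 means that the two candidates are hard to distinguish by score alone.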
Some instructions on how to proceed with MS/MS crosslink assignment in XLinkViewer can be found here.
[1] Seebacher, J., Mallick, P., Zhang, N., Eddes, J. S., Aebersold, R., Gelb, M. H. (2006) Protein Cross-Linking Analysis Using Mass Spectrometry, Isotope-Coded Cross-Linkers, and Integrated Computational Data Processing. J. Proteome Res. manuscript accepted.
[2] Eng, J. K., McCormack, A. L., Yates III, J. R. (1994) An Approach to Correlate Tandem Mass Spectral Data of Peptides with Amino Acid Sequences in a Protein Database. J. Am. Soc. Mass Spectrom. 5: 976-989.
[3] Zhang, N., Aebersold, R., Schwikowski, B. (2002) ProbID: A probabilistic algorithm to identify peptides through sequence database searching using tandem mass spectral data. Proteomics 2(10): 1406-1412.