XLINK - Manual

Instruction Manual for Crosslinking Analysis Software Package containing iXLINK, doXLINK, and XLinkViewer

Contents
Introduction

System Requirements and Installation

Initial Information Set

MS Data Acquisition

iXLINK

doXLINK

XLinkViewer

Final Results

Introduction

Isotopically labeled crosslinking reagents (bis-NHS esters) and a protein complex of known crystal structure were used to test our new automated analysis tools for MALDI mass spectrometry data derived from digested samples of the crosslinked protein complex. The software package contains three programs: iXLINK, a PERL program executed from the command line; doXLINK, a library of Java programs executed by a PERL script, which is also run from the command line; and the Java program XLinkViewer.

Method outline: Prior to computational analysis, two crosslinked peptide mixtures are independently fractionated by reverse phase chromatography and spotted onto two standard MALDI plates. Each mixture contains peptides modified with both the isotopically heavy and light versions of the crosslinking reagent. Consequently, modified peptides should appear within a given spectrum as a doublet, a pair of peaks separated by the mass difference between the heavy and light forms of the label. One of the samples is reacted in buffer containing [16O] water. The other is reacted in buffer containing a mix of [16O] and [18O] water. Within the [16O] / [18O] mixture, monolink peaks will appear as quadruplets whereas crosslink peaks will appear as doublets. Further details of the experimental procedure are given in the original publication. [1]

iXLINK classifies peaks from a duplex LC-MALDI experiment into one of four categories: noise, high abundance, crosslink-derived and monolink-derived. iXLINK generates a database of crosslinker-modified species based on protein sequence(s), crosslinker and protease(s). Once classified, mass mapping is used to generate a preliminary sequence assignment. iXLINK also creates mass inclusion lists for subsequent MS/MS acquisition of potentially crosslinker-modified peptide species.

doXLINK scores MS/MS data, acquired using the mass inclusion lists created by iXLINK, against the peptides included in iXLINK's preliminary sequence assignment.

XLinkViewer is used to visualize iXLINK and doXLINK results, and allows the user to confirm or reject the doXLINK assignments.

System Requirements and Installation

System Requirements

iXLINK, doXLINK, and XLinkViewer were run on Microsoft Windows XP Version 2002 and require ActivePerl (we used version 5.8, free download from www.activestate.com) and the Java™ runtime environment (we used JRE Standard Edition, Version 1.5.0, free download from www.java.com) to be installed.

Obtaining the software

The software can be downloaded from http://www.systemsbiology.org , www.proteomecenter.org , or by contacting info@proteomecenter.org.

Installation

To install the files required to run iXLINK, doXLINK, and XLinkViewer:
  1. Copy the folder named "XLINK" to your local hard drive, i.e. C:\XLINK.
  2. Create a shortcut to XLinkViewer.jar and copy/move it to the desktop.
  3. [CAUTION: the following steps may adversely affect your system. If you are uncertain, please consult your system administrator.]
    Set the following Environment Variables in Windows XP:
    1. Open Control Panel, click on System, click on the Advanced Tab, and click Environment Variables.
    2. Add "C:\XLINK\UtilityJar\Xlink.jar;C:\XLINK\UtilityJar\XlinkUtil.jar;" to the "CLASSPATH" user variable.
    3. Add "C:\Perl\bin;C:\XLINK\PerlRoutine; C:\Program Files\Java\jre1.5.0_05\bin;" to the "PATH" user variable.
    4. Note: replace java path with your own java path (C:\Program Files\Java\...) if there already exists a previous JRE on your PC.
    If CLASSPATH or PATH does not exist, create it as follows: Under User Variables, click "New".

Initial Information Set

Figure 1.

Proteins: Multiple proteins suspected to be found in the sample and to be modified with crosslinking reagent.

Proteolytic Digestion: The number and type of proteases can be specified, e.g. Trypsin + Asp-N.

Missed Cleavages: The number of missed cleavages can be specified.

Modifications: Single/double methionine oxidation and cysteine alkylation are allowed.

Crosslinking Reagent: This software can be used with bis-NHS-esters only (homobifunctional amino-reactive cross-linkers; sensitive to hydrolysis; trypsin won't cleave C-terminal to modified lysine residues). We have used DSS-d0/-d12, DSG-d0/-d6, BS2-d0/-d4, and BS3-d0/-d4.

LC-MS-Instrumentation: LC fractions of a protease-digested crosslinked protein complex are deposited on a MALDI plate. We used a 192 Well Stainless Steel MALDI Sample Plate for sample introduction on the ABI 4700 Proteomics Analyzer. The spotting pattern is shown in Figure 1, starting with spot positions A1, A3... A23 in the top row, proceeding with A2, A4... A24, and continuing through spot position H24 at the bottom right. The file timeConversion.txt assigns sequential number indices to these MALDI plate spots. The indices reflect the order of the LC fractions, e.g. MALDI plate spots A1, A3... A23 are assigned to indices 0, 1... 11, and the next row A2, A4... A24 is assigned to indices 12, 13... 23. Because the assignments are defined in this file, the user can change them as desired.
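
The spot-to-index assignment described above can be sketched in a few lines of Python. This is only an illustration: the function name spot_index is hypothetical, and the sketch assumes that rows B through H repeat the odd-then-even column pattern given for row A. The authoritative assignment is whatever timeConversion.txt contains.

```python
import re

def spot_index(spot):
    """Map a MALDI spot name such as 'A3' to its LC-fraction index.

    Assumption: every row (A-H) is spotted odd columns first
    (1, 3, ... 23), then even columns (2, 4, ... 24), as described
    for row A in the manual.
    """
    match = re.fullmatch(r"([A-H])(\d{1,2})", spot)
    if match is None:
        raise ValueError(f"not a valid spot name: {spot!r}")
    row = ord(match.group(1)) - ord("A")   # A..H -> 0..7
    column = int(match.group(2))           # expected 1..24
    if not 1 <= column <= 24:
        raise ValueError(f"column out of range: {spot!r}")
    if column % 2 == 1:
        offset = (column - 1) // 2         # 1, 3, ... 23 -> 0..11
    else:
        offset = 12 + column // 2 - 1      # 2, 4, ... 24 -> 12..23
    return row * 24 + offset

print(spot_index("A23"), spot_index("A2"))  # 11 12
```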

MS Data Acquisition

MALDI spectra: automatic acquisition with an ABI 4700 Proteomics Analyzer using the 4700 Explorer software. All automatically acquired data is stored in the integrated Oracle database of the program. Mass spectra in tab-delimited text file format are generated using the ABI Data Explorer Software and a macro in Visual Basic, available upon request (MS) and the Peaks-to-Mascot feature in the 4700 Explorer software (MS/MS).

Conversion of binary MS spectrum files to text files

The ABI 4700 Explorer allows the user to specify the spots on the MALDI plate to be analyzed. For each spot, MS data is obtained after multiple laser shots, and the data is averaged to give a MALDI mass spectrum for each spot. Each spectrum is exported to a binary spectrum file, *.t2d together with a *.cal calibration file. For example, A5_MS_11.t2d is the mass spectrum from spot A5, and A5_MS_11.cal is its respective calibration file. Using the Data Explorer software package provided with the ABI 4700 together with a macro provided by ABI, the *.t2d files are converted into text files, *.txt (for example, see A1_MS_5.txt). A portion of a typical *.txt file is shown below:

Mass              Area              Resolution        S/NRatio
565.054382324219  241.114852905273  8987.953125       27.1505069732666
581.069152832031  344.134460449219  8790.4013671875   43.0353393554688
587.022888183594  1002.90197753906  9292.6064453125   135.796173095703
609.019836425781  141.949584960938  8420.2470703125   16.5039558410645
648.776062011719  281.1484375       4033.81396484375  10.8354415893555

The mass column contains the monoisotopic masses, which have been calibrated by the ABI 4700 software using the *.cal files. The area column gives the area of the peak in the mass spectrum after the areas of the peaks from the natural abundance heavy isotopic peaks have been added to the monoisotopic peak area (peak clustering carried out by the ABI Data Explorer software). The last column gives the signal-to-noise ratio. The iXLINK software uses the mass, area, and S/NRatio data extracted from the *.txt files; the Resolution data is not used. iXLINK reads the *.txt file names and extracts the MALDI plate spot number. Thus, the filename should have the same format specified above.
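
As an illustration of the file layout just described, the following Python sketch reads the three columns that iXLINK uses and extracts the spot name from a file name. It is not part of iXLINK; the function names are hypothetical, and the sketch assumes a tab-delimited file whose first line is the column header.

```python
import csv
import io
import re

def parse_spot_name(filename):
    """Extract the MALDI spot name from a name like 'A5_MS_11.txt'."""
    match = re.fullmatch(r"([A-H]\d{1,2})_MS_\d+\.txt", filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return match.group(1)

def read_peaks(handle):
    """Return (mass, area, signal_to_noise) tuples; Resolution is dropped."""
    peaks = []
    for row in csv.reader(handle, delimiter="\t"):
        try:
            mass, area, _resolution, snr = (float(x) for x in row[:4])
        except ValueError:
            continue  # header line or incomplete row
        peaks.append((mass, area, snr))
    return peaks

sample = (
    "Mass\tArea\tResolution\tS/NRatio\n"
    "565.054382324219\t241.114852905273\t8987.953125\t27.1505069732666\n"
)
print(parse_spot_name("A5_MS_11.txt"), read_peaks(io.StringIO(sample)))
```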

iXLINK
See Installation Instructions above.

Input for iXLINK

iXLINK requires a work folder including four parameter files and two folders "16" and "18" with the MS data files.

Work Folders: create a new folder, e.g. iXLINK_01, and another two folders, one called 16 and one called 18.

Data Files: Copy all MS data files (*_MS_*.txt) from the LC-MS run using [16O] water into folder 16. Copy all MS data files (*_MS_*.txt) from the [16O]/[18O] water experiment into folder 18.

Parameter Files: Place the following files into folder iXLINK_01:

  1. aa_mass.txt
  2. timeConversion.txt
  3. ProteinSequence.txt
  4. params.pl
Copies of these files can be found in the folder /XLINK_Program_Files/parameters/. They will need to be modified according to the experimental settings before running iXLINK.

aa_mass.txt contains the monoisotopic masses of the amino acid residues included in ProteinSequence.txt (see below).

timeConversion.txt is described above.

ProteinSequence.txt contains the amino acid sequences of all proteins in the sample to be analyzed. The file name may be user-specified. (If the proteins in a sample are not known prior to the iXLINK analysis, please see the note marked * below.)
A typical file is shown below:

>A:1
MELKNSISDYTEAEFVQLLKEIEKENVAATDDVLYVLLEHFVKITEHPDGLDLIYYPSDNRDDSPEGIVKEIKEWRAANGKPGFKQG
>B:432 
MRGSHHHHHHGSGSKRNKPGKATGKGKPVNNKWLNNAGKDLGSPVPDRIANKLRDKEFKSFDDFRKKFWEEVSKDPELSKQFSRNNNDRMKVGKAPFTRTQDVSGKRRSFELHHEKPISQNGGVYDMDNISVVTPKRAIDIH
The "A" in line 1 is the user specified name of the first subunit. The "1" in line 1 designates that the first amino acid in the sequence is number 1 (note that the sequence of some proteins may start with a residue number other than 1). The second line gives the protein sequence using single letter amino acid designations. In the example above, a second protein subunit, subunit B, whose first residue is number 432, has also been included.
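
To make the file layout concrete, here is a minimal Python sketch that reads this format into a dictionary. It is an illustration only, not iXLINK's own parser; the function name is hypothetical, and the sketch assumes that every header line has the form ">name:start" and that a sequence may span several lines.

```python
def parse_protein_sequences(text):
    """Return {subunit_name: (first_residue_number, sequence)}."""
    proteins = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header, e.g. ">B:432": subunit name and first residue number.
            name, _, start = line[1:].partition(":")
            proteins[name] = [int(start), ""]
        elif name is not None:
            proteins[name][1] += line  # sequence line(s) of the current subunit
    return {key: (start, seq) for key, (start, seq) in proteins.items()}

example = ">A:1\nMELKNSISDY\n>B:432 \nMRGSHHHHHH\n"
print(parse_protein_sequences(example))
```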

params.pl gives the user specified parameter values for running iXLINK. The PDF file iXLINK_params.pdf gives a pictorial description of the various user specified parameters within params.pl. (To view PDF files, download Adobe Acrobat Reader from http://www.adobe.com). In addition, most of these parameters are explained in the original publication. [1] Parameters not described in the original publication are given here (comment numbers below refer to the file iXLINK_params.pdf). Some of the parameters in params.pl have no comment information in iXLINK_params.pdf, and these should not be modified.

To edit params.pl, open the file with a text editor such as Notepad. Be sure to save it under the name params.pl as a text file.

#8 massCalibrationError: If there is a systematic shift in the mass of the peptides in the MALDI MS in one run versus another, the user can specify this mass shift. For example, if the user decides that all masses in the *.txt files in the 18 folder are off by +1 Da, massCalibrationError should be set to 1. iXLINK then subtracts 1 Da from all of the MALDI peaks in all *.txt files in the 18 folder. Negative values are also accepted.

#9 heavyLightRTShift: This is the systematic shift in LC retention time caused by starting the MALDI plate spotter at different times after sample injection onto the LC column.

#10 heavyLightRTError: Setting this parameter to 0 implies that each peptide elutes at the same retention time in the two separate LC runs. We used a value of 20 spots. In this case, if a doublet split by 12 Da (d0- and d12-DSS crosslinker attached) is observed in LC fraction 27 in the [16O]water sample data set, iXLINK will look for the corresponding peptide in spots 7-47 in LC run 2 ([16O]/[18O]water run).
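
The window arithmetic in this example can be written out as a short Python sketch. The function name is hypothetical, and the way heavyLightRTShift is combined with heavyLightRTError here (the shift is simply added to the spot index before the window is applied) is an assumption, not a statement about iXLINK's internals.

```python
def search_window(spot_index, rt_shift=0, rt_error=20):
    """Return the (first, last) LC-fraction index searched in run 2.

    Assumption: the systematic shift moves the window centre and the
    error widens it symmetrically.
    """
    centre = spot_index + rt_shift
    return centre - rt_error, centre + rt_error

# Doublet seen in fraction 27 of the [16O]water run, no systematic shift:
print(search_window(27, rt_shift=0, rt_error=20))  # (7, 47)
```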

#11 IdentificationMassError: This is the difference in mass between the observed mass for the query peptide and the mass of the peptide in the database of calculated peptide masses. This is used by iXLINK for mass mapping.
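
Mass mapping with such a tolerance can be sketched as follows. This is an illustration only: the function name is hypothetical, the tolerance is assumed to be an absolute value in Da, and the three database entries are taken from the example *.db.txt rows shown later in this manual.

```python
def mass_map(observed_mass, database, tolerance):
    """Return every database entry whose calculated mass lies within
    +/- tolerance (assumed to be in Da) of the observed mass."""
    return [
        (peptide, mass)
        for peptide, mass in database
        if abs(mass - observed_mass) <= tolerance
    ]

database = [
    ("RNK#PGK", 855.51),
    ("ATGKGK#PVNNK#WLNNAGK", 2035.11),
    ("ATGK#GK#PVNNKWLNNAGK", 2035.11),
]
# Several candidates can share one mass, so all of them are returned.
print(mass_map(2035.15, database, tolerance=0.1))
```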

#12 numberOFEnzymes: This is the number of different proteases used plus 2. For example, if the protein sample is digested with both trypsin and AspN, this parameter is set to 4.

#13 minimumDigestMass and maximumDigestMass: After calculation of the masses for all monolink-modified peptides, iXLINK keeps only those that fall in the range of minimumDigestMass to maximumDigestMass Da. We chose 4000 for maximumDigestMass because the MALDI instrument was set up to scan up to 4000 Da.

#14 minimumXlinkMass and maximumXlinkMass: Same as above except for calculated masses of crosslinked peptides.

#15 cysMod: The mass of the cysteine modification, e.g. 57.02 for modification with iodoacetamide.

The parameter values given in iXLINK_params.pdf are those used in our original study of the DNase/Im7 heterodimer treated with d0/d12-DSS crosslinker and digested with trypsin and endoproteinase Asp-N.

The directory structure should now look like this:

To run the PERL-script, open the command window on the PC and use the cd command to go to the iXLINK_01 folder. On the command line type:

> perl iXLINK.pl ProteinSequence.txt ../16 ../18
where ProteinSequence.txt is the name of the protein sequence text file. Note, "../16" and "../18" designate the folders containing the MS data files, located one directory path up, relative to the current directory.

If the command line also contains a "1" at the end:

> perl iXLINK.pl ProteinSequence.txt ../16 ../18 1

iXLINK will also write an output file, e.g. ProteinSequence.txt.db.txt, the iXLINK database, that gives the masses of all calculated monolink and crosslink species along with the identity of the peptides. (Note: this might be a large file.)

Some representative rows in this file are shown below:

1      2  3  4                               5                               6        7    8    9    10   11   12
-----  -  -  ------------------------------  ------------------------------  -------  ---  ---  ---  ---  ---  ---
XLINK  A  B  n#M*ELK                         nM#RGSHHHHHHGSGSK#RNKPGKATGKGK  3616.79    0    4  431  458    0  446
XLINK  A  B  n#M*ELK                         nM#RGSHHHHHHGSGSKRNKPGK#ATGKGK  3616.79    0    4  431  458    0  452
XLINK  A  B  n#M#ELK                         nM*RGSHHHHHHGSGSKRNKPGK#ATGKGK  3616.79    0    4  431  458    0  452
XLINK  A  B  n#M*ELK                         nM#RGSHHHHHHGSGSKRNKPGKATGK#GK  3616.79    0    4  431  458    0  456
XLINK  A  B  n#M#ELK                         nM*RGSHHHHHHGSGSKRNKPGKATGK#GK  3616.79    0    4  431  458    0  456
XLINK  A  B  n#M#ELK                         nM*RGSHHHHHHGSGSK#RNKPGKATGKGK  3616.79    0    4  431  458    0  446
XLINK  A  B  n#M#ELK                         n#M*RGSHHHHHHGSGSKRNKPGKATGKGK  3616.79    0    4  431  458    0  431
XLINK  A  B  n#M*ELK                         nM#RGSHHHHHHGSGSKRNK#PGKATGKGK  3616.79    0    4  431  458    0  449
XLINK  A  B  n#M#ELK                         nM*RGSHHHHHHGSGSKRNK#PGKATGKGK  3616.79    0    4  431  458    0  449
XLINK  B  A  n#M#RGSHHHHHHGSGSKRNKPGKATGKGK  n#M*ELK                         3616.79  431  458    0    4  431    0
MONO   B  -  RNK#PGK                         -                                855.51  447  452    -    -  449    -
LOOP   B  B  ATGKGK#PVNNK#WLNNAGK            ATGKGK#PVNNK#WLNNAGK            2035.11  453  470  453  470  458  463
LOOP   B  B  ATGK#GK#PVNNKWLNNAGK            ATGK#GK#PVNNKWLNNAGK            2035.11  453  470  453  470  456  458
LOOP   B  B  ATGK#GKPVNNK#WLNNAGK            ATGK#GKPVNNK#WLNNAGK            2035.11  453  470  453  470  456  463

Such a *.db.txt database file consists of 12 columns. Each row represents a single entry of the computed database of crosslinker-modified peptides.

Column 1 indicates the type of modification: crosslink (XLINK), monolink (MONO), or looplink (LOOP). The letters "A" or "B" in columns 2 and 3 give the subunits (as named in the file ProteinSequence.txt) from which the two peptides in columns 4 and 5 are derived. Columns 4 and 5 list, in single-letter code, the two peptides that are interconnected to form a crosslinked species. A "-" in columns 3 and 5 (and likewise in columns 9, 10, and 12) is found only for monolinks, where no second peptide is part of the molecule. In the case of looplinks, the peptide sequence shown in column 5 is simply a copy of the entry in column 4.

Within the sequences in columns 4 and 5, "n#" and "K#" indicate the potentially modified residues (here amino termini "n" and lysines "K") that are attached to one end of the crosslinking reagent. "n#" is therefore a crosslinker-modified N-terminus and "K#" is a crosslinker-modified lysine. On other residues, # and * denote single (#) or double (*) modification, here singly or doubly oxidized methionine (M# and M*) and carbamidomethylated cysteine (C#).

Column 6 is the calculated monoisotopic mass of the molecular species. Columns 7 and 8 are the start and end residue numbers of peptide 1 (see column 4), and columns 9 and 10 are those of peptide 2 (see column 5), all according to the start residue numbers specified in ProteinSequence.txt. Note that a "0" in columns 7 and 9 indicates that the crosslinker-modified alpha-amino group of the N-terminus (here residue #1) is counted as residue 0 by iXLINK. Columns 11 and 12 are the positions of the modified amino acid residues of peptides 1 and 2 (see columns 4 and 5), relative to the respective residue number of the N-terminus.

For example, in row 1 in the table above, the molecular species is a crosslink formed by the two peptides MELK derived from subunit A and MRGSHHHHHHGSGSKRNKPGKATGKGK derived from subunit B. The first peptide starts at the N-terminus of subunit A (listed as residue #0, preceding methionine residue #1) and ends with lysine residue #4. The second peptide starts at the N-terminus of subunit B (listed as residue #431, preceding methionine residue #432) and ends with lysine residue #458. Peptide 1 is connected via its N-terminus, residue #0, to lysine residue #446 in peptide 2. The monoisotopic mass of this species is 3616.79 Da.
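
The 12-column layout can be illustrated with a small Python sketch that turns one row into named fields. The names DbEntry and parse_db_row are hypothetical and not part of iXLINK, and the sketch assumes that columns are separated by whitespace (spaces or tabs) and that "-" marks an absent value.

```python
from typing import NamedTuple, Optional

class DbEntry(NamedTuple):
    kind: str                 # column 1: XLINK, MONO or LOOP
    subunit1: str             # column 2
    subunit2: Optional[str]   # column 3
    peptide1: str             # column 4
    peptide2: Optional[str]   # column 5
    mass: float               # column 6, monoisotopic mass in Da
    start1: int               # columns 7-8: residue range of peptide 1
    end1: int
    start2: Optional[int]     # columns 9-10: residue range of peptide 2
    end2: Optional[int]
    site1: int                # column 11: modified residue in peptide 1
    site2: Optional[int]      # column 12: modified residue in peptide 2

def _opt(field, cast):
    """Convert a field, treating '-' as 'no value'."""
    return None if field == "-" else cast(field)

def parse_db_row(line):
    f = line.split()
    if len(f) != 12:
        raise ValueError(f"expected 12 columns, got {len(f)}")
    return DbEntry(f[0], f[1], _opt(f[2], str), f[3], _opt(f[4], str),
                   float(f[5]), int(f[6]), int(f[7]),
                   _opt(f[8], int), _opt(f[9], int),
                   int(f[10]), _opt(f[11], int))

row = "XLINK  A  B  n#M*ELK  nM#RGSHHHHHHGSGSK#RNKPGKATGKGK  3616.79  0  4  431  458  0  446"
entry = parse_db_row(row)
print(entry.kind, entry.site1, entry.site2)  # XLINK 0 446
```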

If the user only has a single LC/MALDI run to be analyzed ([16O] water experiment), the command line entry is:

> iXLINK.pl ProteinSequence.txt ../16 -
or
> iXLINK.pl ProteinSequence.txt ../16 - 1
if a peptide database file should also be created.

In this case, iXLINK can only distinguish crosslinker-modified peptides from non-modified peptides, i.e. it cannot resolve modified peptides into monolink versus crosslink species. This is useful in cases where only the MS/MS data is used to distinguish crosslinks from monolinks.

iXLINK Results

After running iXLINK the following output files are created:
  • 16.18.monolinks.seqIDs.txt
    This file has the sequences of those peptides that are likely to contain a monolink based on the occurrence of the oxygen-18 splitting signature (see main text for an explanation). Note that for each monolink species found in the MALDI mass spectrum, there is often more than one peptide in the computed database that has a computed mass within the tolerance limit of the observed mass. In this case, all peptide candidates are listed.
  • 16.18.pairless.seqIDs.txt
    This file has the sequence of those peptides that show the peak splitting due to the light and heavy crosslinker (i.e. 12 Da in the case of d0/d12-DSS) but were not found in the second LC/MALDI run for the [16O]/[18O] water sample. All peptide candidates are listed.
  • 16.18.tuplets.txt
  • 16.18.xlinks.seqIDs.txt
    This file has the sequences of those peptides that are likely to contain a crosslink based on the lack of occurrence of the oxygen-18 splitting mass signature (see main text for an explanation). All candidate peptides are listed.
  • 16.prelim.seqIDs.txt
    If only the [16O]water LC/MALDI run is carried out, this file has the peptide sequences of all candidate crosslinker-modified peptides (monolinks + crosslinks).
  • 16.lightFiles.EverythingPossible.InclusionList.txt
    This file is the inclusion list of all light/heavy mass pairs for the [16O]water sample along with the MALDI plate spot name. This list includes all candidate monolinks and crosslinks (i.e. all peptides that show splitting due to the light and heavy crosslinker pair). Peptides in this list should be further analyzed by MS/MS for subsequent analysis using doXLINK.
  • 16.lightFiles.prelim.InclusionList.txt
    Same as "16.lightFiles.EverythingPossible.InclusionList.txt" except that only the light members of the mass pairs are listed.
  • 16.lightFiles.StrongSinglets.InclusionList.txt
  • 16.light.mono.InclusionList.txt
    This file gives the mass and MALDI plate spot names for the light members of the mass pairs for all monolink candidates found in the data from the [16O]water sample.
  • 16.light.pairless.InclusionList.txt
    This file gives the mass and MALDI plate spot names for the light members of the mass pairs for peptides found in the [16O]water sample but not in the [18O]water sample.
  • 16.light.StrongMatchedSinglets.InclusionList.txt
    Same as "16.lightFiles.StrongSinglets.InclusionList.txt" but only for those MALDI MS peaks that are found in both the [16O]water and [18O]water samples.
  • 16.light.xlink.InclusionList.txt
    Same as "16.light.mono.InclusionList.txt" but for candidate crosslink peptides.
  • 18.heavyFiles.EverythingPossible.InclusionList.txt
  • 18.heavyFiles.prelim.InclusionList.txt
  • 18.heavyFiles.StrongSinglets.InclusionList.txt
  • 18.heavy.mono.InclusionList.txt
  • 18.heavy.StrongMatchedSinglets.InclusionList.txt
    This file gives the mass values of all MS peaks that satisfy the lonelyPeaksThreshold and lonelyPeaksSNRThreshold cutoff criteria (see iXLINK_params.pdf for an explanation of these threshold parameters) and that are found in both the [16O]water and the [18O]water samples. Note that MS peaks that show the splitting due to the light and heavy crosslinkers are not included in this list. Peptides in this list may be further analyzed by MS/MS for subsequent protein identification (see the note marked * below).
  • 18.heavy.xlink.InclusionList.txt
  • ProteinSequence.txt.db.txt
    Computed database of all crosslinker-modified peptides. Recall that this database text file is created only if the user adds a "1" to the command line when executing iXLINK (as explained above).

    Submission of list of light/heavy mass pairs as potential candidates for crosslinker-modified peptides to an MS/MS experiment

    This step requires the user to submit precursor masses of candidate crosslinker-modified peptides on the MALDI plate to MS/MS acquisition. A list of precursor masses of interest can be found in the iXLINK output file "16.lightFiles.EverythingPossible.InclusionList.txt". This file should be located in the folder /iXLINK_01. It contains pairs of masses (column 2) identified by iXLINK as precursor masses for candidate crosslinker-modified peptides together with their respective positions on the MALDI target plate (column 1) and their respective peak intensities (column 3). An extract of such an inclusion list file is shown below.

    F4 2099.0071 2538.2166
    F4 2111.0886 1652.9204
    F4 2120.9661 242.8172
    F4 2133.0583 167.1393
    F4 2153.0437 13.8308
    F4 2165.0166 14.4907
    

    For example, in MALDI plate spot F4 there are multiple mass pairs found at 2099.0/2111.1 Da (rows 1+2), 2121.0/2133.1 Da (rows 3+4), and 2153.0/2165.0 Da (rows 5+6) with a common mass difference of ~12 Daltons, as expected for a 1:1 mixture of d0/d12-DSS modified peptides.
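
The pairing in this extract can be reproduced with a short Python sketch. It is an illustration only, not iXLINK's pairing algorithm: the function name and the 0.15 Da tolerance are hypothetical, and the shift of 12.0753 Da is the mass difference of twelve deuterium versus twelve hydrogen atoms.

```python
from collections import defaultdict

# d12 versus d0 label: 12 x (2.014102 - 1.007825) Da
LABEL_SHIFT = 12.0753

def find_pairs(lines, shift=LABEL_SHIFT, tolerance=0.15):
    """Group inclusion-list rows by spot and return (spot, light, heavy)
    for every two masses that differ by the label shift."""
    by_spot = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        by_spot[fields[0]].append(float(fields[1]))
    pairs = []
    for spot, masses in by_spot.items():
        masses.sort()
        for i, light in enumerate(masses):
            for heavy in masses[i + 1:]:
                if abs(heavy - light - shift) <= tolerance:
                    pairs.append((spot, light, heavy))
    return pairs

extract = [
    "F4 2099.0071 2538.2166",
    "F4 2111.0886 1652.9204",
    "F4 2120.9661 242.8172",
    "F4 2133.0583 167.1393",
    "F4 2153.0437 13.8308",
    "F4 2165.0166 14.4907",
]
for pair in find_pairs(extract):
    print(pair)
```

With the six rows above, the sketch reports the three light/heavy pairs named in the text.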

    This inclusion list is submitted to a preferably automated series of MS/MS acquisitions. We used the ABI 4700 Explorer software and created these series using the Spot Set Manager tool. To do so, precursor masses were copied from column 2 in the file "16.lightFiles.EverythingPossible.InclusionList.txt" and pasted into the respective column in the Spot Set Manager (we used MS Excel to copy/paste). For every precursor mass entry, the respective spot position has to be selected manually from a pull-down menu. We hope that in future releases of the ABI 4700 Explorer software there will be an easier way to do this (e.g. files with a similar format to "16.lightFiles.EverythingPossible.InclusionList.txt" could be uploaded directly into the Spot Set Manager, or spot positions on a MALDI plate could be pasted along with precursor masses in the Spot Set Manager tool).

    Once acquired, all MS/MS spectra are submitted to doXLINK analysis (see below).

    *Note, in some cases, the identities of all the proteins in the sample will not be known. The user may wish to submit this sample to the crosslinking reaction followed by proteolysis and LC/MALDI analysis. iXLINK can still be executed, but in this case a version of the file ProteinSequence.txt lacking protein sequences can be used. Also, if only a subset of the protein identities is known, the file ProteinSequence.txt should contain only the known protein sequences. iXLINK will run successfully and output files containing masses and MALDI plate spot names of candidate monolink and crosslink peptides (see above for output file descriptions). The user can examine the iXLINK output file "18.heavy.StrongMatchedSinglets.InclusionList.txt" to obtain a list of strong singlet peptides that can be submitted to MS/MS analysis for the purpose of protein identification using suitable software (e.g. SEQUEST[2], ProbID[3] or Mascot). Recall that iXLINK outputs lists of all strong mass singlets that are likely to be caused by isotopically unlabelled peptides found in the [16O]water sample (see "16.lightFiles.StrongSinglets.InclusionList.txt"), the [18O]water sample (see "18.heavyFiles.StrongSinglets.InclusionList.txt" above) and in both runs ("16.light.StrongMatchedSinglets.InclusionList.txt" and "18.heavy.StrongMatchedSinglets.InclusionList.txt"). If the same strong singlet is seen in both the [16O]water and [18O]water samples, these may be the most reliable peaks for subsequent MS/MS analysis for protein identification. For the latter step, we recommend using the MALDI plate from the [18O]water sample and "18.heavy.StrongMatchedSinglets.InclusionList.txt", and to reserve the plate containing the [16O]water sample for subsequent MS/MS data collection for candidate crosslinks.
    doXLINK

    The program doXLINK consists of two Java Archive (JAR) files, Xlink.jar and XlinkUtil.jar, each of which bundles multiple files into a single archive. They are executed by the PERL script doXlink.pl. doXLINK has been run on a PC running Microsoft Windows XP Version 2002. To run doXLINK, you will need to install a Perl interpreter and the Java™ runtime environment. See Installation Instructions above.

    Input for doXLINK

    doXLINK requires two parameter files, findpair.param and xlink.param, and all MS/MS spectra to be analyzed, located together in one folder.

    First create a folder named doXLINK_01 and place the parameter files findpair.param, xlink.param, and all MS/MS spectrum files there. Find example files for findpair.param and xlink.param in the folder /XLINK_Program_Files/parameters/. The content of these files needs to be changed according to the crosslinking experiment (remove respective comments or parameters). All MS/MS spectrum files are required to be in the form of single text files, one for each MS/MS run (see below). The file format of these MS/MS spectrum files required for doXLINK is as shown below for the file B4_MSMS_762.txt. It may be noted that this file format is commonly used to conduct Mascot searches. "B4" in the filename designates the MALDI plate spot name and "762" designates the integer portion of the precursor mass. The filenames need to be in the format shown since doXLINK will extract information from the filenames.

    BEGIN IONS
    TITLE=B4_MSMS_762
    PEPMASS=762.4135
    70.1159133911133 1244.08898925781
    84.1473999023438 1373.8359375
    87.137580871582  704.233154296875
    234.198471069336 1885.90246582031
    343.348876953125 771.76611328125
    467.316528320313 391.350616455078
    542.721008300781 778.282470703125
    554.531372070313 2608.8828125
    556.658081054688 945.875
    576.517272949219 1593.81921386719
    606.706481933594 4522.99462890625
    608.71826171875  310.455993652344
    671.855834960938 302.464660644531
    698.934509277344 9999.92578125
    701.32470703125  227.284851074219
    702.177001953125 497.837890625
    720.993591308594 526.599487304688
    762.865478515625 6645.123046875
    764.884338378906 800.627197265625
    765.637817382813 695.7568359375
    END IONS
    

    In the above text file, the first column is the observed mass and the second column is the isotope cluster area. The second line of the file gives the MALDI plate spot name (B4) and the integer portion of the precursor mass.
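
For illustration, the following Python sketch reads one such spectrum and decodes the file name. It is not doXLINK's own reader; the function names are hypothetical, and lines of the form KEY=value other than TITLE and PEPMASS are simply skipped.

```python
import re

def parse_msms_filename(filename):
    """Split a name like 'B4_MSMS_762.txt' into (spot, integer precursor mass)."""
    match = re.fullmatch(r"([A-H]\d{1,2})_MSMS_(\d+)\.txt", filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return match.group(1), int(match.group(2))

def parse_spectrum(text):
    """Return (title, precursor_mass, [(mass, area), ...]) for one spectrum."""
    title, pepmass, peaks = None, None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line in ("BEGIN IONS", "END IONS"):
            continue
        if line.startswith("TITLE="):
            title = line[len("TITLE="):]
        elif line.startswith("PEPMASS="):
            pepmass = float(line[len("PEPMASS="):])
        elif "=" in line:
            continue  # other header fields are ignored in this sketch
        else:
            mass, area = line.split()[:2]
            peaks.append((float(mass), float(area)))
    return title, pepmass, peaks

spectrum = (
    "BEGIN IONS\n"
    "TITLE=B4_MSMS_762\n"
    "PEPMASS=762.4135\n"
    "70.1159133911133 1244.08898925781\n"
    "84.1473999023438 1373.8359375\n"
    "END IONS\n"
)
print(parse_msms_filename("B4_MSMS_762.txt"), parse_spectrum(spectrum)[:2])
```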

    The MS/MS data analysis with doXLINK requires three steps (which are explained in detail below):
    1. MS/MS-data export- creating a set of MS/MS spectrum files in text file format (Mascot generic file format)
    2. SeparateSpec- preliminary analysis of spectrum files dependent on the crosslinker's isotope label, and sorting into folders based on iXLINK's selection for MONO, XLINK, PAIRLESS, or PRELIM
    3. doXLINK analysis- matching of MS/MS spectrum files against theoretical peptide fragments derived from iXLINK's peptide databases.
    MS/MS-data export

    We created MS/MS spectrum files in two ways. The latter one is more convenient.

    1. Like the MS spectrum files used for iXLINK, all spectra are exported to binary t2d files from the 4700 Explorer software, and converted into text files with the Data Explorer software using the Visual Basic macro referred to above (see /XLINK_Program_Files/macro.txt). An example of such a file from a single MS/MS run is described above. These new text files are then used with doXLINK.
    2. A series of MS/MS spectra that was acquired automatically with the ABI 4700 Explorer software can be exported using the function "Peaks to Mascot". Within this function, a mass range, excluded masses, signal-to-noise threshold, peak density, and maximum number of peaks per precursor can be specified by the user and automatically applied to all spectra.
    This allows the creation of a single text file that consists of a concatenated series of single MS/MS spectra. An extract of one of these files is shown below (note that (...) designates data that has been omitted for the purposes of illustration).

    COM=Project: User Project 1\Jan, Spot Set: User Project 1\Jan\JS barcode 5
    BEGIN IONS
    PEPMASS=634.3389
    CHARGE=1+
    TITLE=Label: C11, Spot_Id: 218906, Peak_List_Id: 615685, MSMS Job_Run_Id: 23127, Comment: 
    72.031227     340.87158
    (...)
    591.61432     389.84393
    END IONS
    BEGIN IONS
    PEPMASS=1743.9498
    CHARGE=1+
    TITLE=Label: A17, Spot_Id: 218864, Peak_List_Id: 615378, MSMS Job_Run_Id: 23127, Comment: 
    110.17418     2824.3665
    (...)
    1703.6367     344.05386
    END IONS
    (...)
    

    In this case, the combined file of MS/MS spectra needs to be converted to individual MS/MS spectrum files (same format as specified above) prior to executing doXLINK. To convert such a text file (e.g. filename.txt) into the respective single-spectrum files (with filenames in the required format), the script Pkl2txt can be used. Open a command window at the location of the text file and enter:

    > java Pkl2txt filename.txt
    
    It is recommended to save a copy of all spectrum files in text format to a location different from the one used in the next step (SeparateSpec) because they will be modified and overwritten.
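
To show what this conversion involves, here is a Python sketch that splits a concatenated "Peaks to Mascot" export into single-spectrum texts keyed by file name. It is not Pkl2txt and makes no claim about how Pkl2txt works internally. It assumes that the spot name follows "Label:" in the TITLE line and that the file name is built from the spot name and the integer part of PEPMASS; the function names are hypothetical. Nothing is written to disk, and two spectra with the same spot and integer mass would overwrite each other in the returned dictionary.

```python
import re

def _format_block(lines):
    """Build (file name, spectrum text) from the lines of one BEGIN/END block."""
    pepmass_text, spot, peaks = None, None, []
    for line in lines:
        if line.startswith("PEPMASS="):
            pepmass_text = line.split("=", 1)[1]
        elif line.startswith("TITLE="):
            match = re.search(r"Label:\s*([A-H]\d{1,2})", line)
            spot = match.group(1) if match else None
        elif "=" in line:
            continue  # e.g. CHARGE=1+ is dropped
        else:
            peaks.append(line)
    if pepmass_text is None or spot is None:
        raise ValueError("block lacks PEPMASS or a spot label")
    stem = f"{spot}_MSMS_{int(float(pepmass_text))}"
    body = "\n".join(["BEGIN IONS", f"TITLE={stem}",
                      f"PEPMASS={pepmass_text}", *peaks, "END IONS"]) + "\n"
    return stem + ".txt", body

def split_spectra(text):
    """Return {file name: single-spectrum text} for a concatenated export."""
    spectra, block = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "BEGIN IONS":
            block = []
        elif line == "END IONS" and block is not None:
            name, body = _format_block(block)
            spectra[name] = body
            block = None
        elif block is not None and line:
            block.append(line)
    return spectra
```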

    SeparateSpec

    The next step of the analysis is to run the program SeparateSpec. This program assigns all MS/MS spectrum files contained in the folder doXLINK_01 pairwise to the respective inclusion lists ("prelim", "mono", "xlink", and/or "pairless", see iXLINK output files). The parameters given in findpair.param need to be chosen appropriately, and may need to be adjusted to a given data set until all or most of the MS/MS spectrum files are assigned, i.e. sorted into the respective folders. Open the folder doXLINK_01, then open a command window and type:

    > java SeparateSpec ../iXLINK_01
    

    (Note that "../iXLINK_01" specifies the location of respective iXLINK output files.)

    For each peptide pair (where the precursor mass difference corresponds to the difference in mass between the light and heavy crosslinker reagent, i.e. 12 Da in the case of d0/d12-DSS), SeparateSpec selects the MS/MS spectrum files corresponding to the two peptides and looks for fragment ion masses that differ by the mass difference between light and heavy crosslinker reagent. SeparateSpec outputs only the MS/MS spectrum of the peptide modified with the light crosslinker reagent.

    If this spectrum contains a fragment ion peak that is also found in the MS/MS spectrum of the corresponding heavy crosslinker-modified peptide (but shifted by 12 Da in the case of d0/d12-DSS), SeparateSpec annotates this peak as "true". For the "true" annotation, SeparateSpec also requires that the intensities of the light and heavy fragment ion peaks differ by less than 20% (the intensities used are normalized to the base peak of each MS/MS spectrum). If SeparateSpec does not find the corresponding 12 Da-shifted heavy fragment ion mass, it annotates the peak as "false". Peaks found in the MS/MS spectrum of the heavy crosslinker-modified peptide for which there is no corresponding peak in the MS/MS spectrum of the light crosslinker-modified peptide are not saved. In this way, SeparateSpec creates new MS/MS spectrum files for the light crosslinker-modified peptide that also contain the true/false annotations described above.

    These spectrum files are written to 4 new folders called MONO, XLINK, PAIRLESS and PRELIM. The MONO folder contains MS/MS spectrum files of peptides with precursor masses that are found in the inclusion list 16.light.mono.InclusionList.txt. Likewise, the XLINK folder corresponds to the inclusion list 16.light.xlink.InclusionList.txt, the PAIRLESS folder to 16.light.pairless.InclusionList.txt, and the PRELIM folder to 16.lightFiles.prelim.InclusionList.txt (see iXLINK output files).

    SeparateSpec creates only a single folder, PRELIM, when only data from the sample run in [16O] water are used (i.e. no data from the [18O] water sample). SeparateSpec changes the MS/MS spectrum file names from *.txt to *.match.txt. A portion of one of these *.match.txt files is shown below:

    (...)
    1169.5 24.2   false
    1226.5 15.7   false
    1255.6 50.4   true
    1485.8 49.5   true
    (...)
    
    The first column gives the fragment ion mass, the second is the peak intensity normalized to the base peak, and the third column gives the true/false annotation described above. The MS/MS spectrum files derived from heavy crosslinker-modified precursor ions are no longer present in /doXlink_01. Of course all of the original MALDI MS and MS/MS spectrum files should be available on the computer controlling the MALDI mass spectrometer and can be backed up to a CD or appropriate data storage device.
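
    The true/false annotation can be illustrated with a short Python sketch. This is not the SeparateSpec source code: the function names, the fragment mass tolerance of 0.5 Da, and the reading of "differ by less than 20%" as a relative difference with respect to the larger of the two normalized intensities are all assumptions made for illustration (the actual settings come from findpair.param).

```python
def normalize(peaks):
    """Scale intensities so that the base peak of the spectrum is 100."""
    base = max(intensity for _, intensity in peaks)
    return [(mass, 100.0 * intensity / base) for mass, intensity in peaks]


def annotate_light(light, heavy, shift=12.0, mass_tol=0.5, max_rel_diff=0.20):
    """Annotate each light-spectrum peak with True if a shifted heavy partner exists.

    shift        -- light/heavy reagent mass difference (12 Da for d0/d12-DSS)
    mass_tol     -- hypothetical fragment mass tolerance in Da
    max_rel_diff -- assumed meaning of the 20% intensity criterion
    """
    light_n, heavy_n = normalize(light), normalize(heavy)
    annotated = []
    for mass, inten in light_n:
        paired = any(
            abs(h_mass - (mass + shift)) <= mass_tol
            and abs(h_int - inten) <= max_rel_diff * max(h_int, inten)
            for h_mass, h_int in heavy_n
        )
        annotated.append((mass, inten, paired))
    return annotated


# Made-up peak lists (mass, raw intensity) for a light/heavy precursor pair.
light = [(500.0, 50.0), (700.0, 100.0), (800.0, 30.0)]
heavy = [(500.0, 48.0), (712.1, 100.0), (812.0, 60.0)]

for mass, inten, paired in annotate_light(light, heavy):
    # Same three-column layout as a *.match.txt file.
    print(f"{mass:.1f} {inten:.1f}   {str(paired).lower()}")
```

    In this made-up example only the 700.0 peak is annotated "true": the 500.0 peak has no partner shifted by 12 Da, and the 800.0 peak has a shifted partner whose normalized intensity differs too much.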

    doXLINK analysis

    This is the major step in the analysis of MS/MS spectra of crosslinked peptides. It is applied separately to every folder created by SeparateSpec (MONO, XLINK, PAIRLESS, and PRELIM, see above). It requires the parameter file xlink.param.

    Note: If iXLINK was used only for data from the sample run in [16O] water, this step applies only to the folder PRELIM.

    The program does the following:

    Based on the folder into which a mass spectrum was sorted by SeparateSpec (MONO, XLINK, PAIRLESS, or PRELIM) and the precursor mass (contained in the MS/MS spectrum file name), peaks in every mass spectrum are matched against all computed fragments of all molecular species included in the respective peptide database created by iXLINK (16.18.monolinks.seqIDs.txt, 16.18.pairless.seqIDs.txt, 16.18.xlinks.seqIDs.txt or 16.prelim.seqIDs.txt, see the list of iXLINK output files). For every spectrum file, the molecular species to be matched against are selected from the computed database using a mass window specified by the parameter precursor_mass_tolerance in the file xlink.param. All fragment mass peaks (from the MS/MS spectrum files) are matched against computed peptide fragment masses within a mass window specified by the parameter fragment_mass_tolerance. Fragments containing the crosslinker are computed using the parameter xlink_group_mass_from_light (this mass is derived from the mass added to a peptide if modified with a light crosslinker molecule but hydrolyzed at one end in the case of monolinks, e.g. 256.08 for DSS). The parameters mono_reporter_ion_MH_from_light_precursor and cross_reporter_ion_MH_from_light_precursor specify the expected reporter ion masses for monolinks and crosslinks, respectively (we observed two potential reporter ions for DSS-modified peptides: 222.1 (crosslink) and 240.2 (monolink)). Details about the molecular structure of these reporter ions can be found in the original publication. [1]
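
    The two mass windows described above can be sketched in a few lines of Python. This is an illustration only, not doXLINK code: the function names and the tolerance values are assumptions, the first two database entries reuse sequences and calculated masses from the example output in the doXLINK Results section, the third entry is invented, and the two theoretical fragment masses are monoisotopic values computed for illustration.

```python
def select_candidates(precursor_mh, database, precursor_mass_tolerance):
    """Keep database species whose calculated MH+ lies within the precursor window."""
    return [
        entry for entry in database
        if abs(entry["calc_mh"] - precursor_mh) <= precursor_mass_tolerance
    ]


def match_fragments(observed, theoretical, fragment_mass_tolerance):
    """Map each observed peak mass to the theoretical fragments within the window."""
    matches = {}
    for obs in observed:
        hits = [
            label for label, mass in theoretical.items()
            if abs(mass - obs) <= fragment_mass_tolerance
        ]
        if hits:
            matches[obs] = hits
    return matches


database = [
    {"seq": "TQDVSGK#R", "calc_mh": 1046.54},
    {"seq": "AANGK#PGFK", "calc_mh": 1045.57},
    {"seq": "HYPOTHETICAL", "calc_mh": 1100.00},  # invented entry outside the window
]

# Hypothetical tolerances; the real values are set in xlink.param.
candidates = select_candidates(1045.54, database, precursor_mass_tolerance=1.5)

theoretical = {"(y)R": 175.119, "(b)TQ": 230.114}
observed = [70.1, 175.2, 230.3, 441.6]
matches = match_fragments(observed, theoretical, fragment_mass_tolerance=0.3)

print([entry["seq"] for entry in candidates])
print(matches)
```

    Peaks that fall outside every fragment window (70.1 and 441.6 here) simply remain unassigned, as in the detailed peak listings of the doXLINK output files.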

    All matched peptide candidates are given a score based on the agreement of respective MS/MS spectra with all theoretical peptide fragments (see Table 1 below) calculated from the peptide databases *seqIDs.txt.

    Table 1: MS/MS fragment ion assignments used by doXLINK

    Peptide Fragment            Monolink  Crosslink  Looplink
    b-Ion                       +         +          +
    y-Ion                       +         +          +
    Immonium Ion                +         ^          +
    -NH3 [fragment -17 Da]      -         *          -
    -H2O [fragment -18 Da]      -         *          -
    -CO [fragment -28 Da]       -         *          -
    -CO2 [fragment -44 Da]      -         *          -
    Reporter Ion                +,@       *,@        *,@

    (+): considered by doXLINK for the respective link type
    (-): not considered by doXLINK for the respective link type
    (^): computed, not shown in assignment, but used for scoring
    (*): computed, shown in assignment, but not used for scoring
    (@): the monolink and crosslink reporter ion masses can be user-specified in the file xlink.param (the crosslink reporter ion mass is also used for looplinks)

    The number of assigned mass fragments contributes to the matching score. doXLINK also gives a higher matching score when a fragment annotated as "true" (see above) is predicted to contain the crosslinker. In other words, if peptide cleavage of the molecular species in the computed database leads to a fragment that contains the crosslinker, a pair of fragment peaks (one in the MS/MS spectrum derived from the light crosslinker-modified protein and a corresponding mass-shifted peak in the MS/MS spectrum derived from the heavy crosslinker-modified protein) should be observed. Finally, the matching score is higher if the monolink reporter ion is observed when the spectrum is matched to a molecular species in the computed database that contains a monolink. Since the reporter ion for crosslinked species is rarely observed (see main text), doXLINK does not consider this reporter ion in the matching score, but any observed crosslink reporter mass is noted in the doXLINK results.

    The matching score given by doXLINK should not be compared to the probability scores provided by, e.g., SEQUEST [2], ProbID [3] or Mascot. It is also recommended that the final assignment of peptide sequence not be based solely on the rank order of the doXLINK matching scores. Peptide sequence assignment is accomplished by using both the doXLINK matching score and manual inspection of the doXLINK result (using the XLinkViewer graphical interface described below).

    To execute the doXLINK program, open a command window in the folder /doXlink_01 and enter:

    > doXLINK.pl XLINK ../iXLINK_01
    

    where ../iXLINK_01 points to the location of all iXLINK output files.

    In this example, doXLINK is applied to the folder XLINK created by SeparateSpec. The analogous command line statement is executed to run doXLINK on the other 3 folders created by SeparateSpec (MONO, PAIRLESS and PRELIM). All 4 doXLINK runs can be executed in parallel by opening 4 command windows.
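
    As an alternative to opening 4 command windows by hand, the four runs can be started from a small launcher script. The sketch below is an assumption-laden convenience, not part of the XLINK package: it assumes that perl is on the PATH, that doXLINK.pl is reachable from the working directory (the folder /doXlink_01), and that the four folders created by SeparateSpec are present.

```python
import subprocess

FOLDERS = ["MONO", "XLINK", "PAIRLESS", "PRELIM"]


def build_commands(ixlink_dir="../iXLINK_01", folders=FOLDERS):
    """Build one doXLINK command line per folder created by SeparateSpec."""
    # Invoking the script through "perl" is an assumption; the manual calls
    # doXLINK.pl directly from the command window.
    return [["perl", "doXLINK.pl", folder, ixlink_dir] for folder in folders]


def run_all(commands, cwd="."):
    """Start all runs at once, then wait for each and return the exit codes."""
    processes = [subprocess.Popen(cmd, cwd=cwd) for cmd in commands]
    return [process.wait() for process in processes]


if __name__ == "__main__":
    # Print the commands first; call run_all(build_commands()) to execute them.
    for command in build_commands():
        print(" ".join(command))
```

    If only data from the [16O] water sample were used, the folder list would be reduced to PRELIM, e.g. build_commands(folders=["PRELIM"]).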

    doXLINK Results

    doXLINK creates a set of output files, each with the general name SpotPosition_MSMS_PrecursorMass.match.txt.out

    An example of such a doXLINK output file is:

    pepSeq1    pepSeq2 calcPepMH precursorMH numMatchedIons numTotalIons numMatchPair numTotalPair reporterIonMass matchedReporter Score numTotalPepsInDB Location1 Location2
    TQDVSGK#R  MONO    1046.54   1045.54     6              16           1            1            0.00,0.00       false           25.77 3                B537      --
    AANGK#PGFK MONO    1045.57   1045.54     1              18           0            1            0.00,0.00       false           9.87  3                A81       --
    K#K        PGFK#QG 1045.60   1045.54     0              23           0            1            0.00,0.00       false           5.30  3                B497      A85
    
    ======================================================
    
    Detailed description of the top matching peptide(s)
    
    70.1   11.8   false  
    (...)
    175.2  7.3    false  (y)R
    230.3  3.2    false  (b)TQ
    345.4  5.0    false  (b)TQD
    441.6  2.5    false  
    585.7  2.3    false  
    602.8  1.4    false  (y)SGK#R
    676.1  1.2    false  
    701.9  11.0   false  (y)VSGK#R
    747.0  3.3    false  
    1046.2 100.0  true   (y)TQDVSGK#R
    1050.3 5.1    false  
    1051.3 9.0    false  
    
    ======================================================
    
    Detailed description of the top matching peptide(s)
    
    70.1   11.8   false  
    72.1   8.3    false  (b)A
    (...)
    747.0  3.3    false  
    1046.2 100.0  true   
    1050.3 5.1    false  
    1051.3 9.0    false  
    
    ======================================================
    
    Detailed description of the top matching peptide(s)
    
    70.1   11.8   false  
    72.1   8.3    false  
    (...)
    701.9  11.0   false  K#-FK#QG-44,K#K-GFK#-44
    747.0  3.3    false  
    1046.2 100.0  true   
    1050.3 5.1    false  
    1051.3 9.0    false  
    

    Each output file contains the highest scoring molecular species matches, up to a maximum of 10 matches. Peptides are listed vertically in rank order according to matching score, with the top scoring match listed first. The columns are:

    *pepSeq1, pepSeq2: the sequences of the two peptides crosslinked together (in the case of monolinks only pepSeq1 is given and "MONO" appears under the pepSeq2 column). The symbol "#" appearing in these sequences indicates that the preceding amino acid is covalently modified (either a crosslinker reagent-modified internal residue or N-terminus, or a modified cysteine or methionine residue, see main text).

    *calcPepMH: the calculated mass of the MH+ ion derived from the database molecular species.

    *precursorMH: the observed MH+.

    *numMatchedIons: the number of matched ion fragments.

    *numTotalIons: the total number of observed ions.

    *numMatchPair: the number of fragment peaks annotated as "true" (see above) that match calculated fragments containing the crosslinker.

    *numTotalPair: the total number of peaks annotated as "true".

    *reporterIonMass: the observed crosslinker-derived reporter ion masses. Two values are listed, corresponding to the parameters mono_reporter_ion_MH_from_light_precursor and cross_reporter_ion_MH_from_light_precursor; each value is the mass of the reporter ion if it was observed, or 0.00 if it was not.

    *matchedReporter: "true" if the monolink reporter ion is observed for a calculated molecular species that contains a monolink (likewise for crosslinked peptides).

    *Score: the matching score.

    *numTotalPepsInDB: the total number of calculated molecular species in the iXLINK-generated database whose mass matches the observed precursor mass (within the tolerance limit set by the user).

    *Location1, Location2: the location in the peptide sequence of the crosslinker reagent-modified amino acid. The location numbering is based on the residue numbers given in protein_sequence.txt (recall that the first residue in a protein chain may not be numbered as "1" in this file, see above).

    Also given in the doXLINK output file is detailed information about the fragment ions observed for each of the scored matches. The first column is the observed fragment mass, followed by the normalized fragment intensity (normalized to the base peak), followed by the true/false annotation (see above) followed by the assigned sequence of the fragment. These are listed in rank order based on the matching score with the top scoring match listed first.
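
    Because the summary table at the top of each output file is whitespace-delimited, it can be read with a few lines of Python when results need to be post-processed outside XLinkViewer. The sketch below is an illustration rather than a supported interface: the function name parse_summary is an assumption, and it relies on the layout shown in the example above (a header line, one line per candidate, then a line of "=" characters).

```python
def parse_summary(text):
    """Parse the candidate table at the top of a *.match.txt.out file."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        if line.lstrip().startswith("="):
            break  # the separator marks the start of the detailed peak listings
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip anything that does not look like a table row
        rows.append(dict(zip(header, fields)))
    return rows


example = """pepSeq1    pepSeq2 calcPepMH precursorMH numMatchedIons numTotalIons numMatchPair numTotalPair reporterIonMass matchedReporter Score numTotalPepsInDB Location1 Location2
TQDVSGK#R  MONO    1046.54   1045.54     6              16           1            1            0.00,0.00       false           25.77 3                B537      --
AANGK#PGFK MONO    1045.57   1045.54     1              18           0            1            0.00,0.00       false           9.87  3                A81       --
K#K        PGFK#QG 1045.60   1045.54     0              23           0            1            0.00,0.00       false           5.30  3                B497      A85

======================================================
"""

rows = parse_summary(example)
best = max(rows, key=lambda row: float(row["Score"]))
print(best["pepSeq1"], best["pepSeq2"], best["Score"], best["Location1"])
```

    All values are kept as strings; numeric columns such as Score or calcPepMH are converted with float() where needed.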

    doXLINK creates a summary of all peptide assignments in two summary files named sumOUTlong.html and sumOUTlong.xls. They can be viewed with any HTML browser (e.g. Mozilla Firefox or MS Internet Explorer) and with MS Excel, respectively. The contents of these summary files are self-explanatory based on the information provided above.

    XLinkViewer

    Monolinks, crosslinks and looplinks tend to show different fragmentation behavior under the same collision conditions in the mass spectrometer (see main text). The number of peptide fragments observed for crosslinks and looplinks, and their intensities, are typically low. Thus, even minor peaks that arise from other molecular species with a similar precursor mass can lower the doXLINK matching score. It is highly recommended that the matching score not be the sole basis for peptide sequence assignments. Rather, the user should manually inspect the matching data; this is facilitated by XLinkViewer, which gives a graphical display of the results generated by doXLINK. In most cases, we have found that manual inspection of the doXLINK-generated results leads to the same molecular species assignment as that with the top doXLINK matching score.

    A typical XLinkViewer display is shown in XLinkViewer.pdf. The PDF file is self-explanatory. For each molecular species, a pull-down menu is used to label each entry as either "correct", "incorrect", "ambiguous" or "unassigned". These labels are written to a file called Xlink.res. This file does not need to be inspected, but it should be kept so that the user-provided labels are restored each time XLinkViewer is executed. Thus, the final user-specified assignments can be spread out over multiple viewing sessions.

    To run XLinkViewer, double-click on the file XLinkViewer.jar (located in the folder /XLINK_Program_Files/, or double-click on a shortcut to this file on your desktop), then click File > Open and select the file sumOUTlong.html in the doXLINK output folder (e.g. /doXlink_01) where all *.match.txt.out files are located.

    By selecting a *.match.txt.out file in the "Summary" window, the "out file" window will display the top ten scoring peptide candidates for the respective spectrum including doXLINK's peptide assignments below. In the window on the bottom right, the peak list including mass, intensity, and the suggested annotation based on the doXLINK analysis for the selected spectrum is shown. Clicking on any candidate peptide in the "out file" window displays all matched peptide fragments with assignments in the spectrum window below. In the spectrum window, doXLINK fragment assignments are color-coded as follows:

    Right-clicking on any selected peptide assignment in the "out file" window allows the user to decide if the assignment is either "correct", "ambiguous" or "incorrect", or to leave it "unassigned". Assignments are written automatically to a file named xlink.res, which is automatically created and then updated with every manual change.

    Final Results

    After the assignments for all MS/MS spectra have been manually evaluated with XLinkViewer, a final result file named "report.xls" can be exported by clicking File > Simple Report. report.xls can be viewed with, e.g., MS Excel or any text editor. It contains the information about all manually selected MS/MS spectra with their respective analysis information:

    *Spectrum: doXlink output file name: SpotPosition_MSMS_mass.match.txt.out

    *Score: doXLINK score

    *DeltScore: Delta Score, normalized difference of scores of this assignment and the next assignment. (Assignments of preceding molecular species are automatically sorted by score.)

    *TotalPepsInDB: number of molecular species found in the peptide database created by iXLINK for masses in the specified mass window. (The iXLINK database can be written to a file.)
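
    The manual does not give the exact normalization used for DeltScore. As a rough illustration only, the Python sketch below assumes that the difference between an assignment's score and the next lower score is divided by the assignment's own score (similar in spirit to the delta score reported by some database search tools); the function name delta_scores and this normalization are assumptions and may differ from what XLinkViewer actually computes.

```python
def delta_scores(scores):
    """Return an assumed delta score for each assignment, best score first.

    Assumed definition: (score - next_lower_score) / score.
    The lowest-ranked assignment has no successor and gets None.
    """
    ranked = sorted(scores, reverse=True)
    deltas = []
    for index, score in enumerate(ranked):
        if index + 1 >= len(ranked) or score <= 0:
            deltas.append(None)
        else:
            deltas.append((score - ranked[index + 1]) / score)
    return deltas


# Scores taken from the example doXLINK output file shown earlier.
example_scores = [25.77, 9.87, 5.30]
for score, delta in zip(sorted(example_scores, reverse=True), delta_scores(example_scores)):
    print(score, None if delta is None else round(delta, 3))
```

    Under this assumption, a value close to 1 means the top assignment scores far above the runner-up, while a value close to 0 flags a spectrum for which manual inspection is especially important.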

    Further guidance on how to proceed with MS/MS crosslink assignment in XLinkViewer is provided separately.

    References

    [1] Seebacher, J., Mallick, P., Zhang, N., Eddes, J. S., Aebersold, R., Gelb, M. H. (2006) Protein Cross-Linking Analysis Using Mass Spectrometry, Isotope-Coded Cross-Linkers, and Integrated Computational Data Processing. J. Proteome Res. manuscript accepted.

    [2] Eng, J. K., McCormack, A. L., Yates III, J. R. (1994) An Approach to Correlate Tandem Mass Spectral Data of Peptides with Amino Acid Sequences in a Protein Database. J. Am. Soc. Mass Spectrom. 5: 976-989.

    [3] Zhang, N., Aebersold, R., Schwikowski, B. (2002) ProbID: A probabilistic algorithm to identify peptides through sequence database searching using tandem mass spectral data. Proteomics 2(10): 1406-1412.