TPP SEWW Demo2012
Simple Executable Workflow Webapp Setup and Demo
Setting up SEWW
1. Download and install the TPP
To install on your Windows system as a localhost, please follow our TPP Windows Installation Guide, making sure that you download the latest version of TPP from our Sourceforge download site (111 MB).
To verify that the installation was successful, log into Petunia by double-clicking on the Trans-Proteomic Pipeline flower icon on your Desktop or through the Start menu. Alternatively, you can open the following URL in a browser window: http://localhost/tpp-bin/tpp_gui.pl . Use guest as both the user name and the password to log in.
Make sure that the WEBSERVER_ROOT environment variable is set. From the Start menu, open Control Panel.System.Advanced system settings.Environment Variables... and check that it is listed. If you chose the default location for Petunia, WEBSERVER_ROOT should be c:/Inetpub/wwwroot
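The same check can be scripted. The sketch below is only an illustration and is not part of the TPP or SEWW: the helper name check_env_dir is hypothetical, and comparing against c:/Inetpub/wwwroot assumes the default Petunia location mentioned above.

```python
import os
from pathlib import PureWindowsPath


def check_env_dir(name, expected, env=None):
    """Return (is_set, matches_expected) for a directory-valued variable.

    env defaults to the real process environment; pass a dict to test.
    """
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        return (False, False)
    # PureWindowsPath compares case-insensitively and treats "/" and "\"
    # as the same separator, which matches how Windows resolves paths.
    return (True, PureWindowsPath(value) == PureWindowsPath(expected))


if __name__ == "__main__":
    is_set, is_default = check_env_dir("WEBSERVER_ROOT", "c:/Inetpub/wwwroot")
    print("WEBSERVER_ROOT set:", is_set, "| default location:", is_default)
```

A value that is set but not the default is not an error; it only means Petunia was installed somewhere else.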
2. Install Jetty and SEWW
To install on your Windows system, please go to the Sourceforge download site and, from the SEWW folder, download jetty-distribution-7.3.0.v20110203-SEWW-0.5.zip (36 MB) into the directory where you wish to install Jetty (c:\Jetty would be the standard directory). Right-click the zip file and select Extract all. This will create the subdirectory jetty-distribution-7.3.0.v20110203. Now create the JETTY_HOME environment variable by going to Control Panel.System.Advanced system settings.Environment Variables... from the Start menu. Under System variables, click New... and set Variable name: to 'JETTY_HOME' and Variable value: to the directory Jetty was installed to (c:\Jetty\jetty-distribution-7.3.0.v20110203 by default).
The Jetty distribution in sashimi has been updated to contain the necessary SEWW related files.
You can decide to manually start Jetty when needed (which includes after rebooting your machine) or you can set up Jetty as a Windows service.
To start the Jetty server manually, open the $JETTY_HOME$ directory and double-click $JETTY_HOME$\start.bat (double-clicking stop.bat stops the server).
To set up Jetty as a service, go to Windows Service Wrapper and follow the instructions there.
3. Configure SEWW
To configure your Windows system, please go to the Sourceforge download site and, from the SEWW folder, download seww-svc-0.5.config.zip (5 MB) into $WEBSERVER_ROOT$ ("C:\Inetpub\wwwroot" by default). Right-click it and select Extract all, extracting into a temp directory under $WEBSERVER_ROOT$ (by default, create the directory "C:\Inetpub\wwwroot\temp" to extract to). The following two files should be present:
- createparamfile-0.5-SNAPSHOT.jar
- simple_template_complete.xml
Move createparamfile-0.5-SNAPSHOT.jar into $WEBSERVER_ROOT$\..\tpp-bin (by default C:\Inetpub\tpp-bin).
Make sure that you have started the Jetty server. Open Internet Explorer, Mozilla Firefox or Google Chrome and copy the following URL into the address bar: http://localhost:8888/seww-svc-0.5-SNAPSHOT/html/bootstrap.html. Fill in a user name (no validation is done in this version) and click on the Start SEWW button. This will take you to the SEWW start page. Click the Load config(s) button and then the Select configuration file... button. In the tree view, navigate to the temporary directory and select simple_template_complete.xml. Now, in the Specify Java bean name: field, type simple_workflow_register and click the Upload button.
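If the page does not load, it can help to check from a script whether Jetty is answering at all. This is a minimal sketch using only the Python standard library; the helper names are hypothetical, and the host, port and path are simply those of the URL above.

```python
import urllib.error
import urllib.request


def seww_bootstrap_url(host="localhost", port=8888):
    """Build the bootstrap page URL given in the text above."""
    return f"http://{host}:{port}/seww-svc-0.5-SNAPSHOT/html/bootstrap.html"


def is_reachable(url, timeout=3.0):
    """Return True if the URL answers with HTTP 200, False on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, HTTP error status, and so on.
        return False
```

After starting Jetty, is_reachable(seww_bootstrap_url()) should return True. A False result usually means that Jetty is not running or is listening on a different port.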
From the start page click on the Templates and workflows button and you should now see, as a template, Tandem To Protein Prophet Template. Loading this configuration file only needs to be done once.
Note that Java is required for Jetty to run. SEWW has been extensively tested with Java 6.24. To verify that Java is installed, go to Control Panel.Programs and Features and check that a Java version is listed. If you need to install Java, go to How do I install Java?
4. Advanced SEWW Configuration
For advanced users.
SEWW has two environment variables that can be set to make the best use of the server's resources.
Since SEWW makes lightweight use of threads, it automatically sets the number of available threads to thirty-two. If that number is too high and causes the server to become CPU-bound, it can be lowered with the environment variable SEWW_NUM_THREADS. It can also be raised, up to a maximum of five times the number of processors. Open Control Panel.System.Advanced system settings.Environment Variables... from the Start menu. Click the New... button under System variables and set SEWW_NUM_THREADS as the Variable name and the desired number of threads as the Variable value. While the server is running workflows, the CPU usage can be monitored by right-clicking the taskbar, choosing Start Task Manager and then the Performance tab.
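The rule for the thread count can be written out as a small function. This is a sketch of the rule as described here, not SEWW's actual code: the function name is hypothetical, and both the fallback for an unparsable value and the lower bound of one thread are assumptions.

```python
DEFAULT_THREADS = 32            # default stated in the text
MAX_THREADS_PER_PROCESSOR = 5   # upper bound stated in the text


def resolve_num_threads(env, processors):
    """Return the thread count implied by SEWW_NUM_THREADS.

    env is a mapping such as os.environ; processors is the CPU count.
    """
    raw = env.get("SEWW_NUM_THREADS")
    if raw is None:
        return DEFAULT_THREADS
    try:
        requested = int(raw)
    except ValueError:
        # Assumption: an unparsable value falls back to the default.
        return DEFAULT_THREADS
    ceiling = MAX_THREADS_PER_PROCESSOR * processors
    # Assumption: at least one thread is always kept.
    return max(1, min(requested, ceiling))


if __name__ == "__main__":
    # On a 4-processor server a request for 100 threads is capped at 20.
    print(resolve_num_threads({"SEWW_NUM_THREADS": "100"}, 4))
```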
The other resource that influences the server's responsiveness is its ability to handle input/output (IO) operations. If too many processes attempt to access the disk simultaneously, the server's performance can slow to a crawl. But if the server can handle more than one heavily IO-bound process in a workflow (one is the server default), a speed-up can be obtained by setting the SEWW_MAX_SIMULTANEOUS_INTENSIVE_THREADS environment variable to the highest number that doesn't degrade the server's performance. To set SEWW_MAX_SIMULTANEOUS_INTENSIVE_THREADS, follow the same steps as above for SEWW_NUM_THREADS. To observe the IO load on the server, go to the Task Manager's Performance tab, click on the Resource Monitor... button and look at the Disk statistics.
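The idea behind such a limit can be illustrated with a semaphore that lets only a fixed number of IO-heavy tasks run at once. This is a generic sketch of the technique, not a description of how SEWW implements it. The function name is hypothetical and time.sleep stands in for real disk work.

```python
import threading
import time


def peak_concurrency(num_tasks, max_intensive):
    """Run num_tasks fake IO jobs, at most max_intensive at a time.

    Returns the highest number of jobs observed running simultaneously.
    """
    gate = threading.BoundedSemaphore(max_intensive)
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def task():
        with gate:  # blocks while max_intensive jobs are already running
            with lock:
                state["active"] += 1
                state["peak"] = max(state["peak"], state["active"])
            time.sleep(0.01)  # stand-in for a disk-heavy step
            with lock:
                state["active"] -= 1

    threads = [threading.Thread(target=task) for _ in range(num_tasks)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return state["peak"]
```

Raising the limit lets more IO-heavy steps overlap, which only helps as long as the disk can keep up.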
Note that for these settings to take effect, you must either restart Jetty from a newly launched application, or open a new instance of the browser, go to the SEWW home page, and click the Reset server button. New environment variables are not visible to an application that is already running.
SEWW Tutorial
1. Download and install the test data and database
For this tutorial, we will be using a SILAC-labeled Yeast dataset, consisting of two runs on a high mass-accuracy Orbitrap instrument, along with a Yeast database appended with decoys. We also include a search parameters file. This is the same data as for the Petunia tutorial.
- Download the mzML files (768 MB) and unzip.
- Then download the parameters and database files (2.1 MB) and unzip.
- Copy or move the yeast_orfs_all_REV.20060126.short.fasta file into the folder C:\Inetpub\wwwroot\ISB\data\dbase\speclibs, creating the directories if necessary.
- Copy or move the two data files (OR20080317_S_SILAC-LH_1-1_01.mzML and OR20080317_S_SILAC-LH_1-1_11.mzML) as well as the tandem parameters file tandem.xml into the folder C:\Inetpub\wwwroot\ISB\data\demo2009\tandem. Create this last folder if necessary.
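Before moving on, it may be worth confirming that all four files ended up in the right place. The sketch below lists whichever of them are missing. The function name is hypothetical; the folder layout and file names are the ones given in the steps above.

```python
from pathlib import Path

DATABASE = "yeast_orfs_all_REV.20060126.short.fasta"
TANDEM_FILES = [
    "OR20080317_S_SILAC-LH_1-1_01.mzML",
    "OR20080317_S_SILAC-LH_1-1_11.mzML",
    "tandem.xml",
]


def missing_tutorial_files(webserver_root):
    """Return the expected tutorial files that do not exist yet."""
    data = Path(webserver_root) / "ISB" / "data"
    expected = [data / "dbase" / "speclibs" / DATABASE]
    expected += [data / "demo2009" / "tandem" / name for name in TANDEM_FILES]
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    # C:/Inetpub/wwwroot is the default WEBSERVER_ROOT used in this guide.
    for path in missing_tutorial_files("C:/Inetpub/wwwroot"):
        print("missing:", path)
```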
2. Setup the template
TandemToProteinProphetTemplate in the Templates grid is owned by public and is not editable. Select it and choose Save as, giving it whatever name you like and, for this tutorial, saving it as a template. Now select your new template. Selecting the template brings up the graphical representation of the workflow. Notice that the ASAPRatio module is included since these are SILAC-based samples. Clicking on a component brings up its parameter form; a right-click brings up a description.
As a best practice, set in the template those parameters that do not vary for the particular MS/MS instrument that produced the files.
3. Explore the template
By clicking on the different nodes, you can see how the components are linked together to run. Every component that represents a TPP executable has an input file that is linked to the previous component's output file, and an output file that the next component's input file links to. Both of these parameters are marked as value required by module since they must always be specified. The input value specifies which value it is linked to in the previous component. The output value has a lock value check box. If this is checked for any parameter, that parameter's value cannot be changed when the workflow is run.
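One way to picture the required and locked flags is as two booleans on each parameter. The model below is purely illustrative: the class, field and function names are hypothetical and say nothing about how SEWW stores parameters internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Parameter:
    name: str
    value: Optional[str] = None
    required: bool = False  # corresponds to "value required by module"
    locked: bool = False    # corresponds to the "lock value" check box

    def set_value(self, new_value):
        """Change the value unless the parameter has been locked."""
        if self.locked:
            raise PermissionError(f"{self.name} is locked by the workflow")
        self.value = new_value


def unset_required(parameters):
    """Names of required parameters that still have no value."""
    return [p.name for p in parameters if p.required and p.value is None]
```

In this picture a workflow is ready to run once unset_required returns an empty list, and a locked parameter keeps whatever value it had when it was locked.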
The ASAPRatio component has a few other parameters set to run XInteract as ASAPRatio.
ASAPRatio parameters
- Click on the component and, in the parameter form, choose the General Parameters tab
- Click on the arrow at the bottom of the form to bring up the Advanced parameters
- Notice that the field in the upper left is set to not run PeptideProphet
- Click on the ASAP Parameters tab
- Notice that in the upper left, the ASAP Options parameter is specified
- Click on Back to return to the workflow diagram
We will come back to this form later to set more parameters.
Pepxml parameters
- Click on Pepxml
- Notice that the output file has a default value:
- regex:|:{0}|:|PepXML Input File:|:(.+).tandem:|:$1.pep.xml
- Because the value begins with regex, it tells the workflow to take the PepXML Input File name and replace the tandem extension with pep.xml
- Since this is not marked as locked by the workflow, you can change this if you want.
- Click on Back to return to the workflow diagram
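The effect of the pattern and replacement in that default value can be tried out in isolation. The sketch below applies only the last two fields, (.+).tandem and $1.pep.xml, to a file name. How SEWW itself parses the |: separated fields is not shown here, and the function name is hypothetical. Python writes the first capture group as \1 rather than $1.

```python
import re

PATTERN = r"(.+).tandem"     # copied from the default value above
REPLACEMENT = r"\1.pep.xml"  # Python spelling of $1.pep.xml


def default_output_name(input_name):
    """Derive the PepXML output name from the PepXML Input File name."""
    return re.sub(PATTERN, REPLACEMENT, input_name)


if __name__ == "__main__":
    print(default_output_name("OR20080317_S_SILAC-LH_1-1_01.tandem"))
    # prints OR20080317_S_SILAC-LH_1-1_01.pep.xml
```

A name that does not contain the tandem extension is returned unchanged.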
4. Setup the initial input
Template Params File
- This file defines the database search parameters that override the full set of default settings referenced in the file isb_default_input.
- In this example, the mass tolerance is set to -2.1 Da to 4.1 Da in the template parameter file, and the residue modification mass is set to 57.021464@C. A wide mass tolerance is used to include all the spectra with precursor m/z off by one or more isotopic separations; the high accuracy achieved by the instrument is then modeled by PeptideProphet with the accurate mass model.
- For more information, please go to TANDEM
- Click on TemplateParamsFile
- In the form, click on the Browse button
- In the file browser, open the tandem folder and choose tandem.xml then click on the Select file button
- Back in the form, click on 'Save' and then 'Back' to return to the workflow diagram page
- Note that the form states that the value is required to run the workflow
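To see why a window of -2.1 Da to 4.1 Da captures precursors that are off by whole isotopic separations, the offsets can be enumerated directly. This is a sketch under two assumptions: one isotopic separation is taken as the 13C/12C mass difference of about 1.00335 Da, and the mass error is measured in Da relative to the calculated peptide mass. The function names are hypothetical.

```python
C13_C12_SPACING = 1.0033548  # Da, mass difference between 13C and 12C


def within_window(error_da, lower=-2.1, upper=4.1):
    """True if a precursor mass error (in Da) lies inside the tolerance."""
    return lower <= error_da <= upper


def isotope_offsets_in_window(lower=-2.1, upper=4.1, max_offset=6):
    """Whole-isotope offsets whose mass shift still passes the window."""
    return [
        n
        for n in range(-max_offset, max_offset + 1)
        if within_window(n * C13_C12_SPACING, lower, upper)
    ]


if __name__ == "__main__":
    print(isotope_offsets_in_window())
    # prints [-2, -1, 0, 1, 2, 3, 4]
```

With these numbers, a precursor picked up to two isotope peaks too low or up to four too high is still searched.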
SeqDatabase
- This fasta file contains the Yeast protein sequences
- Click on SeqDatabase
- In the form, click on the Browse button
- In the file browser, open the dbase folder and then the speclibs folder.
- Choose yeast_orfs_all_REV.20060126.short.fasta then click on the Select file button
- Back in the form, click on 'Save' and then 'Back' to return to the workflow diagram page
Runpath
- Click on Runpath
- Note that the form is grayed out and that the value is locked by the workflow.
- For every run started from this template, a directory will be created for the run files. The initial files and the files generated by X!Tandem will go in this subdirectory.
- Click on 'Back' to return to the workflow diagram page
MzXMLFile will not be set yet; it will be the final input before the workflow is run.
5. Setting Peptide Prophet parameters
One more set of parameters needs to be set, based on the high accuracy of the MS/MS instrument. The template can then be saved as a workflow.
PeptideProphet parameters
- Click on the PeptideProphet module
- Click on the PeptideProphet Parameters tab
- Find the PeptideProphet options parameter in the upper left
- Check the specify options? check box
- Since this template is being set up for high accuracy MS/MS, also check the lock value check box
- Find the use accurate mass binning in PeptideProphet parameter in the third column
- Check the specify options? check box
- Also check the lock value check box
- Click on Save then Back to return to the workflow diagram page
6. Saving as a workflow
Now save the template as a workflow. The template is now set up for high accuracy MS/MS. For this workflow we will set up the parameters that describe how the SILAC samples were prepared. If different isotopes are used for other SILAC runs, the template can be saved as another workflow to capture that difference.
- Choose Save as
- Fill in MyWorkflow for the Name
- Choose the Workflow button
- Click the Save button
ASAPRatio parameters
The sample specific parameters can now be set for ASAP Ratio
- Click on the ASAPRatio module
- Click on the ASAP Parameters tab
change labeled residues parameter
- Find the change labeled residues parameter at the top of the second column
- Check the value needed? check box to let the workflow know this value will need to be set in order to run the workflow
- Select K and R from the dropdown
- leave lock value unchecked to allow this to be changed for a run if desired
range around precusor m/z to search for peak parameter
- Find the range around precusor m/z to search for peak parameter in the middle of the first column
- Check the value needed? check box
- set the value to 0.05
specified label masses parameter
- Find the specified label masses (i.e. M74.325Y125.864), only relevant for static modification quantification parameter at the bottom of the third column
- Check the value needed? check box
- Set the following three amino acid values:
- Pick M from the Select one: dropdown and set a value of 147.035 and click Add choice
- Pick K from the Select one: dropdown and set a value of 136.10916 and click Add choice
- Pick R from the Select one: dropdown and set a value of 166.10941 and click Add choice
- Click the Save button
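The parameter label shows the label masses as one string of residue letter and mass pairs (M74.325Y125.864). The sketch below builds and reads strings of that shape for the three values entered above. That SEWW hands ASAPRatio exactly this string is an assumption, and the function names are hypothetical. Masses are kept as text so that no digits are lost.

```python
import re

TUTORIAL_MASSES = {"M": "147.035", "K": "136.10916", "R": "166.10941"}


def format_label_masses(masses):
    """Join residue/mass pairs in the style of M74.325Y125.864."""
    return "".join(f"{residue}{mass}" for residue, mass in masses.items())


def parse_label_masses(text):
    """Split a label-mass string back into a residue -> mass mapping."""
    pairs = re.findall(r"([A-Z])(\d+(?:\.\d+)?)", text)
    return {residue: mass for residue, mass in pairs}


if __name__ == "__main__":
    print(format_label_masses(TUTORIAL_MASSES))
    # prints M147.035K136.10916R166.10941
```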
7. Save as a compiled workflow
Now save the workflow as a compiled workflow. This will lock in the choices made and allow the compiled workflow to be run as many times as desired.
- Click the 'Compile workflow' button
- Name the workflow MyCompiledWorkflow
- Click the Compile button
This will take you to the Compiled Workflows form, where you can set up and run a workflow.
8. Run a workflow
Setup and run a workflow
- Highlight MyCompiledWorkflow in the Compiled Workflows page
- Click the Setup run button
- In the Setup to run dialog, name the run MyFirstRun
- Click the Setup button
Because the number of required parameters is drastically reduced at this point, the tabs reflect the components of the workflow.
ASAPRatio
- Click on the ASAPRatio tab
- Notice that we see the parameters that were required but not locked. These parameters can be changed
- Click the arrow at the bottom of the form
- Notice that this brings up the form that shows the values that are locked for this module and can't be changed
MzXMLFile
- Now click on the MzXMLFile tab to add the final parameter value to run the workflow
- Click the Browse button
- Open the tandem folder
- Highlight the OR20080317_S_SILAC-LH_1-1_01.mzML and OR20080317_S_SILAC-LH_1-1_11.mzML files
- Click the Select files button
- Back on the parameter form click Save
Run the Workflow
- Click the Run button
This brings up the workflow diagram, which updates to show graphically how the workflow is progressing. When the workflow completes, SEWW goes to the output files form, which shows a directory tree of where the output files have been saved.
9. Exploring the output
You can run the visualization and analysis tools in the output files tree form.
PeptideProphet results
- Open the peptide_prophet directory
- Right click on the interact.pep.shtml file
- This will open a viewer. Sort the list in descending order based on Probabilities. The identifications at the top of the resulting list are most likely to be correct. Click on the hypertext link for any probability. This brings up a details page (PlotModel) that shows graphically how successful the modeling was. In the upper pane, it is desirable for the red curve (sensitivity) to hug the upper right corner, and for the green curve (error) to hug the lower left corner. The lower pane shows how well the data (black line) follows the PeptideProphet modeling for each charge state. The blue curve describes the modeling of the negative results, and the purple one, the positive results. If these two curves are well separated and fit the black line well, then the analysis for that charge state was successful.
- Click the Close button for the PlotModel and the Close button for the viewer to go back to SEWW result files
- Right click on the interact.pep.xml file
- This will open the file using the PepXML viewer application.
- Click on the Other Actions top-level tab, and then on the Generate Pep3D button. A new window will launch the Pep3D viewer.
- Leave the default options (or change to taste) and click on the Generate Pep3D Image button.
- After a few moments, you should see one image displayed on the page per mzML input file.
- Click the Close button for the Pep3D Image and the Close button for the PepXML viewer to go back to SEWW result files
- Close the peptide_prophet directory
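The same ranking by probability can be done outside the viewer by reading a pepXML file directly. The sketch below assumes the usual pepXML layout, in which each search_hit element contains a peptideprophet_result element with a probability attribute. The function names are hypothetical and the SAMPLE document is invented for illustration; it is not output from this tutorial.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace from a tag name."""
    return tag.rsplit("}", 1)[-1]


def peptide_probabilities(xml_text):
    """Return (peptide, probability) pairs, most confident first."""
    root = ET.fromstring(xml_text)
    rows = []
    for hit in root.iter():
        if local(hit.tag) != "search_hit":
            continue
        for node in hit.iter():
            if local(node.tag) == "peptideprophet_result":
                rows.append((hit.get("peptide"), float(node.get("probability"))))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows


# Invented two-hit document, used only to exercise the function.
SAMPLE = """<msms_pipeline_analysis xmlns="http://regis-web.systemsbiology.net/pepXML">
  <msms_run_summary>
    <spectrum_query spectrum="run.00001.00001.2">
      <search_result>
        <search_hit hit_rank="1" peptide="PEPTIDEK">
          <analysis_result analysis="peptideprophet">
            <peptideprophet_result probability="0.42"/>
          </analysis_result>
        </search_hit>
      </search_result>
    </spectrum_query>
    <spectrum_query spectrum="run.00002.00002.2">
      <search_result>
        <search_hit hit_rank="1" peptide="SAMPLER">
          <analysis_result analysis="peptideprophet">
            <peptideprophet_result probability="0.98"/>
          </analysis_result>
        </search_hit>
      </search_result>
    </spectrum_query>
  </msms_run_summary>
</msms_pipeline_analysis>"""

if __name__ == "__main__":
    for peptide, probability in peptide_probabilities(SAMPLE):
        print(peptide, probability)
```

For a real file, read its contents as bytes (for example with Path.read_bytes) and pass them to peptide_probabilities; large files may be better handled with an incremental parser.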
InterProphet results
- Open the inter_prophet directory
- Click on the interact.pep.shtml file to view and analyze the results.
- Click the Close button for the PepXML viewer
- Close the inter_prophet directory
ASAPRatio results
- Open the asap_ratio directory
- Right click on the asap.pep.shtml file to view and analyze the results. The “asapratio” column contains quantitation results with a link to the ASAPRatio ion trace. The number listed in the “asapratio” column is the light to heavy ratio.
- Click the Close button on the PepXML viewer
- Close the asap_ratio directory
ProteinProphet results
- Open the protein_prophet directory
- Right click on the asap.pep.shtml file to view and analyze the results. Protein groups are sorted in descending order by Probability so that the groups at the top of the page are the most confident identifications. The protein probabilities are the red numbers listed next to each protein group.
- Click the Close button on the PepXML viewer
- Close the protein_prophet directory