TPP SEWW Demo2012

From SPCTools

Revision as of 23:32, 12 December 2012
Mmiller

Simple Executable Workflow Webapp Setup and Demo

Setting up SEWW

1. Download and install the TPP

To install on your Windows system as a localhost, please follow our TPP Windows Installation Guide, making sure that you select to download the latest version of TPP from our Sourceforge download site (111Mb).

To verify that the installation was successful, log into Petunia by double-clicking on the Trans-Proteomic Pipeline flower icon on your Desktop or through the Start menu. Alternatively, you can open a browser window to the following URL: http://localhost/tpp-bin/tpp_gui.pl . You can use the credentials guest and guest as user name and password to log in.

Make sure that the WEBSERVER_ROOT environment variable is set. From the start menu check Control Panel.System.Advanced system settings.Environment Variables.... If you chose the default location for Petunia, WEBSERVER_ROOT should be c:/Inetpub/wwwroot

2. Install Jetty and SEWW

To install on your Windows system, please go to the Sourceforge download site and from the SEWW folder select jetty-distribution-7.3.0.v20110203-SEWW-0.5.zip for download (36Mb) into the folder where you wish to install Jetty (c:\Jetty would be the standard directory). Right click on the zip file and select Extract all. This will create the subdirectory jetty-distribution-7.3.0.v20110203-SEWW-0.5. Now create the JETTY_HOME environment variable by going to Control Panel.System.Advanced system settings.Environment Variables... from the start menu. In the System variables field click on New... and set Variable name: to 'JETTY_HOME' and Variable value: to the directory Jetty was installed to (c:\Jetty\jetty-distribution-7.3.0.v20110203-SEWW-0.5 by default).

The Jetty distribution in sashimi has been updated to contain the necessary SEWW related files.

You can decide to manually start Jetty when needed (which includes after rebooting your machine) or you can set up Jetty as a Windows service.

To start manually, open the JETTY_HOME folder (c:\jetty\jetty-distribution-7.3.0.v20110203-SEWW-0.5 by default) and double click on the file named start of type Windows Batch file. Double clicking on the Windows Batch file stop will stop the service. On Vista or Windows 7 you may need to right click on start, stop, and the start ini file and click on Unblock.

To set up Jetty as a service, go to Windows Service Wrapper and follow the instructions there. Make sure to follow the instructions at Tanuki Software to download the executable Jetty-Service.exe.

Note that Java is required for Jetty to run and the JAVA_HOME environment variable must be set. SEWW has been extensively tested with Java 6.27. To verify that Java is installed, on Windows 7, go to Control Panel.Programs and Features and look for Java(TM) Update .... The version will be listed at the end of the column. If there are multiple versions, use the latest. To find the installed location, look for the Location column. If you don't see the Location column, click on the column headers, click More and select Location. Write the location down and go to Control Panel.System.Advanced system settings.Environment Variables... from the start menu. If JAVA_HOME is not listed, create it and make sure the value is the location from Control Panel.Programs and Features. If you need to install Java, go to Free Java Download, follow the instructions there, then set the JAVA_HOME environment variable.
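
As a quick sanity check, the three environment variables named in steps 1 and 2 (WEBSERVER_ROOT, JETTY_HOME and JAVA_HOME) can be verified from a script instead of through the Control Panel. The following is a minimal Python sketch, not part of the TPP or SEWW distribution; the function name check_environment is hypothetical, and it only tests that each variable is set and points to an existing directory.

```python
import os
from pathlib import Path

# Variables this guide asks you to define (steps 1 and 2).
REQUIRED_VARS = ("WEBSERVER_ROOT", "JETTY_HOME", "JAVA_HOME")


def check_environment(env=None):
    """Return {variable: problem} for every required variable that is
    unset or does not point to an existing directory.

    Hypothetical helper for illustration; SEWW itself does not ship it.
    """
    env = os.environ if env is None else env
    problems = {}
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems[name] = "not set"
        elif not Path(value).is_dir():
            problems[name] = f"points to a missing directory: {value}"
    return problems
```

Calling check_environment() with no argument inspects the current process environment; an empty result means all three variables look usable. Remember that a console opened before the variables were created will not see them.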

3. Configure SEWW

To configure your Windows systems, please go to Sourceforge download site and from the SEWW folder select seww-svc-0.5.config.zip for download (5Mb) into $WEBSERVER_ROOT$ ("C:\Inetpub\wwwroot" by default). Right click on it and select Extract all into a temp directory under $WEBSERVER_ROOT$ (create directory "C:\Inetpub\wwwroot\temp" by default to extract to). The following two files should be present:

  • createparamfile-0.5-SNAPSHOT.jar
  • simple_template_complete.xml

Move createparamfile-0.5-SNAPSHOT.jar into $WEBSERVER_ROOT$\..\tpp-bin (by default C:\Inetpub\tpp-bin).

Make sure that you have started the Jetty server. Open Internet Explorer, Mozilla Firefox or Google Chrome and copy the following URL into the address window: http://localhost:8888/seww-svc-0.5-SNAPSHOT/html/bootstrap.html. Fill in a user name (no validation is done in this version) and click on the Start SEWW button. This will take you to the SEWW start page. Click the Load config(s) button and click the Select configuration file... button. In the window tree view, navigate to the temporary directory and select simple_template_complete.xml. Now in the Specify Java bean name: field type register then click the Upload button.
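
If the bootstrap page does not load, it can help to confirm from a script that Jetty is answering on port 8888. The sketch below is an illustration only: the helper names are hypothetical, the host and port defaults are taken from the URL above, and it uses nothing beyond the Python standard library.

```python
import urllib.error
import urllib.request


def bootstrap_url(host="localhost", port=8888):
    """Build the SEWW bootstrap URL quoted in this guide."""
    return f"http://{host}:{port}/seww-svc-0.5-SNAPSHOT/html/bootstrap.html"


def seww_is_reachable(url, timeout=5.0):
    """Return True if the URL answers with HTTP 200, False on any
    connection or HTTP error (for example, Jetty not started)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

For example, seww_is_reachable(bootstrap_url()) returning False usually means Jetty has not been started or is listening on a different port.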

From the start page click on the Templates and workflows button and you should now see the template Tandem To Protein Prophet Template. Loading this simple_template_complete.xml configuration file only needs to be done once.

4. Advanced SEWW Configuration

For advanced users.

SEWW has two environmental variables that can be set to maximize the use of the server's resources.

Since SEWW makes lightweight use of threads, it will automatically set the number of available threads to five times the number of CPUs or thirty-two, whichever is smaller. If that number is too high and causes the server to become CPU-bound, it can be set lower by using the environment variable SEWW_NUM_THREADS. It can also be set higher, up to a maximum of five times the number of processors. Click on Control Panel.System.Advanced system settings.Environment Variables... from the start menu. Choose the New... button under System variables and set SEWW_NUM_THREADS as the Variable name and the number of desired threads as the Variable value. While the server is running workflows, the CPU usage can be monitored by right-clicking the taskbar, choosing Start Task Manager and then the Performance tab.
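
The thread-count rule described above can be written out as a short calculation. This is a sketch of the rule as stated in this section, not SEWW's actual code; the function name is hypothetical, and the lower bound of one thread is an assumption, since the text does not say how SEWW treats zero or negative values.

```python
DEFAULT_CAP = 32   # default upper limit stated in this section
PER_CPU = 5        # threads allowed per processor


def effective_threads(cpu_count, override=None):
    """Return the number of threads available under the stated rule.

    override is the value of SEWW_NUM_THREADS, or None if it is unset.
    """
    ceiling = PER_CPU * cpu_count
    if override is None:
        # Default: five times the CPU count or 32, whichever is smaller.
        return min(ceiling, DEFAULT_CAP)
    # An explicit value may go lower, or higher up to 5 x CPUs.
    # Assumption: never fewer than one thread.
    return max(1, min(int(override), ceiling))
```

On a 4-processor machine the default is 20 threads, while on a 16-processor machine the default is 32 and SEWW_NUM_THREADS could raise it as far as 80.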

The other resource that influences the server's responsiveness is its ability to handle input/output operations (IO). If too many processes attempt to access the disk simultaneously, the server's performance can slow down to a crawl. But if the server can handle more than one heavily IO-bound process in a workflow (the server default), speed-up can be obtained by setting the SEWW_MAX_SIMULTANEOUS_INTENSIVE_THREADS environmental variable to the highest number possible that doesn't cause the server performance to degrade. To set SEWW_MAX_SIMULTANEOUS_INTENSIVE_THREADS follow the same steps as above for SEWW_NUM_THREADS. To observe the IO load on the server, go to the Task Manager's Performance Tab and click on the Resource Monitor... button and look at the Disk statistics.

Note that in order for these settings to take effect, you must either restart Jetty from a new application or open up a new instance of the browser, go to the SEWW home page, and click the Reset server button. The new environmental variables will not be visible to a currently running application.

Another value that can be set is in the start configuration file at the JETTY_HOME location (c:\Jetty\jetty-distribution-7.3.0.v20110203-SEWW-0.5 by default). If you are running 64-bit Windows and have 4GB of RAM or more, the amount of memory allocated for the server can be increased. Modify the value of the Java -Xmx parameter to 3000m or whatever value allows Jetty and SEWW to run on the server without interfering with other processes.

SEWW Tutorial

1. Download and install the test data and database

For this tutorial, we will be using a SILAC-labeled Yeast dataset, comprising two runs on a high mass-accuracy Orbitrap instrument, along with a Yeast database appended with decoys. We also include a search parameters file. This is the same dataset as for the Petunia tutorial.

Download the dataset and unzip it.

  • Copy or move the yeast_orfs_all_REV.20060126.short.fasta file into the folder C:\Inetpub\wwwroot\ISB\data\dbase\speclibs creating the folders if necessary
  • Copy or move the two data files (OR20080317_S_SILAC-LH_1-1_01.mzML and OR20080317_S_SILAC-LH_1-1_11.mzML) as well as the tandem parameters file tandem.xml into the folder C:\Inetpub\wwwroot\ISB\data\demo2009\tandem. Create this last folder if necessary.
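
The two copy steps above can also be scripted. The following Python sketch assumes the archive has already been unzipped into a single folder; the function name and the unzipped_dir argument are hypothetical, while the file names and target folders are the ones listed above, expressed relative to the web server root (C:\Inetpub\wwwroot by default).

```python
import shutil
from pathlib import Path

# Target folders from this tutorial, relative to the web server root.
DBASE_DIR = Path("ISB/data/dbase/speclibs")
TANDEM_DIR = Path("ISB/data/demo2009/tandem")

# Which tutorial file goes into which folder.
PLACEMENT = {
    "yeast_orfs_all_REV.20060126.short.fasta": DBASE_DIR,
    "OR20080317_S_SILAC-LH_1-1_01.mzML": TANDEM_DIR,
    "OR20080317_S_SILAC-LH_1-1_11.mzML": TANDEM_DIR,
    "tandem.xml": TANDEM_DIR,
}


def stage_tutorial_files(unzipped_dir, webserver_root):
    """Copy each tutorial file into place, creating folders if necessary.

    Returns the list of destination paths.
    """
    copied = []
    for name, relative_dir in PLACEMENT.items():
        target_dir = Path(webserver_root) / relative_dir
        target_dir.mkdir(parents=True, exist_ok=True)
        destination = target_dir / name
        shutil.copy2(Path(unzipped_dir) / name, destination)
        copied.append(destination)
    return copied
```

A typical call would pass the unzip location and the value of WEBSERVER_ROOT as the two arguments.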

2. Set up the template

TandemToProteinProphetTemplate in the Templates grid is owned by public and is not editable. Select it and do a Save as, choosing whatever name you would like and, for this tutorial, saving it as a template. Selecting the template brings up the graphical representation of the workflow diagram. Notice that the name of the template on the title bar has changed to the name of your new template. Also notice that the ASAPRatio module is included since these are SILAC based samples. Clicking on a component will bring up its parameter form; a right click will bring up a description.

As best practice, the template should have parameters set that do not vary for the particular MS/MS device that produced the files.

3. Explore the template

By clicking on the different nodes, you can see how the components are linked together to run. In all the components that represent the TPP executables there will be an input file that is linked to the previous component's output file and an output file that will be linked to by the next component's input file. Both these parameters are marked as value required by module since they always must be specified. The input value specifies what value it is linked to in the previous component. The output value has a check box to lock value. If this is checked for any parameter, then that parameter's value cannot be changed when the workflow is set up to run.

The ASAPRatio component has a few other parameters set to run XInteract as ASAPRatio.

ASAPRatio parameters

  • Click on the component and in the parameter form, choose the General Parameters tab
  • Click on the arrow at the bottom of the form to bring up the Advanced parameters
  • Notice that the field in the upper left is set to not run PeptideProphet
  • Click on the ASAP Parameters tab
  • Notice that in the upper left, the ASAP Options parameter is specified
  • Click on Back to return to the workflow diagram

We will come back to this form later to set more parameters.

Pepxml parameters

  • Click on NestedTandem
  • This will bring up the graphical representation of the nested workflow for running multiple instances of X!Tandem
  • Click on Pepxml
  • Notice that the output file has a default value:
  • regex:|:{0}|:|PepXML Input File:|:(.+).tandem:|:$1.pep.xml
  • Because the value begins with regex, the workflow takes the PepXML Input File name and replaces the tandem extension with pep.xml.
  • Since this is not marked as locked by the workflow, you can change it if you want.
  • Click on Back to return to the nested workflow diagram
  • Click on Parent workflow to return to the top-level workflow
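
The effect of the pattern (.+).tandem and the replacement $1.pep.xml in the default shown above can be tried outside SEWW. The sketch below reproduces only that substitution step in Python; it does not parse the full regex:|:...|: expression, whose handling inside SEWW is not described here, and the function name is hypothetical. Python writes the group reference as \1 where the SEWW default uses $1.

```python
import re

# Pattern and replacement taken from the default output value.
# Note the unescaped dot before "tandem" matches any character.
PATTERN = r"(.+).tandem"
REPLACEMENT = r"\1.pep.xml"   # Python spelling of $1.pep.xml


def default_pepxml_output(input_name):
    """Derive the output file name from the PepXML input file name."""
    return re.sub(PATTERN, REPLACEMENT, input_name)
```

With this rule, an input named OR20080317_S_SILAC-LH_1-1_01.tandem yields OR20080317_S_SILAC-LH_1-1_01.pep.xml, and a name that does not contain the tandem extension is returned unchanged.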

4. Set up the initial input

Template Params File

This file defines the database search parameters that override the full set of default settings referenced in the file isb_default_input.
In this example, the mass tolerance is set to -2.1 Da to 4.1 Da in the template parameter file, and the residue modification mass is set to 57.021464@C. A wide mass tolerance is used to include all the spectra with precursor m/z off by one or more isotopic separations; the high accuracy achieved by the instrument is then modeled by PeptideProphet with the accurate mass model.
For more information, please go to TANDEM
  • Click on TemplateParamsFile
  • In the form, click on the Browse button
  • In the file browser, open the demo2009 folder and then the tandem folder and choose tandem.xml then click on the Select file button
Note that the form states that the value is required to run the workflow
  • Back in the form, click 'Save' and then 'Back' to return to the workflow diagram page
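
To see which settings a template parameters file overrides, the file can be read with a few lines of Python. The sketch below is illustrative: the embedded XML is an assumed example, not the tutorial's actual tandem.xml, and it uses the standard X!Tandem note labels for the values quoted above. Mapping the -2.1 Da to 4.1 Da window onto the minus and plus entries is an interpretation of the text.

```python
import xml.etree.ElementTree as ET

# Assumed example content; the tutorial's tandem.xml may contain more
# entries or differ in detail.
EXAMPLE_PARAMS = """
<bioml>
  <note type="input" label="spectrum, parent monoisotopic mass error minus">2.1</note>
  <note type="input" label="spectrum, parent monoisotopic mass error plus">4.1</note>
  <note type="input" label="residue, modification mass">57.021464@C</note>
</bioml>
"""


def read_overrides(xml_text):
    """Return {label: value} for every input note in the parameters XML."""
    root = ET.fromstring(xml_text)
    return {
        note.get("label"): (note.text or "").strip()
        for note in root.iter("note")
        if note.get("type") == "input"
    }
```

Running read_overrides on the text of your own tandem.xml lists exactly the settings that take precedence over the defaults.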

SeqDatabase

This fasta file contains the Yeast protein sequences
  • Click on SeqDatabase
  • In the form, click on the Browse button
  • In the file browser, open the dbase folder and then the speclibs folder.
  • Choose yeast_orfs_all_REV.20060126.short.fasta then click on the Select file button
  • Click 'Save' and then 'Back' to return to the workflow diagram page

Runpath

  • Click on Runpath
  • Note that the form is grayed out and that the value is locked by the workflow.
  • For every run generated from this template, a directory will be created for the run files. The initial files and the files generated by X!Tandem will go in this subdirectory.
  • Click on 'Back' to return to the workflow diagram page

MzXMLFile will not be set yet; this will be the final input before the workflow is run.

5. Setting Peptide Prophet parameters

There is one more set of parameters to set, based on the high accuracy of the MS/MS; then the template can be saved as a workflow.

PeptideProphet parameters

  • Click on the PeptideProphet module
  • Click on the PeptideProphet Parameters tab
  • Find the PeptideProphet options parameter in the upper left
  • Check the specify options? check box
  • Since this template is being set up for high accuracy MS/MS, also check the lock value check box
  • Find the use accurate mass binning in PeptideProphet parameter in the third column
  • Check the specify options? check box
  • Also check the lock value check box
  • Click on Save then Back to return to the workflow diagram page

6. Saving as a workflow

Now save the template as a workflow. The template is now set up for high accuracy MS/MS. For this workflow we will set up the parameters for how the SILAC samples were prepared. If different isotopes are used for other SILAC runs, then the template can be saved as another workflow to capture that difference.

  • Choose Save as
  • Fill in MyWorkflow for the Name
  • Choose the Workflow button
  • Click the Save button

ASAPRatio parameters

The sample specific parameters can now be set for ASAP Ratio

  • Click on the ASAPRatio module
  • Click on the ASAP Parameters tab

change labeled residues parameter

  • Find the change labeled residues parameter at the top of the second column
  • Check the value needed? check box to let the workflow know this value will need to be set in order to run the workflow
  • Select K and R from the dropdown
  • Leave lock value unchecked to allow this to be changed for a run if desired

range around precursor m/z to search for peak parameter

  • Find the range around precursor m/z to search for peak parameter in the middle of the first column
  • Check the value needed? check box
  • Set the value to 0.05

specified label masses parameter

  • Find the specified label masses (i.e. M74.325Y125.864), only relevant for static modification quantification parameter at the bottom of the third column
  • Check the value needed? check box
  • Set the following three amino acid values:
    • Pick M from the Select one: dropdown and set a value of 147.035 and click Add choice
    • Pick K from the Select one: dropdown and set a value of 136.10916 and click Add choice
    • Pick R from the Select one: dropdown and set a value of 166.10941 and click Add choice
  • Click the Save button
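
The example in the parameter label (M74.325Y125.864) suggests that the label masses are written as residue letters each followed by a mass. Assuming that format, the three choices entered above would combine as in the sketch below. How SEWW actually assembles the value is not documented here, so this is an illustration and the function name is hypothetical.

```python
# Residue masses entered in this step, kept as strings so the digits
# appear exactly as typed into the form.
LABEL_MASSES = {"M": "147.035", "K": "136.10916", "R": "166.10941"}


def label_mass_string(masses):
    """Concatenate residue/mass pairs in the style of M74.325Y125.864."""
    parts = []
    for residue, mass in masses.items():
        if len(residue) != 1 or not residue.isalpha() or not residue.isupper():
            raise ValueError(f"expected a one-letter residue code, got {residue!r}")
        float(mass)  # raises ValueError if the mass is not a number
        parts.append(f"{residue}{mass}")
    return "".join(parts)
```

For the values in this tutorial the result is M147.035K136.10916R166.10941.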

7. Save as a compiled workflow

Now save the workflow as a compiled workflow. This will lock in the choices made and allow the compiled workflow to be run as many times as desired.

  • Click the 'Compile workflow' button
  • Name the workflow MyCompiledWorkflow
  • Click the Compile button

This will take you to the Compiled Workflows form and allow you to set up and run a workflow.

8. Run a workflow

Set up and run a workflow

  • Highlight MyCompiledWorkflow in the Compiled Workflows page
  • Click the Setup run button
  • In the Setup to run dialog, name the run MyFirstRun
  • Click the Setup button

Because the number of required parameters is drastically reduced at this point, the tabs reflect the components of the workflow.

ASAPRatio

  • Click on the ASAPRatio tab
  • Notice that we see the parameters that were required but not locked. These parameters can be changed
  • Click the arrow at the bottom of the form
  • Notice that this brings up the form that shows the values that are locked for this module and can't be changed

MzXMLFile

  • Now click on the MzXMLFile tab to add the final parameter value to run the workflow
  • Click the Browse button
  • Open the demo2009 folder, then the tandem folder
  • Select OR20080317_S_SILAC-LH_1-1_01.mzML and OR20080317_S_SILAC-LH_1-1_11.mzML files (hold the Ctrl key for multiple select)
  • Click the Select files button
  • Back on the parameter form click Save

Run the Workflow

  • Click the Run button

This brings up the workflow diagram which updates to show a graphical depiction of the workflow running. When it completes it will go to the output files form with a directory tree of where the output files have been saved.

9. Exploring the output

You can run the visualization and analysis tools in the output files tree form.

  • Note: when closing a viewer, pick the appropriate method to close the viewer depending on whether the viewer came up in a new tab in the same browser or whether it came up in a new browser instance.

PeptideProphet results

  • Open the peptide_prophet directory
  • Right click on interact.pep.shtml file and select View. This will open a viewer.
  • Using the sorter at the top left, sort the list in descending order based on Probabilities. The identifications at the top of the resulting list are most likely to be correct. Click on the hypertext link for any probability. This brings up the PlotModel details page, which shows graphically how successful the modeling was. In the upper pane, it is desirable for the red curve (sensitivity) to hug the upper right corner, and for the green curve (error) to hug the lower left corner. The lower pane shows how well the data (black line) follows the PeptideProphet modeling for each charge state. The blue curve describes the modeling of the negative results, and the purple one, the positive results. If these two curves are well separated and fit the black line well, then the analysis for that charge state was successful.
  • Click the Close button for the PlotModel and the Close button for the viewer to go back to SEWW result files
  • Right click on the interact.pep.xml file and select PepXML. This will open the file using the PepXML viewer application.
  • Click on the Other Actions top-level tab, and then on the Generate Pep3D button. A new window will launch the Pep3D viewer.
  • Leave the default options (or change to taste) and click on the Generate Pep3D Image button.
  • After a few moments, you should see an image displayed on the page per mzML input file (see http://tools.proteomecenter.org/wiki/index.php?title=Image:Pep3D.JPG).
  • Click the Close button for the Pep3D Image and the Close button for the PepXML viewer to go back to SEWW result files
  • Close the peptide_prophet directory

InterProphet results

  • Open the inter_prophet directory
  • Right click on the interact.iproph.pep.shtml file to view and analyze the results (image: iprophet).
  • Click the Close button for the PepXML viewer
  • Close the inter_prophet directory

ASAPRatio results

  • Open the asap_ratio directory
  • Right click on the asap.pep.shtml file to view and analyze the results. The “ASAPRATIO” column that has been added contains quantitation results with a link to the ASAPRatio ion trace. The number listed in the “asapratio” column is the light to heavy ratio (image: ASAPRatioProfiles).
  • Click the Close button on the PepXML viewer
  • Close the asap_ratio directory

ProteinProphet results

  • Open the protein_prophet directory
  • Right click on the interact.prot.shtml file to view and analyze the results. Protein groups are sorted in descending order by Probability so that the groups at the top of the page are the most confident identifications. The protein probabilities are the red numbers listed next to each protein group (image: Protxml).
  • Click the Close button on the PepXML viewer
  • Close the protein_prophet directory