Formats:mzXML
From SPCTools
Revision as of 19:22, 23 January 2008 Jtasman (Talk | contribs) ← Previous diff |
Current revision Jeng (Talk | contribs) (→MzXML2Search) |
||
Line 1: | Line 1: | ||
- | '''mzXML''' is a open data format for storage and exchange of mass spectroscopy data, developed at the SPC/Institute for Systems Biology. mzXML provides a standard container for ms and ms/ms proteomics data and is the foundation of our proteomic pipelines. Raw, proprietary file formats from most vendors can be converted to the open mzXML format. | + | '''mzXML''' is an open data format for storage and exchange of mass spectroscopy data, developed at the SPC/Institute for Systems Biology. mzXML provides a standard container for ms and ms/ms proteomics data and is the foundation of our proteomic pipelines. Raw, proprietary file formats from most vendors can be converted to the open mzXML format. |
Patrick Pedrioli was the primary original author; please see the references below for the often-cited Nature Biotech publication. | Patrick Pedrioli was the primary original author; please see the references below for the often-cited Nature Biotech publication. | ||
- | Several versions of this format exist. Currently these are 1.0 (also called "msXML"), 2.0, 2.1 and 3.0. '''2.1''' is the most common version currently in use. | + | Several versions of this format exist. Currently these are 1.0 (also called "msXML"), 2.0, 2.1, 3.0, and 3.1. '''3.1''' is the current version. The offical XML Schema components for the format are available at [http://sashimi.sourceforge.net/schema_revision/ mzXML schema]. |
+ | ==Converters: status and summary table== | ||
+ | All known formats and their converters are summarized in the table below: | ||
+ | '''Compatible Mass-Spectrometer Instrument-Specific Software and File Formats:''' | ||
+ | {| class="wikitable" style="text-align:center;" border="2" cellpadding="2" | ||
+ | |+ Compatible Mass-Spec instrument-specific file formats | ||
+ | |- | ||
+ | ! raw file vendor !! instrument acquisition software !! raw file type !! raw-to-mzXML converter !! SPC maintained? !! converter status/notes !! download | ||
+ | |- | ||
+ | | Bruker || (?) || .baf || [[Software:CompassXport|CompassXport]] page || no (Bruker supported) || || refer to vendor; see [[Software:CompassXport|CompassXport]] page | ||
+ | |- | ||
+ | | [http://www.thermo.com/ Thermo] || [http://www.thermo.com/com/cda/product/detail/1,,1000001009250,00.html XCalibur ]|| .RAW file || [[Software:ReAdW|ReAdW]] || yes, SPC || || official release: 4.3.0, August 2009; see [[Software:ReAdW|ReAdW]] page | ||
+ | |- | ||
+ | | [http://www.waters.com/watersdivision/ContentD.asp?watersit=JDRS-5KEKAJ Waters] || [http://www.waters.com/watersdivision/ContentD.asp?watersit=JDRS-5WQP7W MassLynx] || .RAW directory || [[Software:massWolf|wolf]] || yes, SPC || no centroiding || offical release: 4.3.0, August 2009; see [[Software:massWolf|wolf]] page | ||
+ | |- | ||
+ | | [http://www.mdssciex.com/ MDS/Sciex] for [https://products.appliedbiosystems.com/ab/en/US/adirect/ab;jsessionid=Gn0ZFF66S2ZTv1cfLTLn2nGxCH5L1gYVGv39d92hdr4PqSBglGVG!1672817600?cmd=catNavigate2&catID=600832 ABI] and [http://www.chem.agilent.com/Scripts/PCol.asp?lPage=38192 Agilent] || [https://products.appliedbiosystems.com/ab/en/US/adirect/ab?cmd=catNavigate2&catID=600927 Analyst], [https://products.appliedbiosystems.com/ab/en/US/adirect/ab?cmd=catNavigate2&catID=600910 AnalystQS] || .wiff file|| [[Software:mzWiff|mzWiff]]|| yes, SPC || || official release: 4.3.0, August 2009: see [[Software:mzWiff|mzWiff]] page | ||
+ | |- | ||
+ | | [http://www.chem.agilent.com/Scripts/PCol.asp?lPage=38192 Agilent] || [http://www.chem.agilent.com/scripts/pds.asp?lpage=40460 MassHunter] || .d directory || trapper || yes, SPC || || official release: 4.3.0, August 2009: see [[Software:trapper|trapper]] page | ||
+ | |- | ||
+ | | ABI || (?) || (?) || T2DExtractor || no (Andrews Lab)|| 4700/4800 MALDI TOFTOF converter; known issue: correct instrument name is not recorded in mzXML; charge state info is not recorded in mzXML || [http://tools.proteomecenter.org/T2DE.php link] | ||
+ | |- | ||
+ | | ABI || (?) || (?) || PyMsXML || no || ABI Mariner, Voyager converter || [http://www.umiacs.umd.edu/~nedwards/research/PyMsXML.html external link] | ||
+ | |} | ||
+ | |||
+ | '''Other Compatible MS File Formats:''' | ||
+ | |||
+ | {| class="wikitable" style="text-align:center;" border="2" cellpadding="2" | ||
+ | |+ Other compatible MS file formats | ||
+ | |- | ||
+ | ! format !! original use !! text-to-mzXML converter !! SPC maintained? !! converter status/notes !! download | ||
+ | |- | ||
+ | | .dta || sequest || dta2mzXML || yes || in TPP release || refer to TPP page | ||
+ | |- | ||
+ | | .pkl || ? || pkl2mzXML || yes || in TPP release || [[Software:pkl2mzXML|pkl2mzXML]] | ||
+ | |- | ||
+ | | .mgf|| mascot || ? || ? || ? || ? | ||
+ | |- | ||
+ | |} | ||
==Converter Overview== | ==Converter Overview== | ||
Line 26: | Line 63: | ||
====Analyst (ABI/MDS Sciex)==== | ====Analyst (ABI/MDS Sciex)==== | ||
- | *[[Software:mzStar|mzStar]]: Analyst (ABI/MDS Sciex) software's raw data (.wiff files) to mzXML converter; please note, mzStar is about to be retired in favor of the much-improved [[Software:mzWiff|mzWiff]] converter. | + | *[[Software:mzWiff|mzWiff]]: Analyst (ABI/MDS Sciex) software's raw data (.wiff files) to mzXML converter. |
====Waters Masslynx==== | ====Waters Masslynx==== | ||
*[[Software:massWolf|massWolf]]: MassLynx (Waters) raw data (.raw directories) to mzXML converter | *[[Software:massWolf|massWolf]]: MassLynx (Waters) raw data (.raw directories) to mzXML converter | ||
+ | ====Agilent MassHunter==== | ||
+ | *[[Software:trapper|trapper]]: MassHunter (Agilent) raw data (.d directories) to mzXML converter | ||
+ | |||
+ | ===Other SPC/TPP supported MS file formats=== | ||
+ | |||
+ | ====dta==== | ||
+ | *[[Software:dta2mzXML|dta2mzXML]] | ||
+ | The dta format is a simple text file describing precursor m/z and a peaklist table. This format was originally developed for Sequest. No annotation metadata is stored in this format, so you will loose information going from raw to dta then back to mzXML. | ||
+ | |||
+ | |||
+ | ====pkl==== | ||
+ | *[[Software:pkl2mzXML|pkl2mzXML]] | ||
+ | Command-line program to convert pkl files to mzXML files. | ||
+ | |||
+ | ====mgf==== | ||
+ | The mgf format is another simple text format developed for the Mascot search engine. | ||
===Other external converter projects=== | ===Other external converter projects=== | ||
These projects are not supported by the SPC, but may be of use. | These projects are not supported by the SPC, but may be of use. | ||
+ | |||
+ | *'''msConvert'''. Part of the ProteoWizard Library. Available from [http://proteowizard.sourceforge.net http://proteowizard.sourceforge.net]. Supports RAW, mzML and mzXML as of 6/08. | ||
====SCIEX/ABI 4700/4800 MALDI TOFTOF==== | ====SCIEX/ABI 4700/4800 MALDI TOFTOF==== | ||
Line 41: | Line 96: | ||
*'''PyMsXML''' - ABI Mariner, Voyager raw data to mzXML: python-based converter. (Also supports Analyst .wiff files, but we recommend our own [[Software:mzWiff|mzWiff]] program for that purpose.) Please see this link: [http://www.umiacs.umd.edu/~nedwards/research/PyMsXML.html http://www.umiacs.umd.edu/~nedwards/research/PyMsXML.html] | *'''PyMsXML''' - ABI Mariner, Voyager raw data to mzXML: python-based converter. (Also supports Analyst .wiff files, but we recommend our own [[Software:mzWiff|mzWiff]] program for that purpose.) Please see this link: [http://www.umiacs.umd.edu/~nedwards/research/PyMsXML.html http://www.umiacs.umd.edu/~nedwards/research/PyMsXML.html] | ||
+ | ==Converting mzXML '''to''' other formats== | ||
+ | ===MzXML2Search=== | ||
+ | This program can convert to dta, mgf, pkl, and ms2 formats. | ||
- | + | ===msconvert=== | |
- | ==Converters: status and summary table== | + | This program can convert to many output formats, including mzML. Eventually, this program will be a general purpose converter, converting from all common raw formats into both mzML and mzXML. Here is a [[Msconvert Capabilities | summary]] of what it can do and can't do in its installation at the Institute for Systems Biology. |
- | All known formats and their converters are summarized in the table below: | + | |
- | + | ||
- | '''Compatible Mass-Spectrometer Instrument-Specific Software and File Formats:''' | + | |
- | + | ||
- | {| class="wikitable" style="text-align:center;" border="2" cellpadding="2" | + | |
- | |+ Compatible Mass-Spec file formats | + | |
- | |- | + | |
- | ! raw file vendor !! instrument acquisition software !! raw file type !! raw-to-mzXML converter !! SPC maintained? !! converter status/notes !! download | + | |
- | |- | + | |
- | | Bruker || (?) || .baf || [[Software:CompassXport|CompassXport]] page || no (Bruker supported) || || refer to vendor; see [[Software:CompassXport|CompassXport]] page | + | |
- | |- | + | |
- | | [http://www.thermo.com/ Thermo] || [http://www.thermo.com/com/cda/product/detail/1,,1000001009250,00.html XCalibur ]|| .RAW file || [[Software:ReAdW|ReAdW]] || yes, SPC || known issues: does not centroid orbi/ft data correctly || official release: Nov 2006; see [[Software:ReAdW|ReAdW]] page; contact spctools-discuss newsgroup if interested in beta ReAdW version | + | |
- | |- | + | |
- | | [http://www.waters.com/watersdivision/ContentD.asp?watersit=JDRS-5KEKAJ Waters] || [http://www.waters.com/watersdivision/ContentD.asp?watersit=JDRS-5WQP7W MassLynx] || .RAW directory || [[Software:massWolf|wolf]] || yes, SPC || no centroiding || offical release: ???; see [[Software:massWolf|wolf]] page; contact spctools-discuss newsgroup if interested in beta wolf version | + | |
- | |- | + | |
- | | [http://www.mdssciex.com/ MDS/Sciex] for [https://products.appliedbiosystems.com/ab/en/US/adirect/ab;jsessionid=Gn0ZFF66S2ZTv1cfLTLn2nGxCH5L1gYVGv39d92hdr4PqSBglGVG!1672817600?cmd=catNavigate2&catID=600832 ABI] and [http://www.chem.agilent.com/Scripts/PCol.asp?lPage=38192 Agilent] || [https://products.appliedbiosystems.com/ab/en/US/adirect/ab?cmd=catNavigate2&catID=600927 Analyst], [https://products.appliedbiosystems.com/ab/en/US/adirect/ab?cmd=catNavigate2&catID=600910 AnalystQS] || .wiff file|| [[Software:mzStar|mzStar]] (official release); [[Software:mzWiff|mzWiff]] (beta, not yet released) || yes, SPC || mzStar is known to have many bugs which are fixed in mzWiff || [[Software:mzStar|mzStar]] page; contact spctools-discuss newsgroup if interested in beta [[Software:mzWiff|mzWiff]] | + | |
- | |- | + | |
- | | [http://www.chem.agilent.com/Scripts/PCol.asp?lPage=38192 Agilent] || [http://www.chem.agilent.com/scripts/pds.asp?lpage=40460 MassHunter] || .d directory || trapper (beta, not yet released) || yes, SPC || || currently internal only for development purposes | + | |
- | |- | + | |
- | | ABI || (?) || (?) || T2DExtractor || no (Andrews Lab)|| 4700/4800 MALDI TOFTOF converter; known issue: correct instrument name is not recorded in mzXML; charge state info is not recorded in mzXML || [http://tools.proteomecenter.org/T2DE.php link] | + | |
- | |- | + | |
- | | ABI || (?) || (?) || PyMsXML || no || ABI Mariner, Voyager converter || [http://www.umiacs.umd.edu/~nedwards/research/PyMsXML.html external link] | + | |
- | |} | + | |
==Viewing mzXML files== | ==Viewing mzXML files== | ||
Line 81: | Line 116: | ||
==Parsing mzXML data== | ==Parsing mzXML data== | ||
- | RAMP: C/C++ API, contained in TPP and basis for TPP tools. Also parses mzData: link | + | ===RAMP=== |
+ | [[Software:RAMP|RAMP]]: C/C++ API, contained in TPP and basis for TPP tools. Also parses mzData. | ||
- | JRAP: java | + | [[Software:JRAP|JRAP]]: java API |
==Related Formats== | ==Related Formats== | ||
- | Please note that while widely accepted in the proteomics community mzXML is format, not a standard. There are two other open ms/ms proteomic formats: mzData and mzML. mzData was the first attempt, created by the HUPO/PSI committee process. Many vendors wanted to wait until the format was finalized (a process which took two years); in the meanwhile, mzXML was developed to fill the need. mzML is an upcoming format which aims to marry the best elements of mzXML and mzData, and represents a joint effort of the HUPO/PSI committee, SPC/ISB, instrument vendors, and other proteomics software groups. | + | Please note that while widely accepted in the proteomics community mzXML is format, not a standard. There are two other open ms/ms proteomic formats: mzData and mzML. mzData was the first attempt, created by the HUPO/PSI committee process. Many vendors wanted to wait until the format was finalized (a process which took two years); in the meantime, mzXML was developed to fill the need. |
+ | |||
+ | mzML is a new format released 6/2008 which aims to marry the best elements of mzXML and mzData, and represents a joint effort of the HUPO/PSI committee, SPC/ISB, instrument vendors, and other proteomics software groups. It is intended and expected that mzML will replace the use of mzXML and mzData. For more information, visit the [http://psidev.info/index.php?q=node/257 mzML development page]. As of version 4.0, the TPP supports mzML through an interface between RAMP and [http://proteowizard.sourceforge.net Proteowizard]. | ||
==Reference== | ==Reference== | ||
- | *Pedrioli PGA, Eng JK, Hubley R, Vogelzang M, Deutsch EW, Raught B, Pratt B, Nilsson E, Angeletti R, Apweiler R, Cheung K, Costello CE, Hermjakob H, Huang S, Julian RK Jr, Kapp E, McComb ME, Oliver SG, Omenn G, Paton NW, Simpson R, Smith R, Taylor CF, Zhu W, Aebersold R. (2004) "A Common Open Representation of Mass Spectrometry Data and its Application in a Proteomics Research Environment." Nature Biotechnology 22(11):1459-1466. [http://www.proteomecenter.org/PDFs/Desiere.GenomeBiology.04.pdf Download PDF] | + | * mzXML 2.0 format guide by Patrick Pedrioli: outdated but a very useful reference document. http://sashimi.sourceforge.net/schema_revision/mzXML_2.0/Doc/ |
+ | *Pedrioli PGA, Eng JK, Hubley R, Vogelzang M, Deutsch EW, Raught B, Pratt B, Nilsson E, Angeletti R, Apweiler R, Cheung K, Costello CE, Hermjakob H, Huang S, Julian RK Jr, Kapp E, McComb ME, Oliver SG, Omenn G, Paton NW, Simpson R, Smith R, Taylor CF, Zhu W, Aebersold R. (2004) "A Common Open Representation of Mass Spectrometry Data and its Application in a Proteomics Research Environment." Nature Biotechnology 22(11):1459-1466. [http://www.proteomecenter.org/PDFs/Pedrioli.NatureBiotech.04.pdf Download PDF] | ||
*Learn more at the [http://sashimi.sourceforge.net/ Sashimi project website], the original website for the mzXML format. | *Learn more at the [http://sashimi.sourceforge.net/ Sashimi project website], the original website for the mzXML format. |
Current revision
mzXML is an open data format for storage and exchange of mass spectrometry data, developed at the SPC/Institute for Systems Biology. mzXML provides a standard container for ms and ms/ms proteomics data and is the foundation of our proteomic pipelines. Raw, proprietary file formats from most vendors can be converted to the open mzXML format.
Patrick Pedrioli was the primary original author; please see the references below for the often-cited Nature Biotech publication.
Several versions of this format exist. Currently these are 1.0 (also called "msXML"), 2.0, 2.1, 3.0, and 3.1. 3.1 is the current version. The official XML Schema components for the format are available at the mzXML schema page: http://sashimi.sourceforge.net/schema_revision/
Converters: status and summary table
All known formats and their converters are summarized in the table below:
Compatible Mass-Spectrometer Instrument-Specific Software and File Formats:
raw file vendor | instrument acquisition software | raw file type | raw-to-mzXML converter | SPC maintained? | converter status/notes | download |
---|---|---|---|---|---|---|
Bruker | (?) | .baf | CompassXport | no (Bruker supported) | | refer to vendor; see CompassXport page |
Thermo | XCalibur | .RAW file | ReAdW | yes, SPC | | official release: 4.3.0, August 2009; see ReAdW page |
Waters | MassLynx | .RAW directory | massWolf | yes, SPC | no centroiding | official release: 4.3.0, August 2009; see massWolf page |
MDS/Sciex for ABI and Agilent | Analyst, AnalystQS | .wiff file | mzWiff | yes, SPC | | official release: 4.3.0, August 2009; see mzWiff page |
Agilent | MassHunter | .d directory | trapper | yes, SPC | | official release: 4.3.0, August 2009; see trapper page |
ABI | (?) | (?) | T2DExtractor | no (Andrews Lab) | 4700/4800 MALDI TOFTOF converter; known issues: the correct instrument name and charge state info are not recorded in mzXML | http://tools.proteomecenter.org/T2DE.php |
ABI | (?) | (?) | PyMsXML | no | ABI Mariner, Voyager converter | http://www.umiacs.umd.edu/~nedwards/research/PyMsXML.html |
Other Compatible MS File Formats:
format | original use | text-to-mzXML converter | SPC maintained? | converter status/notes | download |
---|---|---|---|---|---|
.dta | sequest | dta2mzXML | yes | in TPP release | refer to TPP page |
.pkl | ? | pkl2mzXML | yes | in TPP release | pkl2mzXML |
.mgf | mascot | ? | ? | ? | ? |
Converter Overview
The first step in processing ms or ms/ms data with SPC proteomics software is conversion of raw data to our open mzXML format.
direct vendor support
Bruker
Bruker directly supports the mzXML format, using their CompassXport program:
- CompassXport is the recommended converter for Bruker (.baf) files. (And for historical reasons only, we provide a page on our own (SPC-created) retired open-source Bruker converter, mzBruker.) Also, see Additional Resources, below.
SPC/TPP supported proprietary formats
The TPP project supports a wide variety of mass-spec instrument formats. Currently, we provide software tools (converters) for the following vendor formats. Follow the links for more specific information for each converter:
Thermo/XCalibur
- ReAdW: Thermo/XCalibur raw data (.RAW files) to mzXML converter
Analyst (ABI/MDS Sciex)
- mzWiff: Analyst (ABI/MDS Sciex) software's raw data (.wiff files) to mzXML converter.
Waters MassLynx
- massWolf: MassLynx (Waters) raw data (.raw directories) to mzXML converter
Agilent MassHunter
- trapper: MassHunter (Agilent) raw data (.d directories) to mzXML converter
Other SPC/TPP supported MS file formats
dta
The dta format is a simple text file describing precursor m/z and a peaklist table. This format was originally developed for Sequest. No annotation metadata is stored in this format, so you will lose information going from raw to dta then back to mzXML.
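As a rough illustration of how little a dta file carries, here is a minimal Python sketch of a reader. It assumes the commonly seen layout in which the first non-empty line holds the precursor value followed by the charge state, and every later line holds one "m/z intensity" pair; the function name parse_dta is hypothetical and is not part of dta2mzXML or the TPP.

```python
from typing import List, Tuple


def parse_dta(text: str) -> Tuple[float, int, List[Tuple[float, float]]]:
    """Parse the text of one .dta file into (precursor value, charge, peaks).

    Assumed layout (not taken from the wiki page): a header line with the
    precursor value and charge, then one "m/z intensity" pair per line.
    """
    # Split every non-empty line into whitespace-separated fields.
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty dta content")

    # Header line: precursor value, then charge state.
    precursor = float(lines[0][0])
    charge = int(lines[0][1])

    # Remaining lines: the peak list, kept in file order.
    peaks = [(float(fields[0]), float(fields[1])) for fields in lines[1:]]
    return precursor, charge, peaks
```

Nothing beyond these three items comes back from the parser, which is why instrument and run annotation cannot be recovered after a round trip through dta.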
pkl
pkl2mzXML: command-line program that converts pkl files to mzXML files.
mgf
The mgf format is another simple text format developed for the Mascot search engine.
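For orientation, a minimal Python sketch of reading mgf text follows. It assumes the usual Mascot generic format conventions: each spectrum sits between BEGIN IONS and END IONS lines, header lines have the form KEY=VALUE, and peak lines start with an m/z value followed by an intensity. The function name parse_mgf is hypothetical, and the sketch ignores file-level parameters that appear outside spectrum blocks.

```python
from typing import Dict, List


def parse_mgf(text: str) -> List[Dict]:
    """Return one dict per spectrum with its 'params' and 'peaks'."""
    spectra = []
    current = None  # spectrum being filled, or None when outside a block
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                # Header line such as TITLE=... or PEPMASS=...
                key, value = line.split("=", 1)
                current["params"][key] = value
            else:
                # Peak line: m/z, intensity, optionally more fields.
                fields = line.split()
                current["peaks"].append((float(fields[0]), float(fields[1])))
    return spectra
```

Header values are kept as strings so that fields such as PEPMASS, which may hold more than one number, are left for the caller to interpret.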
Other external converter projects
These projects are not supported by the SPC, but may be of use.
- msconvert. Part of the ProteoWizard library. Available from http://proteowizard.sourceforge.net. Supports RAW, mzML, and mzXML as of June 2008.
SCIEX/ABI 4700/4800 MALDI TOFTOF
- T2DExtractor - SCIEX/ABI 4700/4800 MALDI TOFTOF data to mzXML: Java application that converts data from SCIEX/ABI 4000 series MALDI TOFTOF instruments into mzXML format. This application is provided courtesy of Phil Andrews' lab at the University of Michigan. Please see this link: http://tools.proteomecenter.org/T2DE.php
ABI Mariner, Voyager
- PyMsXML - ABI Mariner, Voyager raw data to mzXML: python-based converter. (Also supports Analyst .wiff files, but we recommend our own mzWiff program for that purpose.) Please see this link: http://www.umiacs.umd.edu/~nedwards/research/PyMsXML.html
Converting mzXML to other formats
MzXML2Search
This program can convert to dta, mgf, pkl, and ms2 formats.
msconvert
This program can convert to many output formats, including mzML. Eventually, this program will be a general purpose converter, converting from all common raw formats into both mzML and mzXML. Here is a summary of what it can do and can't do in its installation at the Institute for Systems Biology.
Viewing mzXML files
Pep3D
The Pep3D program, included with the TPP, produces a gel-like image, which is very useful for getting an overview of an entire run. It can also link to CID and peptide probability information.
Insilicos viewer
We also recommend the free InSilicos Viewer, which can display mzXML, mzData, and other formats.
Additional Resources
- out2linux.pl: working with Bioworks Sequest output. See Software:Out2XML page.
- working with Bruker ion trap and DataAnalysis software. See Software:da2tpp page.
Parsing mzXML data
RAMP
RAMP: C/C++ API, contained in TPP and basis for TPP tools. Also parses mzData.
JRAP: Java API
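For readers without RAMP or JRAP at hand, the following Python sketch shows roughly what such a parser has to do with each scan. It is not the RAMP or JRAP API. It assumes the common mzXML layout in which each scan element holds a peaks element whose text is a base64 string of big-endian ("network" byte order) floats stored as alternating m/z and intensity values, with a precision attribute of 32 or 64. Compressed peak lists are deliberately not handled, and the function names are hypothetical.

```python
import base64
import struct
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix so any mzXML schema version matches."""
    return tag.rsplit("}", 1)[-1]


def decode_peaks(peaks_elem):
    """Decode one <peaks> element into a list of (m/z, intensity) pairs."""
    if peaks_elem.get("compressionType", "none") != "none":
        raise NotImplementedError("compressed peak lists are not handled here")
    precision = int(peaks_elem.get("precision", "32"))
    size = precision // 8
    code = "f" if precision == 32 else "d"

    raw = base64.b64decode(peaks_elem.text or "")
    count = len(raw) // size
    # '>' selects big-endian, i.e. network byte order.
    values = struct.unpack(">%d%s" % (count, code), raw[: count * size])
    # Values alternate: m/z, intensity, m/z, intensity, ...
    return list(zip(values[0::2], values[1::2]))


def read_scans(xml_text: str):
    """Return a list of dicts with scan number, MS level, and peaks."""
    root = ET.fromstring(xml_text)
    scans = []
    for elem in root.iter():  # also visits scans nested inside other scans
        if local_name(elem.tag) != "scan":
            continue
        peaks = []
        for child in elem:
            if local_name(child.tag) == "peaks":
                peaks = decode_peaks(child)
        scans.append({
            "num": int(elem.get("num")),
            "msLevel": int(elem.get("msLevel", "1")),
            "peaks": peaks,
        })
    return scans
```

Loading the whole document with ElementTree keeps the sketch short but is only practical for small files; a production reader would stream the file or use the scan index instead.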
Related Formats
Please note that, while widely accepted in the proteomics community, mzXML is a format, not a standard. There are two other open ms/ms proteomic formats: mzData and mzML. mzData was the first attempt, created by the HUPO/PSI committee process. Many vendors wanted to wait until the format was finalized (a process which took two years); in the meantime, mzXML was developed to fill the need.
mzML is a new format, released in June 2008, that aims to marry the best elements of mzXML and mzData. It represents a joint effort of the HUPO/PSI committee, SPC/ISB, instrument vendors, and other proteomics software groups. It is intended and expected that mzML will replace the use of mzXML and mzData. For more information, visit the mzML development page. As of version 4.0, the TPP supports mzML through an interface between RAMP and ProteoWizard.
Reference
- mzXML 2.0 format guide by Patrick Pedrioli: outdated but a very useful reference document. http://sashimi.sourceforge.net/schema_revision/mzXML_2.0/Doc/
- Pedrioli PGA, Eng JK, Hubley R, Vogelzang M, Deutsch EW, Raught B, Pratt B, Nilsson E, Angeletti R, Apweiler R, Cheung K, Costello CE, Hermjakob H, Huang S, Julian RK Jr, Kapp E, McComb ME, Oliver SG, Omenn G, Paton NW, Simpson R, Smith R, Taylor CF, Zhu W, Aebersold R. (2004) "A Common Open Representation of Mass Spectrometry Data and its Application in a Proteomics Research Environment." Nature Biotechnology 22(11):1459-1466. Download PDF: http://www.proteomecenter.org/PDFs/Pedrioli.NatureBiotech.04.pdf
- Learn more at the Sashimi project website, the original website for the mzXML format.