Formats:mzXML

mzXML is an open data format for storage and exchange of mass spectrometry data, developed at the SPC/Institute for Systems Biology. mzXML provides a standard container for MS and MS/MS proteomics data and is the foundation of our proteomic pipelines. Raw, proprietary file formats from most vendors can be converted to the open mzXML format.

Patrick Pedrioli was the primary original author; please see the references below for the often-cited Nature Biotech publication.

Several versions of this format exist: 1.0 (also called "msXML"), 2.0, 2.1, 3.0, and 3.1. Version 3.1 is the current version. The official XML Schema components for the format are available at mzXML schema.
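Because several versions are in circulation, it can help to check which one a given file declares before processing it. The following is a minimal Python sketch, not part of any SPC tool. It assumes the version is encoded at the end of the root element's XML namespace URI in the form `mzXML_<version>`; the namespace URI in the sample and the function name `mzxml_version` are illustrative assumptions.

```python
import io
import re
import xml.etree.ElementTree as ET

# Assumption: the schema version appears at the end of the root namespace
# URI as "mzXML_<major>.<minor>".
_VERSION_RE = re.compile(r"mzXML_(\d+(?:\.\d+)*)$")


def mzxml_version(stream):
    """Return the version string declared by the root element, or None.

    `stream` is a binary file-like object. Only the start of the document
    is read, so this also works on very large files.
    """
    for _event, elem in ET.iterparse(stream, events=("start",)):
        tag = elem.tag  # namespaced tags look like "{uri}localname"
        if not tag.startswith("{"):
            return None  # no namespace declared, so no version to report
        namespace = tag[1:].split("}", 1)[0]
        match = _VERSION_RE.search(namespace)
        return match.group(1) if match else None
    return None


# Hypothetical, heavily truncated document used only for illustration.
SAMPLE = (
    b'<mzXML xmlns="http://sashimi.sourceforge.net/schema_revision/mzXML_3.1">'
    b"<msRun/></mzXML>"
)
detected = mzxml_version(io.BytesIO(SAMPLE))
```

For a file on disk, the same function can be called as `mzxml_version(open(path, "rb"))` inside a `with` block.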

Converters: status and summary table

All known formats and their converters are summarized in the tables below:

Compatible Mass-Spectrometer Instrument-Specific Software and File Formats:

| raw file vendor | instrument acquisition software | raw file type | raw-to-mzXML converter | SPC maintained? | converter status/notes | download |
| --- | --- | --- | --- | --- | --- | --- |
| Bruker | (?) | .baf | CompassXport page | no (Bruker supported) | | refer to vendor; see CompassXport page |
| Thermo | XCalibur | .RAW file | ReAdW | yes, SPC | | official release: 4.3.0, August 2009; see ReAdW page |
| Waters | MassLynx | .RAW directory | wolf | yes, SPC | no centroiding | official release: 4.3.0, August 2009; see wolf page |
| MDS/Sciex for ABI and Agilent | Analyst, AnalystQS | .wiff file | mzWiff | yes, SPC | | official release: 4.3.0, August 2009; see mzWiff page |
| Agilent | MassHunter | .d directory | trapper | yes, SPC | | official release: 4.3.0, August 2009; see trapper page |
| ABI | (?) | (?) | T2DExtractor | no (Andrews Lab) | 4700/4800 MALDI TOFTOF converter; known issues: the correct instrument name is not recorded in mzXML; charge state info is not recorded in mzXML | link |
| ABI | (?) | (?) | PyMsXML | no | ABI Mariner, Voyager converter | external link |

Other Compatible MS File Formats:

| format | original use | text-to-mzXML converter | SPC maintained? | converter status/notes | download |
| --- | --- | --- | --- | --- | --- |
| .dta | Sequest | dta2mzXML | yes | in TPP release | refer to TPP page |
| .pkl | ? | pkl2mzXML | yes | in TPP release | pkl2mzXML |
| .mgf | Mascot | ? | ? | ? | ? |

Converter Overview

The first step in processing MS or MS/MS data with SPC proteomics software is conversion of raw data to our open mzXML format.

direct vendor support

Bruker

Bruker directly supports the mzXML format through their CompassXport program.

SPC/TPP supported proprietary formats

The TPP project supports a wide variety of mass-spec instrument formats. Currently, we provide software tools (converters) for the following vendor formats. Follow the links for more specific information for each converter:

Thermo/XCalibur
Analyst (ABI/MDS Sciex)
Waters MassLynx
Agilent MassHunter

Other SPC/TPP supported MS file formats

dta

The dta format is a simple text file describing precursor m/z and a peaklist table. This format was originally developed for Sequest. No annotation metadata is stored in this format, so you will lose information going from raw to dta and then back to mzXML.
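To illustrate how little a dta file carries, here is a minimal Python sketch of a reader. It is not the dta2mzXML implementation. It assumes the commonly described layout: a first line holding a precursor mass value (commonly the singly protonated mass, MH+) followed by the charge state, then one whitespace-separated "m/z intensity" pair per line. The names `DtaSpectrum` and `parse_dta` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class DtaSpectrum(NamedTuple):
    precursor_mass: float             # first value on the header line
    charge: int                       # second value on the header line
    peaks: List[Tuple[float, float]]  # (m/z, intensity) pairs


def parse_dta(text: str) -> DtaSpectrum:
    """Parse the text of a single-spectrum dta file (assumed layout)."""
    # Split every non-blank line into whitespace-separated fields.
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows or len(rows[0]) < 2:
        raise ValueError("expected a header line with precursor mass and charge")

    precursor_mass = float(rows[0][0])
    charge = int(rows[0][1])

    peaks = []
    for fields in rows[1:]:
        if len(fields) < 2:
            raise ValueError(f"malformed peak line: {fields!r}")
        peaks.append((float(fields[0]), float(fields[1])))
    return DtaSpectrum(precursor_mass, charge, peaks)


# Made-up values, for illustration only.
EXAMPLE_DTA = "1234.56 2\n100.1 50.0\n200.2 75.5\n"
spectrum = parse_dta(EXAMPLE_DTA)
```

Nothing beyond these numbers is available to a converter, which is why scan metadata such as retention time or instrument settings cannot be restored when going from dta back to mzXML.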

pkl

pkl2mzXML is a command-line program that converts pkl files to mzXML files.

mgf

The mgf format is another simple text format developed for the Mascot search engine.
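As a rough illustration of the mgf layout, the Python sketch below reads spectra delimited by `BEGIN IONS` and `END IONS`, collecting `KEY=value` lines (for example `TITLE`, `PEPMASS`, `CHARGE`) and "m/z intensity" peak lines. It is a simplified reader written for this page, not part of the TPP or Mascot; it ignores anything outside an ion block and does not handle comment lines. The function name `parse_mgf` is hypothetical.

```python
def parse_mgf(text):
    """Return a list of spectra, each a dict with 'params' and 'peaks'."""
    spectra = []
    current = None  # the spectrum being read, or None outside an ion block
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                # Parameter line such as TITLE=..., PEPMASS=..., CHARGE=...
                key, value = line.split("=", 1)
                current["params"][key.upper()] = value
            else:
                # Peak line: m/z and intensity (extra columns are ignored).
                fields = line.split()
                if len(fields) >= 2:
                    current["peaks"].append((float(fields[0]), float(fields[1])))
    return spectra


# Made-up values, for illustration only.
EXAMPLE_MGF = """BEGIN IONS
TITLE=scan 1
PEPMASS=500.25
CHARGE=2+
100.0 10.0
200.5 20.0
END IONS

BEGIN IONS
TITLE=scan 2
PEPMASS=600.5
300.0 5.0
END IONS
"""
parsed = parse_mgf(EXAMPLE_MGF)
```

Parameter values are kept as strings here, since fields such as `PEPMASS` and `CHARGE` can carry more than one token.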

Other external converter projects

These projects are not supported by the SPC, but may be of use.

SCIEX/ABI 4700/4800 MALDI TOFTOF
ABI Mariner, Voyager

Converting mzXML to other formats

MzXML2Search

This program can convert to dta, mgf, pkl, and ms2 formats.

msconvert

This program can convert to many output formats, including mzML. Eventually, this program will be a general-purpose converter, converting from all common raw formats into both mzML and mzXML. Here is a summary of what it can and cannot do in its installation at the Institute for Systems Biology.

Viewing mzXML files

Pep3D

The Pep3D program, included with the TPP, produces a gel-like image, which is very useful for getting an overview of an entire run. It can also link to CID and peptide probability information.

Insilicos viewer

We also recommend the free Insilicos Viewer, which can display mzXML, mzData, and other formats.

Additional Resources

Parsing mzXML data

RAMP

RAMP: C/C++ API, contained in the TPP and the basis for TPP tools. Also parses mzData.

JRAP: Java API
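For readers who only need to inspect a peak list without RAMP or JRAP, the Python sketch below shows the core decoding step. It does not use or mirror either API. It assumes the peak data of a scan is base64-encoded text holding big-endian ("network" byte order) floating-point numbers stored as alternating m/z and intensity values, at 32-bit or 64-bit precision, and that no compression has been applied. The function name `decode_peaks` is hypothetical.

```python
import base64
import struct


def decode_peaks(encoded, precision=32):
    """Decode base64 peak text into a list of (m/z, intensity) tuples.

    Assumptions: big-endian floats, alternating m/z and intensity values,
    no compression.
    """
    type_codes = {32: "f", 64: "d"}
    if precision not in type_codes:
        raise ValueError("precision must be 32 or 64")

    raw = base64.b64decode(encoded)
    size = precision // 8
    count = len(raw) // size  # number of complete values in the buffer
    values = struct.unpack(f">{count}{type_codes[precision]}", raw[: count * size])

    # Even positions hold m/z, odd positions hold intensity.
    return list(zip(values[0::2], values[1::2]))


# Round-trip example with made-up values that are exact in 32-bit floats.
example_bytes = struct.pack(">4f", 100.0, 10.0, 200.5, 20.25)
example_text = base64.b64encode(example_bytes).decode("ascii")
example_peaks = decode_peaks(example_text, precision=32)
```

In a real file the precision and byte order should be read from the attributes of the element that holds the peak data rather than assumed.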

Please note that, while widely accepted in the proteomics community, mzXML is a format, not a standard. There are two other open MS/MS proteomic formats: mzData and mzML. mzData was the first attempt, created by the HUPO/PSI committee process. Many vendors wanted to wait until the format was finalized (a process which took two years); in the meantime, mzXML was developed to fill the need.

mzML is a new format, released in June 2008, which aims to marry the best elements of mzXML and mzData. It represents a joint effort of the HUPO/PSI committee, SPC/ISB, instrument vendors, and other proteomics software groups. It is intended and expected that mzML will replace the use of mzXML and mzData. For more information, visit the mzML development page. As of version 4.0, the TPP supports mzML through an interface between RAMP and ProteoWizard.

Reference