User:Darked~enwiki/ABRF 2005

ABRF 2005, Savannah, Feb 5-9

Main topics:

  • Proteomics/ Mass spectrometry
  • Microarrays
  • DNA sequencing
  • Bioinformatics


Tutorials (Feb 05):

  1. Mascot (David Wishart, UAlberta, Edmonton)
  2. Global Proteomic Machine / X!Tandem
  3. Sequest (Aaron Lucas)
  4. Spectrum Mill (David Horn, Agilent)


Mascot

Kinds of analyses:

  • PMF (peptide mass fingerprinting)
  • Sequence tag querying
  • MS/MS Ion searches
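
To make the first kind of analysis concrete, here is a minimal sketch of the idea behind peptide mass fingerprinting: compute theoretical peptide masses for each candidate protein and count how many observed masses fall within a tolerance. This is not Mascot's Mowse scoring, only a plain match count. The function names, the 0.5 Da tolerance, the peptide lists and the observed masses are all made up for illustration, and the observed values are treated as neutral peptide masses rather than m/z.

```python
# Monoisotopic residue masses in Da (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per peptide (free N- and C-termini)


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def count_matches(observed, peptides, tol=0.5):
    """Count observed masses within tol Da of any theoretical peptide mass."""
    theoretical = [peptide_mass(p) for p in peptides]
    return sum(1 for m in observed if any(abs(m - t) <= tol for t in theoretical))


def rank_candidates(observed, candidates, tol=0.5):
    """Rank candidate proteins by number of matched masses (hypothetical score)."""
    scores = {name: count_matches(observed, peps, tol)
              for name, peps in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: pre-digested peptides per protein and two "observed" masses.
candidates = {"protA": ["GAVLK", "MK"], "protB": ["TAYIAK"]}
observed = [486.32, 277.15]
ranking = rank_candidates(observed, candidates)
print(ranking)
```

A real scoring scheme would also weight matches by how common each peptide mass is in the database, which a raw count ignores.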

Price: $7K for 1 CPU, $12.5K for 2 CPUs, down to $4K per CPU with large purchases

Requirements: Linux (or Windows) cluster; 2 GB RAM per node recommended

Other components:

  • Mascot Distiller
  • Mascot Daemon

Hints:

  • knowing the estimated mass or isoelectric point helps
  • for protein fingerprinting do not use Swiss-Prot; use NR

Example files: http://gchelpdesk.ualberta.ca/ABRF2005/

Algorithm: Mowse scoring


Global Proteomic Machine / X!Tandem

  • thegpm.com

Plus comments from Ron Beavis on Sunday.

Function:

  • IDs proteins from MS/MS data
  • permits point mutations!
  • Open source, Perl, Knoppix distro exists
  • Multithreaded; a version also runs on a Linux cluster in Kentucky
  • uses Open Mass Spectrometry Search Algorithm

Lewis Y. Geer, Sanford P. Markey, Jeffrey A. Kowalak, Lukas Wagner, Ming Xu, Dawn M. Maynard, Xiaoyu Yang, Wenyao Shi, and Stephen H. Bryant. Open Mass Spectrometry Search Algorithm. J. Proteome Res. 2004, 3(5), 958-964. DOI: 10.1021/pr0499491


  • Uses a database of reversed protein sequences to show when hits are getting into the "bad matches" area
  • Stores a database of real mass spec spectra (50 million donated so far!); these can be compared with one's own results or with a given peptide (if present)
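
The reversed-sequence trick mentioned above can be sketched roughly as follows. This is not X!Tandem's code: it only shows the general idea of appending reversed "decoy" entries to the search database and using decoy hits to estimate how many accepted matches are probably wrong. The `REV_` prefix, the function names and the scores are assumptions made up for the example.

```python
def reverse_decoys(proteins, prefix="REV_"):
    """Build decoy entries by reversing every protein sequence."""
    return {prefix + name: seq[::-1] for name, seq in proteins.items()}


def estimate_fdr(hits, threshold, prefix="REV_"):
    """Estimate the false discovery rate above a score threshold.

    hits is a list of (protein_name, score) pairs. Decoy hits above the
    threshold stand in for the number of wrong target hits, so the
    estimate is decoys / targets.
    """
    accepted = [name for name, score in hits if score >= threshold]
    decoys = sum(1 for name in accepted if name.startswith(prefix))
    targets = len(accepted) - decoys
    return decoys / targets if targets else 0.0


# Toy target database plus its reversed decoys, searched together.
proteins = {"P1": "MKTAYIAK", "P2": "GAVLKR"}
decoys = reverse_decoys(proteins)
search_db = {**proteins, **decoys}

# Hypothetical search results: (matched entry, score).
hits = [("P1", 55.0), ("P2", 40.0), ("REV_P1", 22.0),
        ("P2", 21.0), ("REV_P2", 12.0)]

for threshold in (30.0, 20.0):
    print(threshold, estimate_fdr(hits, threshold))
```

Lowering the threshold until decoy hits start to appear is one way to see where the "bad matches" area begins.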

  • Output of spectra as Scalable Vector Graphics (SVG)
  • Common XML input/output format

Sequest

In a standard release:

  • requires Win2000 on the head node, even though the cluster can be a Windows head node with Linux slaves using PVM
  • Head node: 4-6 GB RAM
  • 5 TB of disk on the head node
  • 32 CPUs in total

Big SRF files do not work on the cluster.

General impression: works, but it is a kludge. It creates a bunch (tens of thousands) of small files (1-4 KB) in a single directory, making backups, maintenance, etc. very hard on the OS.

Linux box containing an FPGA (Sequest Sorcerer) -> rewritten algorithm, fast; from THERMAL

Data output per run: LCQ 19 MB, LTQ 100 MB, LTQ-F > 250 MB

Other notes:

  • 75% of peptides after trypsin digestion (77% with a perfect chymotrypsin digest) are unique
  • some people claim that the Sequest algorithm is more accurate than Mascot on anything longer than 9 AA
  • exports XML/Excel files
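
The uniqueness figure for tryptic peptides quoted above could be checked on any protein set with an in silico digest. A minimal sketch, assuming the usual trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages, and "unique" meaning a peptide that occurs in exactly one protein; the notes do not say how the 75% was defined, so this definition and the toy sequences are assumptions.

```python
import re
from collections import Counter


def tryptic_digest(sequence):
    """Split after K or R, except when the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def unique_fraction(proteins, min_length=1):
    """Fraction of distinct peptides that occur in exactly one protein."""
    counts = Counter()
    for seq in proteins.values():
        # set() so a peptide repeated within one protein is counted once
        for pep in set(tryptic_digest(seq)):
            if len(pep) >= min_length:
                counts[pep] += 1
    if not counts:
        return 0.0
    unique = sum(1 for n in counts.values() if n == 1)
    return unique / len(counts)


# Toy proteins sharing the peptide TAYIAK.
proteins = {"A": "MKTAYIAKR", "B": "GGKTAYIAKLLR"}
print(tryptic_digest(proteins["A"]))
print(unique_fraction(proteins))
```

Raising `min_length` drops very short peptides, which are the ones most likely to be shared between proteins.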


Spectrum Mill

  • works only in Internet Explorer; the server runs on Windows
  • used mostly for de novo sequencing
  • not much comparative data



Mascot Integra: lab LIMS based on LabVantage and Oracle 9, using the Phoreiix exchange format

Requirements:

  • dual 3.2 Xeons
  • Win Server 2003

Pricing: depends on the number of concurrent users and the number of CPUs ($29K entry level)


BIND database from Blueprint.org

  • Manually curated (27 Toronto + 12 Singapore) protein-protein interaction database (Java + MySQL)
  • BIND IDs standardised across the journals Science, Nature and Cell
  • introducing the idea of "ontoglyphs", a set of 84 squiggly characters used to represent major GO terms


Tutorial on protein alignment

by Kimmen Sjolander, Berkeley

Programs to try and use:

  • DAPHNE -> no link so far
  • BETE


Machine of the Year: 454 sequencer, http://www.454.com/

  • sequences thousands of small fragments at the same time on microbeads
  • cost: $500K; $5K per kit, or ca. $37K per service run
  • performance: up to 35MB of raw sequence per run!

Cons:

  • very short reads so far: 100-160 bp
  • intensograms instead of chromatograms
  • no phred-compatible quality values; a different assembler is needed
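
A quick back-of-the-envelope check of the 454 figures above, as a sketch only: it assumes "35MB" means 35 million bases, that every read has the stated length, and that the service-run price is in US dollars. The constant and function names are made up for the example.

```python
RAW_BASES_PER_RUN = 35_000_000   # "up to 35MB of raw sequence", read as bases
READ_LENGTHS_BP = (100, 160)     # read length range quoted in the notes
SERVICE_RUN_COST = 37_000        # "ca 37K per service run", assumed USD


def reads_per_run(total_bases, read_length):
    """Number of full-length reads needed to reach total_bases."""
    return total_bases // read_length


for length in READ_LENGTHS_BP:
    print(length, "bp ->", reads_per_run(RAW_BASES_PER_RUN, length), "reads")

# Cost per million raw bases for a service run at full yield.
cost_per_megabase = SERVICE_RUN_COST / (RAW_BASES_PER_RUN / 1_000_000)
print(round(cost_per_megabase), "per Mb")
```

So a full run corresponds to roughly 220,000 to 350,000 reads, and a service run at maximum yield works out to a little over 1,000 per megabase of raw sequence.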