Proteomics - Carnegie Mellon School of Computer Science


Proteomics and mass spectrometry
Manimalha Balasubramani

Outline
  Mass spectrometers
  Protein identification
  Quantitative proteomics
  Protein-protein interactions

A mass spectrum
[Figure: a mass spectrum. x-axis: Mass (m/z), 799.0 to 2700.0; y-axis: % Intensity, 0 to 100; full-scale intensity 6.3E+4.]

Basically measures mass
(Adapted from google)

Components
(Adapted from an analytical chemistry textbook)

Ionization process
  MALDI: Matrix Assisted Laser Desorption Ionization
  ESI: ElectroSpray Ionization
  Nobel Prize in Chemistry, 2002

MALDI
  Matrix Assisted Laser Desorption Ionization

ESI
  ElectroSpray Ionization

Mass analyzers: several designs
(Adapted from Aebersold, R.; Mann, M. Nature 2003, 422, 198-207)

GPCL inventory
  ABI Voyager DE PRO, walk-up use
  ABI 4700 Proteomics Analyzer
  Thermoelectron LCQ Deca with Surveyor HPLC
  ABI Qstar Elite with Ultimate 3000 HPLC
  Bruker micrOTOF with Ultimate 3000 HPLC
  Bruker 12 Tesla FTMS with Ultimate 3000 HPLC

Time-of-flight (TOF) analyzers
  MALDI TOF: Voyager DE PRO
  ESI TOF: Ultimate 3000 with micrOTOF

MALDI TOF - principle
  $KE = zeV = \tfrac{1}{2}mv^2$

MS of serum albumin
  ESI TOF
  MALDI TOF

Tandem mass spectrometer
  MALDI TOF/TOF: MS and MS/MS
  Ion trap: MS, MS2, MS3, ... MSn
  Quadrupole-q-TOF
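The time-of-flight relation KE = zeV = ½mv² stated above can be turned into a flight-time estimate: solving for v and dividing the drift length by it gives a time that grows with the square root of m/z. The sketch below is only an illustration under simplifying assumptions (a single acceleration step and a field-free linear tube, no delayed extraction or reflector). The 20 kV accelerating voltage, the 1 m tube length, and the function name `flight_time` are all assumed for the example and do not describe any of the instruments listed.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge e, in coulombs
DALTON = 1.66053906660e-27   # 1 Da expressed in kilograms


def flight_time(mass_da, charge, accel_voltage, tube_length):
    """Flight time in seconds through a field-free drift tube.

    From z*e*V = (1/2)*m*v**2 it follows that v = sqrt(2*z*e*V / m),
    and the flight time is t = L / v.
    """
    mass_kg = mass_da * DALTON
    velocity = math.sqrt(2 * charge * E_CHARGE * accel_voltage / mass_kg)
    return tube_length / velocity


# Assumed (hypothetical) instrument settings
VOLTAGE = 20_000.0  # accelerating potential in volts
LENGTH = 1.0        # drift tube length in metres

for mass in (1000.0, 2000.0, 4000.0):
    t_us = flight_time(mass, 1, VOLTAGE, LENGTH) * 1e6
    print(f"{mass:7.1f} Da, z=1: {t_us:.2f} microseconds")
```

Because t scales with the square root of m/z, a fourfold increase in mass at the same charge doubles the flight time, which is what lets the analyzer separate ions by arrival time.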

ESI QqTOF: installation phase

FT MS
  Bottom line: resolution and mass accuracy

FWHM
  Full width at half maximum of a peak

Resolution and mass accuracy
  $\Delta m$ measured at 50% peak height is the full width at half maximum (FWHM).
  $R = \dfrac{M}{\Delta m}$
    $R$ = resolution
    $M$ = mass of the peak of interest
    $\Delta m$ = width in daltons of the peak
  Mass accuracy is measured as a parts per million (ppm) value:
  $\mathrm{ppm} = \dfrac{10^{6}\,\Delta m}{M} = \dfrac{10^{6}}{R}$

Outline
  Mass spectrometers
  Protein identification
  Quantitative proteomics
  Protein-protein interactions

Peptide Mass Fingerprinting (PMF)
  Database entry: NCBI
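Peptide mass fingerprinting depends on how closely measured masses agree with calculated ones, so it may help to work the resolution and ppm definitions through numerically. The sketch below is illustrative only: the 0.15 Da peak width is an invented value, the function names are hypothetical, and the last line applies the same ppm formula to a measurement error, using an Mr(expt) and Mr(calc) pair taken from the Mascot example later in this presentation.

```python
def resolution(mass, fwhm):
    """R = M / delta_m, where delta_m is the peak width (FWHM) in daltons."""
    return mass / fwhm


def ppm(delta_m, mass):
    """Express a mass difference delta_m (in Da) relative to M, in ppm."""
    return 1e6 * delta_m / mass


# Hypothetical peak at m/z 1586.81 with an assumed FWHM of 0.15 Da
R = resolution(1586.81, 0.15)
print(round(R))                       # 10579
print(round(ppm(0.15, 1586.81), 2))   # same value as 1e6 / R

# Same formula applied to a measurement error: Mr(expt) minus Mr(calc)
print(round(ppm(1585.80 - 1585.83, 1585.83), 1))  # roughly -19 ppm
```

Peak width and measurement error are different quantities; the slide's formula covers the first, and the last line shows that the same arithmetic is commonly used to quote the second.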

From: http://gobi.ym.edu.tw/course/mass/2004-0325.pdf

Informatics
  Search engines
    Mascot, Matrix Science
    Sequest, Thermoelectron
  Freeware
    Protein Prospector (http://prospector.ucsf.edu/)
    TPP tools (http://tools.proteomecenter.org/TPP.php)

Database searching using MASCOT

Overview of the experiment
  Submission of data to MASCOT webserver

1D SDS PAGE of proteins
(Adapted from Aebersold, R.; Mann, M. Nature 2003, 422, 198-207)

Mass spectrum
[Figure: 4700 Reflector Spec #1 MC=>TR [BP = 1479.9, 15779]. x-axis: Mass (m/z), mass to charge ratio, 699.0 to 3000.0; y-axis: % Intensity, 0 to 100; full-scale intensity 1.6E+4.]

Peak list
  Compiled from the mass spectra

Mass list
  Mass list and intensity
  Submitted to the search engine http://www.matrixscience.com/

Mascot scoring
A frequency factor matrix, F, is created, in which each row represents an interval of 100 Da in peptide mass, and each column an interval of 10 kDa in intact protein mass. As each sequence entry is processed, the appropriate matrix elements $f_{i,j}$ are incremented so as to accumulate statistics on the size distribution of peptide masses as a function of protein mass. The elements of F are then normalised by dividing the elements of each 10 kDa column by the largest value in that column to give the Mowse factor matrix M:

$m_{i,j} = \dfrac{f_{i,j}}{\max_{i} f_{i,j}}$

After searching the experimental mass values against a calculated peptide mass database, the score for each entry is calculated according to:

$\mathrm{Score} = \dfrac{50\,000}{M_{Prot} \times \prod_{n} m_{i,j}}$

where $M_{Prot}$ is the molecular weight of the entry and the product term is calculated from the Mowse factor elements for each match between the experimental data and peptide masses calculated from the entry.
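The scoring scheme described above can be sketched in a few lines. This is a toy illustration, not the Mascot implementation: the three-entry "database" and its peptide masses are invented, the 0.5 Da matching tolerance is an assumption, the function names are hypothetical, and the constant 50,000 in the score is taken from the published MOWSE description rather than from this text.

```python
from collections import defaultdict
from math import prod

PEPTIDE_BIN = 100.0      # Da of peptide mass per row
PROTEIN_BIN = 10_000.0   # Da of intact protein mass per column


def build_mowse_factors(database):
    """Build the normalised Mowse factor matrix as a {(row, col): factor} dict.

    database is an iterable of (protein_mass, [peptide_masses]) entries.
    """
    freq = defaultdict(int)
    for protein_mass, peptide_masses in database:
        col = int(protein_mass // PROTEIN_BIN)
        for pm in peptide_masses:
            freq[(int(pm // PEPTIDE_BIN), col)] += 1
    # Normalise each column by the largest count found in that column
    col_max = defaultdict(int)
    for (_, col), count in freq.items():
        col_max[col] = max(col_max[col], count)
    return {cell: count / col_max[cell[1]] for cell, count in freq.items()}


def mowse_score(observed, protein_mass, peptide_masses, factors, tol=0.5):
    """Score one entry: 50000 / (M_prot * product of matched Mowse factors)."""
    col = int(protein_mass // PROTEIN_BIN)
    matched = [pm for pm in peptide_masses
               if any(abs(pm - obs) <= tol for obs in observed)]
    if not matched:
        return 0.0
    product = prod(factors[(int(pm // PEPTIDE_BIN), col)] for pm in matched)
    return 50_000.0 / (protein_mass * product)


# Invented toy database: (intact protein mass, calculated peptide masses)
toy_db = [
    (42_000.0, [1231.6, 1253.6, 1302.7, 1457.7]),
    (45_000.0, [1210.4, 1302.9, 1688.8]),
    (43_500.0, [1250.1, 1330.2, 2210.0]),
]
factors = build_mowse_factors(toy_db)
observed = [1231.62, 1302.70, 1457.69]
scores = [mowse_score(observed, mass, peps, factors) for mass, peps in toy_db]
print(scores)
```

Matches in sparsely populated mass bins carry small factors, which shrinks the denominator and raises the score. In this toy run the first entry, with three matches including one in a rare bin, ranks highest.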

List of common contaminants
  Trypsin autolysis peptides
  Matrix peaks
  Keratin from skin, hair
  Other contaminants

Protein Identification
(Adapted from Aebersold, R.; Mann, M. Nature 2003, 422, 198-207)

Tandem mass spectrum
http://qbab.aber.ac.uk

Tandem mass spectrum
[Figure: 4700 MS/MS Precursor 1570.7 Spec #1 MC [BP = 175.1, 3106]. x-axis: Mass (m/z), 69.0 to 1658.0; y-axis: % Intensity, 0 to 100; full-scale intensity 3105.9.]

Tandem mass spectra (MS/MS) can be used for peptide sequencing

Database Searching
  Peptide Mass Fingerprinting
  Sequence tag approach
  De novo sequencing: inspect raw data
http://qbab.aber.ac.uk

Mascot Search Results
Search title: SampleSetID: 362, AnalysisID: 567, MaldiWellID: 15790, SpectrumID: 17225, Path=\Mani\102004\New Analysis 1

Database:  NCBInr 20040606 (1846720 sequences; 611532004 residues)
Timestamp: 20 Oct 2004 at 14:52:50 GMT
Top Score: 681 for gi|180570, creatine kinase [Homo sapiens]

Probability Based Mowse Score
Score is -10*Log(P), where P is the probability that the observed match is a random event. Protein scores greater than 75 are significant (p<0.05).

Top hits from Mascot Search
There are multiple accession numbers for the same protein.

 #   Accession      Mass    Score  Description
 1   gi|180570      42591   681    creatine kinase [Homo sapiens]
 2   gi|21536286    42617   681    brain creatine kinase; creatine kinase-B [Homo sapiens]
 3   gi|33304149    42730   681    creatine kinase, brain [synthetic construct]
 4   gi|125292      42674   568    CREATINE KINASE, B CHAIN (B-CK) [Canis familiaris]
 5   gi|180572      42658   538    creatine kinase-B
 6   gi|125295      42636   514    CREATINE KINASE, B CHAIN (B-CK)
 7   gi|180555      42460   507    creatine kinase-B
 8   gi|203476      40598   473    creatine kinase-B
 9   gi|31542401    42685   471    creatine kinase, brain [Rattus norvegicus]
10   gi|203474      42699   471    creatine kinase
11   gi|40807002    44540   469    Unknown (protein for IMAGE:5598839) [Rattus norvegicus]
12   gi|47477783    44782   469    Ckb protein [Rattus norvegicus]
13   gi|13096153    42551   441    Chain A, Crystal Structure Of Bovine Retinal Creatine Kinase
14   gi|12852054    42700   427    unnamed protein product [Mus musculus]
15   gi|10946574    42686   427    creatine kinase, brain [Mus musculus]
16   gi|47213348    42953   237    unnamed protein product [Tetraodon nigroviridis]
17   gi|627264      40353   236    creatine kinase (EC 2.7.3.2) isozyme IV - African clawed frog
18   gi|27503418    42214   235    Ckb-prov protein [Xenopus laevis]
19   gi|45384340    42844   209    B-creatine kinase [Gallus gallus]
20   gi|6573489     42713   201    Chain A, Crystal Structure Of Chicken Brain-Type Creatine Kinase

Search returns a cluster of proteins with the same matching peptides

1. gi|180570    Mass: 42591    Score: 681    creatine kinase [Homo sapiens]

Observed  Mr(expt)  Mr(calc)  Delta  Start - End  Miss  Ions  Peptide
1232.62   1231.61   1231.61    0.00   87 -  96    0      45   DLFDPIIEDR
1232.62   1231.61   1231.61    0.00   87 -  96    0     ---   DLFDPIIEDR
1254.57   1253.56   1253.58   -0.02   97 - 107    0     ---   HGGYKPSDEHK
1303.70   1302.70   1302.72   -0.02   33 -  43    0     ---   VLTPELYAELR
1303.70   1302.70   1302.72   -0.02   33 -  43    0      54   VLTPELYAELR
1458.70   1457.69   1457.67    0.02  139 - 151    1     ---   GFCLPPHCSRGER
1586.81   1585.80   1585.83   -0.03  157 - 172    0      81   LAVEALSSLDGDLAGR
1586.81   1585.80   1585.83   -0.03  157 - 172    0     ---   LAVEALSSLDGDLAGR
1656.79   1655.79   1655.82   -0.03  367 - 381    0     ---   LEQGQAIDDLMPAQK
1657.80   1656.79   1656.83   -0.04  224 - 236    0      47   TFLVWVNEEDHLR
1657.80   1656.79   1656.83   -0.04  224 - 236    0     ---   TFLVWVNEEDHLR
1848.94   1847.93   1847.97   -0.04  342 - 358    0     ---   LGFSEVELVQMVVDGVK
1864.93   1863.92   1863.97   -0.04  342 - 358    0     ---   LGFSEVELVQMVVDGVK
1964.88   1963.88   1963.92   -0.05  321 - 341    0     ---   GTGGVDTAAVGGVFDVSNADR
1964.88   1963.88   1963.92   -0.05  321 - 341    0     139   GTGGVDTAAVGGVFDVSNADR
2120.98   2119.97   2120.02   -0.05  320 - 341    1     ---   RGTGGVDTAAVGGVFDVSNADR
2120.98   2119.97   2120.02   -0.05  320 - 341    1      27   RGTGGVDTAAVGGVFDVSNADR
2169.91   2168.91   2168.96   -0.05   14 -  32    0     ---   FPAEDEFPDLSAHNNHMAK
2225.06   2224.05   2224.17   -0.12  157 - 177    1     ---   LAVEALSSLDGDLAGRYYALK
2439.08   2438.07   2438.14   -0.07   12 -  32    1      31   LRFPAEDEFPDLSAHNNHMAK
2439.08   2438.07   2438.14   -0.07   12 -  32    1     ---   LRFPAEDEFPDLSAHNNHMAK
2518.10   2517.09   2517.16   -0.07  108 - 130    0      92   TDLNPDNLQGGDDLDPNYVLSSR
2518.10   2517.09   2517.16   -0.07  108 - 130    0     ---   TDLNPDNLQGGDDLDPNYVLSSR
3753.61   3752.60   3752.73   -0.13   97 - 130    1     ---   HGGYKPSDEHKTDLNPDNLQGGDDLDPNYVLSSR
3753.61   3752.60   3752.73   -0.13   97 - 130    1      55   HGGYKPSDEHKTDLNPDNLQGGDDLDPNYVLSSR

4. gi|125292    Mass: 42674    Score: 568    CREATINE KINASE, B CHAIN (B-CK)

Observed  Mr(expt)  Mr(calc)  Delta  Start - End  Miss  Ions  Peptide
1254.57   1253.56   1253.58   -0.02   97 - 107    0     ---   HGGYKPSDEHK
1303.70   1302.70   1302.72   -0.02   33 -  43    0     ---   VLTPELYAELR
1303.70   1302.70   1302.72   -0.02   33 -  43    0      54   VLTPELYAELR
1458.70   1457.69   1457.67    0.02  139 - 151    1     ---   GFCLPPHCSRGER
1586.81   1585.80   1585.83   -0.03  157 - 172    0      81   LAVEALSSLDGDLAGR
1586.81   1585.80   1585.83   -0.03  157 - 172    0     ---   LAVEALSSLDGDLAGR
1624.76   1623.75   1623.85   -0.10  367 - 381    0     ---   LEQGQAIDDLVPAQK
1848.94   1847.93   1847.97   -0.04  342 - 358    0     ---   LGFSEVELVQMVVDGVK
1864.93   1863.92   1863.97   -0.04  342 - 358    0     ---   LGFSEVELVQMVVDGVK
1964.88   1963.88   1963.92   -0.05  321 - 341    0     ---   GTGGVDTAAVGGVFDVSNADR
1964.88   1963.88   1963.92   -0.05  321 - 341    0     139   GTGGVDTAAVGGVFDVSNADR
2120.98   2119.97   2120.02   -0.05  320 - 341    1     ---   RGTGGVDTAAVGGVFDVSNADR
2120.98   2119.97   2120.02   -0.05  320 - 341    1      27   RGTGGVDTAAVGGVFDVSNADR
2169.91   2168.91   2168.96   -0.05   14 -  32    0     ---   FPAEDEFPDLSAHNNHMAK
2225.06   2224.05   2224.17   -0.12  157 - 177    1     ---   LAVEALSSLDGDLAGRYYALK
2439.08   2438.07   2438.14   -0.07   12 -  32    1      31   LRFPAEDEFPDLSAHNNHMAK
2439.08   2438.07   2438.14   -0.07   12 -  32    1     ---   LRFPAEDEFPDLSAHNNHMAK
2518.10   2517.09   2517.16   -0.07  108 - 130    0      92   TDLNPDNLQGGDDLDPNYVLSSR
2518.10   2517.09   2517.16   -0.07  108 - 130    0     ---   TDLNPDNLQGGDDLDPNYVLSSR
3753.61   3752.60   3752.73   -0.13   97 - 130    1     ---   HGGYKPSDEHKTDLNPDNLQGGDDLDPNYVLSSR
3753.61   3752.60   3752.73   -0.13   97 - 130    1      55   HGGYKPSDEHKTDLNPDNLQGGDDLDPNYVLSSR

Creatine kinase B is the highest scoring protein

Match to: gi|21536286; Score: 681
Creatine kinase - B [Homo sapiens]
Nominal mass (Mr): 42591; Calculated pI value: 5.34
Observed Mass & pI: 43 kDa, 6.2-6.27
Sequence Coverage: 46%

  1 MPFSNSHNAL KLRFPAEDEF PDLSAHNNHM AKVLTPELYA ELRAKSTPSG
 51 FTLDDVIQTG VDNPGHPYIM TVGCVAGDEE SYEVFKDLFD PIIEDRHGGY
101 KPSDEHKTDL NPDNLQGGDD LDPNYVLSSR VRTGRSIRGF CLPPHCSRGE
151 RRAIEKLAVE ALSSLDGDLA GRYYALKSMT EAEQQQLIDD HFLFDKPVSP
201 LLSASGMARD WPDARGIWHN DNKTFLVWVN EEDHLRVISM QKGGNMKEVF
251 TRFCTGLTQI ETLFKSKDYE FMWNPHLGYI LTCPSNLGTG LRAGVHIKLP
301 NLGKHEKFSE VLKRLRLQKR GTGGVDTAAV GGVFDVSNAD RLGFSEVELV
351 QMVVDGVKLL IEMEQRLEQG QAIDDLMPAQ K

Outline
  Mass spectrometers
  Protein identification
  Quantitative proteomics
  Protein-protein interactions

Quantitative Proteomics
Sample preparation
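Two numbers in the Mascot report for creatine kinase B, the 46% sequence coverage and the -10*Log(P) score, can be checked with a short calculation. The sketch below is illustrative only: the Start-End pairs are transcribed from the gi|180570 peptide matches, the protein length of 381 residues is read off the sequence listing, Log is assumed to mean log base 10, and the function names are made up for this example.

```python
def sequence_coverage(intervals, protein_length):
    """Fraction of residues covered by at least one matched peptide.

    intervals holds 1-based, inclusive (start, end) residue positions.
    """
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end + 1))
    return len(covered) / protein_length


def score_to_probability(score):
    """Invert Score = -10 * log10(P) to recover P."""
    return 10 ** (-score / 10)


# Distinct Start-End pairs from the gi|180570 peptide matches
matches = [
    (87, 96), (97, 107), (33, 43), (139, 151), (157, 172),
    (367, 381), (224, 236), (342, 358), (321, 341), (320, 341),
    (14, 32), (157, 177), (12, 32), (108, 130), (97, 130),
]
coverage = sequence_coverage(matches, 381)
print(f"{coverage:.0%}")          # 46%

# P corresponding to the significance threshold score of 75
print(score_to_probability(75))
```

Overlapping peptides (for example 97-107 inside 97-130) are counted once, which is why a set of residue positions is used rather than a sum of peptide lengths.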

From 2D gels ... to MALDI or ESI MS
  Control, Test, Pool
  Cy3, Cy5
  Image analysis with Delta2D, Decodon
  Quantitate
  Export spot list to robotic picker
  ... it's high-throughput
  1st dimension: isoelectric focussing
  2nd dimension: SDS PAGE
  Spot picking
  Trypsin gel digest

Colorectal cancer markers
[Flowchart labels: Isolate nuclear matrix; 2D, 1D; In-gel tryptic digest; Mass spectral analysis (MS, MS/MS; m/z); Database search; Protein identified (Yes / No); Validation (Immunoblotting, Immunohistochemistry); de novo sequencing; Tumor specific markers CC3, CC4, CC5, CC6a, CC6b.]
(Balasubramani et al., Cancer Res., 2006)

Shotgun proteomics
(Adapted from Aebersold, R.; Mann, M. Nature 2003, 422, 198-207)

Typical workflow to identify biomarkers that distinguish indolent versus aggressive forms of cancer
  Group A, Indolent: fractionate (e.g. immunodeplete, subcellular); tryptic peptides; label with iTRAQ reagent 115
  Group B, Aggressive: fractionate (e.g. immunodeplete, subcellular); tryptic peptides; label with iTRAQ reagent 116
  Combine labeled digests
  LC fractionate
  MS and MS/MS
  Protein ID and quantitate
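The quantitation step at the end of this workflow comes down to comparing the reporter-ion intensities of the two labels in each MS/MS spectrum. The sketch below shows that comparison for a single spectrum and is illustrative only: the peak list is invented, the reporter m/z values (about 115.11 and 116.11) and the 0.05 Da tolerance are assumptions, and the function names are hypothetical.

```python
# Approximate reporter-ion m/z for the two labels (assumed values)
REPORTERS = {"115": 115.11, "116": 116.11}


def reporter_intensity(peaks, target_mz, tol=0.05):
    """Sum the intensity of all peaks within tol of target_mz.

    peaks is a list of (m/z, intensity) pairs from one MS/MS spectrum.
    """
    return sum(intensity for mz, intensity in peaks
               if abs(mz - target_mz) <= tol)


def itraq_ratio(peaks, numerator="116", denominator="115", tol=0.05):
    """Ratio of reporter intensities, or None if the denominator is absent."""
    num = reporter_intensity(peaks, REPORTERS[numerator], tol)
    den = reporter_intensity(peaks, REPORTERS[denominator], tol)
    if den == 0:
        return None
    return num / den


# Invented MS/MS peak list: (m/z, intensity)
spectrum = [(115.11, 2000.0), (116.11, 5000.0), (175.12, 3100.0), (684.38, 900.0)]
print(itraq_ratio(spectrum))  # 2.5, i.e. Group B (116) over Group A (115)
```

A protein-level ratio would normally combine the ratios from several peptide spectra of the same protein; how they are combined is left out here.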

Sample handling
  In-solution
  Isoelectric focussing
  HPLC
  1D or 2D LC MALDI

Protein-protein interaction studies
  Immunoaffinity pull-downs
  Tandem affinity purification

GPCL
  Billy W Day
  Paul Wood
  Mirunalni Thangavelu
  Tamanna Sultana
  Emanuel M Schreiber
  Chris Bolcato
  Chris Myers
  Patrick Miller
  Robert Wolfe

Definitions
  The amu is defined as 1/12th the mass of one neutral $^{12}_{6}\mathrm{C}$ atom.
  The amu is also called the dalton.
  $1\ \mathrm{amu} = \dfrac{1}{12}\left(\dfrac{12\ \mathrm{g}\ ^{12}\mathrm{C}/\mathrm{mol}\ ^{12}\mathrm{C}}{6.0221\times10^{23}\ \mathrm{atoms}\ ^{12}\mathrm{C}/\mathrm{mol}\ ^{12}\mathrm{C}}\right) = 1.6605\times10^{-24}\ \mathrm{g/atom}\ ^{12}\mathrm{C}$

Isotopic species of M
  Species        m/z
  (M + H)+       (M + 1H)/1H+
  (M + 2H)2+     (M + 2H)/2H+
  (M + 3H)3+     (M + 3H)/3H+
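The (M + nH)/n pattern above can be evaluated directly for a given neutral mass. The sketch below is illustrative only: the proton mass of 1.007276 Da is a standard constant supplied here rather than taken from the slides, the function name `mz` is hypothetical, and the example mass is the Mr(calc) of the peptide LAVEALSSLDGDLAGR from the Mascot results.

```python
PROTON_MASS = 1.007276  # mass of H+ in Da (constant assumed for this sketch)


def mz(neutral_mass, charge):
    """m/z of the (M + nH)n+ ion: (M + n * m_proton) / n."""
    return (neutral_mass + charge * PROTON_MASS) / charge


M = 1585.83  # Mr(calc) of LAVEALSSLDGDLAGR
for n in (1, 2, 3):
    print(f"(M + {n}H){n}+ : m/z {mz(M, n):.2f}")
```

The singly charged value lands about 1.01 above M, which matches the offset between the Observed and Mr(expt) columns in the Mascot peptide table; higher charge states compress the same mass into a lower m/z range.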
