displayHTS: an R package for displaying data and results from ...

displayHTS: An R package for displaying data and results from high-throughput screening experiments

Xiaohua Douglas Zhang
Head, Early Development Statistics, Asian Pacific
BARDS, Merck Research Laboratories
May 18, 2013

Outline

  Background knowledge for the R package
    Basic drug discovery & development process
    High-throughput screening
  Brief description of our R package displayHTS
  Main functions in the package
    plateWellSeries.fn
    image.design.fn
    image.intensity.fn
    dualFlashlight.fn
  An Example
  Summary

Drug Discovery & Development Process

  [Figure: pipeline diagram. Labels: Drug Discovery (e.g., ), Target Discovery (e.g., ), Introduction, Pre-clinical (safety & drug metabolism), Phase I / II, Phase III, Phase IV (Registration & Pharmacovigilance), FDA Approval.]

Drug Discovery Using High-Throughput Biotechnologies

  High-throughput biotechnologies
    High-throughput screening (HTS)
      A book on HTS has already been published
      A book, Statistical Omics, is under contract

  [Figure: HTS workflow. Labels: Cell of Interest, Library, Transfection, High Throughput Screen, Treatment, Scanning, Numeric Data, Statistical Analysis, Genes Identification or Therapeutic Target.]

HTS Project and Data

  An HTS project may contain
    one primary screen with millions of compounds with no replicate
    one confirmatory screen with replicates
  The measured response is usually the intensity emitted by labeled particles such as fluorescent dyes.
  Need to display data and results
    R package displayHTS to serve the need

R Package: displayHTS

  freely available from CRAN: http://cran.r-project.org/mirrors.html
  displayHTS has four main functions:
    plateWellSeries.fn
    image.design.fn
    image.intensity.fn
    dualFlashlight.fn

plateWellSeries.fn()

  library(displayHTS)
  data(HTSdataSort)
  wells = as.character(unique(HTSdataSort[, "WELL_USAGE"]))
  colors = c("black", "pink", "grey", "blue", "skyblue", "green", "red")
  orders = c(1, 3, 2, 4, 5, 7, 6)
  par( mfrow=c(1,1) )
  plateWellSeries.fn(data.df = HTSdataSort[1:(384*2),],
                     intensityName="log2Intensity", plateName="BARCODE",
                     wellName="WELL_USAGE", rowName="XPOS", colName="YPOS",
                     show.wellTypes=wells, order.wellTypes=orders,
                     color.wells=colors, pch.wells=rep(1, 7),
                     ppf=6, byRow=TRUE, yRange=NULL,
                     cex.point=0.75, cex.legend=0.75,
                     main="A: Plate-well series plot")

  [Figure A: Plate-well series plot. y-axis: log2Intensity; x-axis: plates 1: PL000001 and 2: PL000002; legend: mock1, Sample, mock2, posCTRL3, posCTRL2, negCTRL, posCTRL1. Source: Zhang's book.]

imageDesign.fn()

  data(HTSresults)
  condtSample = HTSresults[, "WELL_USAGE"] == "Sample"
  condtUp = HTSresults[,"ssmd"] >= 1 & HTSresults[,"mean"] >= log2(1.2)
  condtDown = HTSresults[,"ssmd"] <= -1 & HTSresults[,"mean"] <= -log2(1.2)
  sum(condtSample & (condtUp | condtDown) )/sum(condtSample)
  hit.vec = as.character(HTSresults[, "WELL_USAGE"])
  hit.vec[ condtSample & condtUp ] = "up-hit"
  hit.vec[ condtSample & condtDown ] = "down-hit"
  hit.vec[ condtSample & !condtUp & !condtDown] = "non-hit"
  result.df = cbind(HTSresults, "hitResult"=hit.vec)
  wells = as.character(unique(result.df[, "hitResult"])); wells
  colors = c("black", "green", "white", "grey", "red", "purple1", "purple2", "pink", "purple3")
  par( mfrow=c(1,1) )
  imageDesign.fn(result.df[1:384,], wellName="hitResult",
                 rowName="XPOS", colName="YPOS",
                 wells=wells, colors=colors,
                 title="B: Image of hits and controls")

  [Figure B: Image of hits and controls. 384-well plate layout with axis ticks 1-24 and 1-16; legend: mock1, down-hit, non-hit, mock2, up-hit, posCTRL3, posCTRL2, negCTRL, posCTRL1.]
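The hit-calling rule used in the imageDesign.fn() slide can be tried without the package data. What follows is a minimal sketch in base R, assuming a small invented table (toy.df) whose column names simply mirror WELL_USAGE, mean and ssmd from the slide code. The values and the helper names (ssmd.cut, fc.cut, is.sample, is.up, is.down) are hypothetical and are not part of displayHTS.

```r
# Toy results table; column names mirror those in the slide code,
# but every value here is invented for illustration only.
toy.df <- data.frame(
  WELL_USAGE = c("Sample", "Sample", "Sample", "Sample", "negCTRL", "posCTRL1"),
  mean       = c(0.80, -0.90, 0.10, 0.50, 0.02, -2.10),  # average log2 fold change
  ssmd       = c(2.50, -3.10, 0.40, 0.60, 0.10, -8.00),
  stringsAsFactors = FALSE
)

# Thresholds taken from the slide: |SSMD| >= 1 and at least a 1.2-fold change
ssmd.cut <- 1
fc.cut   <- log2(1.2)

is.sample <- toy.df$WELL_USAGE == "Sample"
is.up     <- toy.df$ssmd >=  ssmd.cut & toy.df$mean >=  fc.cut
is.down   <- toy.df$ssmd <= -ssmd.cut & toy.df$mean <= -fc.cut

# Control wells keep their well type; sample wells are relabelled by hit status
hit.vec <- toy.df$WELL_USAGE
hit.vec[is.sample & is.up]             <- "up-hit"
hit.vec[is.sample & is.down]           <- "down-hit"
hit.vec[is.sample & !is.up & !is.down] <- "non-hit"

# Proportion of sample wells called as hits
hit.rate <- sum(is.sample & (is.up | is.down)) / sum(is.sample)
print(table(hit.vec))
print(hit.rate)
```

Both conditions must hold for a well to be called a hit: the fourth toy sample has a fold change above 1.2 but an SSMD below 1, so it stays a non-hit.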

imageIntensity.fn()

  imageIntensity.fn(HTSdataSort[1:384,], intensityName="log2Intensity",
                    plateName="BARCODE", wellName="WELL_USAGE",
                    rowName="XPOS", colName="YPOS",
                    sampleName="Sample", sourcePlateName="SOBARCODE")

  [Figure: intensity image of plate PL000001 (source plate SO000001). 384-well layout with axis ticks 1-24 and 1-16; color-scale labels from 19.60 to 21.19.]

An ApoA1 siRNA Confirmatory Screen

  [Figure panels: A1: Plate design (legend: Others, Sample, Negative, Inhibition); A2: Raw data in a plate; A3: Adjusted data in a plate; B1: Raw Data (Raw Intensity vs. Plate Number, plate-well series); B2: Adjusted Data (Adjusted Intensity vs. Plate Number, plate-well series). Source: J. Biomol. Screen 2008 13:378-389.]
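The slide code reads columns named mean, ssmd and p.value from HTSresults, but the slides do not show how those columns were computed. The sketch below is one plausible route, not the author's method: it assumes paired log2 differences per siRNA across replicates, uses the simple plug-in SSMD estimate (mean of the differences divided by their standard deviation), and takes the p-value from a one-sample t-test. The matrix d, the siRNA names and all values are invented.

```r
# Toy paired differences on the log2 scale: rows are siRNAs, columns are
# replicates (sample well minus a negative reference). Values are invented.
d <- rbind(
  siRNA_A = c( 1.10,  0.90,  1.00),
  siRNA_B = c(-0.50, -0.70, -0.60),
  siRNA_C = c( 0.30, -0.20,  0.05)
)

mean.d <- rowMeans(d)        # average log2 fold change per siRNA
sd.d   <- apply(d, 1, sd)    # standard deviation of the differences
ssmd   <- mean.d / sd.d      # plug-in SSMD estimate for paired data (assumption)

# One-sample t-test of each siRNA's differences against zero
p.value <- apply(d, 1, function(x) t.test(x)$p.value)

toy.results <- data.frame(
  mean           = mean.d,
  ssmd           = ssmd,
  p.value        = p.value,
  neg.log10.pval = -log10(p.value)
)
print(round(toy.results, 3))
```

With columns like these, plotting ssmd against mean gives the layout of a dual-flashlight plot, and plotting neg.log10.pval against mean gives a volcano plot.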

dualFlashlight.fn() for Generating a Dual-Flashlight Plot

  par( mfrow=c(1, 1) )
  dualFlashlight.fn(HTSresults, wellName="WELL_USAGE",
                    x.name="mean", y.name="ssmd",
                    sampleName="Sample", sampleColor="black",
                    controls = c("negCTRL", "posCTRL1", "mock1"),
                    controlColors = c("green", "red", "lightblue"),
                    xlab="Average Fold Change", ylab="SSMD",
                    main="C: Dual-Flashlight Plot",
                    x.legend=0.1, y.legend= -12,
                    cex.point=1, cex.legend=0.8,
                    xat=log2( c(1/4, 1/2, 1/1.2, 1, 1.2, 2, 4) ),
                    xMark=c("1/4", "1/2", "1/1.2", "1", "1.2", "2", "4"),
                    xLines=log2( c(1/4, 1/2, 1/1.2, 1, 1.2, 2, 4) ),
                    yLines=c(-5, -3, -2, -1, 0, 1, 2, 3, 5) )

  [Figure C: Dual-Flashlight Plot. x-axis: Average Fold Change; y-axis: SSMD; legend: Sample, negCTRL, posCTRL1, mock1.]

dualFlashlight.fn() for Generating a Volcano Plot

  result.df = cbind(HTSresults, "neg.log10.pval" = -log10(HTSresults[,"p.value"]))
  dualFlashlight.fn(result.df, wellName="WELL_USAGE",
                    x.name="mean", y.name="neg.log10.pval",
                    sampleName="Sample", sampleColor="black",
                    controls = c("negCTRL", "posCTRL1", "mock1"),
                    controlColors = c("green", "red", "lightblue"),
                    xlab="Average Fold Change", ylab="p-value in -log10 scale",
                    main="D: Volcano Plot",
                    x.legend=NA, y.legend=-log10(0.006),
                    cex.point=1, cex.legend=0.8,
                    xat=log2( c(1/4, 1/2, 1/1.2, 1, 1.2, 2, 4) ),
                    xMark=c("1/4", "1/2", "1/1.2", "1", "1.2", "2", "4"),
                    xLines=log2( c(1/4, 1/2, 1/1.2, 1, 1.2, 2, 4) ),
                    yLines=c(-5, -3, -2, -1, 0, 1, 2, 3, 5) )

  [Figure D: Volcano Plot. x-axis: Average Fold Change; y-axis: p-value in -log10 scale; legend: Sample, negCTRL, posCTRL1, mock1.]

An Example in Drug Discovery

  New technology for drug discovery: RNA interference high-throughput screening
  RNAi HTS for HIV:
    Zhou H, Xu M, Huang Q, Gates AT, Zhang XHD, Stec EM, Ferrer M, Hazuda DJ, Espeseth AS. 2008. Genome-scale RNAi screen for host factors required for HIV replication. Cell Host & Microbe 4(5): 495-504
    listed by Nature Medicine in their year-end review on "Notable advances in 2008"

Summary

  Knowledge about drug R&D is important
  HTS is a critical biotechnology for drug R&D
  displayHTS can display HTS data and results
    plateWellSeries.fn(): display data and results plate-by-plate and well-by-well
    image.design.fn(): display the position of control types and result categories
    image.intensity.fn(): display data and results by imaging
    dualFlashlight.fn(): display calculated results such as SSMD and p-value

References for Data Analysis in HTS (2006-2007)

  1. Zhang XHD, Yang XC, Chung N, Gates AT, Stec EM, Kunapuli P, Holder DJ, Ferrer M, Espeseth AS. 2006. Robust statistical methods for hit selection in RNA interference high throughput screening experiments. Pharmacogenomics 7(3): 299-309.
  2. Espeseth AS, Huang Q, Gates AT, Xu M, Yu Y, Simon AJ, Shi X, Zhang XHD, Hodor PG, Stone D, Burchard J, Cavet GL, Bartz S, Linsley PS, Ray WJ, Hazuda DJ. 2006. A genome wide analysis of ubiquitin ligases in APP processing identifies a novel regulator of BACE1 mRNA levels. Molecular and Cellular Neuroscience 33(3): 227-235.
  3. Zhang XHD, Espeseth AS, Chung N, Holder DJ, Ferrer M. 2006. The use of strictly standardized mean difference for quality control in RNA interference high throughput screening experiments. The 2006 American Statistical Association Proceedings, Alexandria, VA: American Statistical Association: 882-886.
  4. Zhang XHD, Espeseth AS, Chung N, Ferrer M. 2006. Evaluation of a novel metric for quality control in an RNA interference high throughput screening assay. BIOCOMP: 385-390.
  5. Zhang XHD. 2007. Threshold determination of strictly standardized mean difference in RNA interference high throughput screening assays. IMECS Proceeding: 261-266.
  6. Zhang XHD, Ferrer M, Espeseth AS, Marine SD, Stec EM, Crackower MA, Holder DJ, Heyse JF, Strulovici B. 2007. The use of strictly standardized mean difference for hit selection in primary RNA interference high throughput screening experiments. Journal of Biomolecular Screening 12(4): 497-509.
  7. Zhang XHD. 2007. A new method with flexible and balanced control of false negatives and false positives for hit selection in RNA interference high throughput screening assays. Journal of Biomolecular Screening 12(5): 645-655.
  8. Zhang XHD. 2007. A pair of new statistical parameters for quality control in RNA interference high throughput screening assays. Genomics 39: 552-561.

References (2008-2009)

  9. Zhang XHD, Kuan PF, Ferrer M, Shu X, Liu YC, Gates AT, Kunapuli P, Stec EM, Xu M, Marine SD, Holder DJ, Strulovici B, Heyse JF, Espeseth AS. 2008. Hit selection with false discovery rate control in genome-scale RNAi screens. Nucleic Acids Research 36(14): 4667-4679.
  10. Zhang XHD, Espeseth AS, Johnson E, Chin J, Gates A, Mitnaul L, Marine SD, Tian J, Stec EM, Kunapuli P, Holder DJ, Heyse JF, Strulovici B, Ferrer M. 2008. Integrating experimental and analytic approaches to improve data quality in genome-wide RNAi screens. Journal of Biomolecular Screening 13(5): 378-389.
  11. Zhang XHD. 2008. Novel analytic criteria and effective plate designs for quality control in genome-wide RNAi screens. Journal of Biomolecular Screening 13(5): 363-377.
  12. Zhang XHD. 2008. Genome-wide screens for effective siRNAs through assessing the size of siRNA effects. BMC Research Notes 1:33.
  13. Chung K, Zhang XHD, Kreamer A, Locco L, Kuan PF, Bartz S, Linsley PS, Ferrer M, Strulovici B. 2008. Median absolute deviation to improve hit selection for genome-scale RNAi screens. Journal of Biomolecular Screening 13: 149-158.
  14. Zhou H, Xu M, Huang Q, Gates AT, Zhang XHD, Stec EM, Ferrer M, Hazuda DJ, Espeseth AS. 2008. Genome-scale RNAi screen for host factors required for HIV replication. Cell Host & Microbe 4(5): 495-504.
  15. Zhang XHD, Shane SD, Ferrer M. 2009. Error rates and power in genome-scale RNAi screens. Journal of Biomolecular Screening 14: 230-238.
  16. Zhang XHD. 2009. A method effectively comparing gene effects in multiple conditions in RNAi and expression profiling research. Pharmacogenomics 10: 345-358.
  17. Zhang XHD, Heyse JF. 2009. Determination of sample size in genome-scale RNAi screens. Bioinformatics 25: 841-844.
  18. Klinghoffer RA, Frazier J, Annis J, Berndt JD, Roberts BS, Arthur WT, Lacson R, Zhang XHD, Ferrer M, Moon RT, Cleary MA. 2009. A lentivirus-mediated genetic screen identifies dihydrofolate reductase (DHFR) as a modulator of β-catenin/GSK3 signaling. PLoS ONE 4(9): e6892.

References (2010)

  19. Zhang XHD. 2010. Assessing the size of gene or RNAi effects in multi-factor high-throughput experiments. Pharmacogenomics 11(2): 199-213.
  20. Zhang XHD. 2010. Strictly standardized mean difference, standardized mean difference and classical t-test for the comparison of two groups. Statistics in Biopharmaceutical Research 2(2): 292-299.
  21. Zhang XHD. 2010. A statistical method assessing collective activity of multiple siRNAs targeting a gene in RNAi screens. The 2010 American Statistical Association Proceedings [CD-ROM], Alexandria, VA: American Statistical Association.
  22. Zhang XHD. 2010. An effective method controlling false discoveries and false non-discoveries in genome-scale RNAi screens. Journal of Biomolecular Screening 15: 1116-1122.
  23. Zhang XHD, Lacson R, Yang R, Marine SD, McCampbell, Toolan DM, Hare TR, Kajdas J, Holder DJ, Heyse JF, Ferrer M. 2010. The use of SSMD-based false discovery and false non-discovery rates in genome-scale RNAi screens. Journal of Biomolecular Screening 15: 1123-1131.
  24. Zhang XHD. 2010. Contrast variable potentially providing a consistent interpretation to effect sizes. Journal of Biometrics & Biostatistics 1:108.
  25. Zhao WQ, Santini F, Breese R, Ross D, Zhang XHD, Stone DJ, Ferrer M, Townsend M, Wolfe AL, Seager MA, Kinney GG, Shughrue PJ, Ray WJ. 2010. Inhibition of calcineurin-mediated endocytosis and AMPA receptor prevent amyloid oligomer-induced synaptic disruption. Journal of Biological Chemistry 285(10): 7619-7632.

References (2011-2013)

  26. Zhang XHD. 2011. Illustration of SSMD, z-score, SSMD*, z*-score and t-statistic for hit selection in high-throughput screens. Journal of Biomolecular Screening 16(7): 775-785.
  27. Zhang XHD, Santini F, Lacson R, Marine SD, Wu Q, Benetti L, Yang R, McCampbell A, Berger JP, Toolan DM, Stec EM, Holder DJ, Soper KA, Heyse JF, Ferrer M. 2011. cSSMD: Assessing collective activity of multiple siRNAs in genome-scale RNAi screens. Bioinformatics 27(20): 2775-2781.
  28. Zhang XHD, Heyse JF. 2012. Contrast variable for comparing groups in biopharmaceutical research. Statistics in Biopharmaceutical Research 4(3): 228-239.
  29. Huang W, Zhang XHD, Yong Li, William W Wang, Keith Soper. 2012. Standardized median difference for quality control in high-throughput screening. Proceedings of 2012 International Symposium on Information Technologies in Medicine and Education (ITME): 515-518.
  30. Yang R, Lacson RG, Castriota G, Zhang XHD, Liu Y, Zhao WQ, Einstein M, Camargo Luiz CM, Qureshi S, Wong KK, Zhang BB, Ferrer M, Berger JP. 2012. A genome-wide siRNA screen to identify modulators of insulin sensitivity and gluconeogenesis. PLoS ONE 7(5): e36384.
  31. Zhang XHD, Zhang ZZ. 2013. displayHTS: a R package for displaying data and results from high-throughput screening experiments. Bioinformatics 29(6): 794-796.
  32. BOOK 1: Zhang XHD. 2011. Optimal High-Throughput Screening: Practical Experimental Design and Data Analysis for Genome-scale RNAi Research. Cambridge University Press, Cambridge, UK (ISBN: 9780521734448).
  33. BOOK 2: Zhang XHD, Heyse JF (editors). Statistics Omics. Under preparation to come out in 2014. Chapman & Hall/CRC Press, California, USA.
