Nonparametric methods for identifying differentially expressed genes in microarray data.

Author: Olga Troyanskaya, Mitchell Garber, Patrick Brown, David Botstein, Russ Altman
Publication Year: 2002
Type: Journal Article

Abstract

MOTIVATION: Gene expression experiments provide a fast and systematic way to identify disease markers relevant to clinical care. In this study, we address the problem of robust identification of differentially expressed genes from microarray data. Differentially expressed genes, or discriminator genes, are genes with significantly different expression in two user-defined groups of microarray experiments. We compare three model-free approaches: (1) nonparametric t-test, (2) Wilcoxon (or Mann-Whitney) rank sum test, and (3) a heuristic method based on high Pearson correlation to a perfectly differentiating gene ('ideal discriminator method'). We systematically assess the performance of each method based on simulated and biological data under varying noise levels and p-value cutoffs.

RESULTS: All methods exhibit very low false positive rates and identify a large fraction of the differentially expressed genes in simulated data sets with noise level similar to that of actual data. Overall, the rank sum test appears most conservative, which may be advantageous when the computationally identified genes need to be tested biologically. However, if a more inclusive list of markers is desired, a higher p-value cutoff or the nonparametric t-test may be appropriate. When applied to data from lung tumor and lymphoma data sets, the methods identify biologically relevant differentially expressed genes that allow clear separation of groups in question. Thus the methods described and evaluated here provide a convenient and robust way to identify differentially expressed genes for further biological and clinical analysis.
Keywords: Models, Genetic; Humans; Gene Expression Profiling; Gene Expression; Models, Statistical; Oligonucleotide Array Sequence Analysis; Sequence Alignment; Sequence Analysis, DNA; Computer Simulation; Reproducibility of Results; Statistics, Nonparametric; Sensitivity and Specificity; Lymphoma, B-Cell; Carcinoma, Squamous Cell; False Positive Reactions; Lung Neoplasms; Reference Values
Journal: Bioinformatics
Volume: 18
Issue: 11
Pages: 1454-61
Date Published: 11/2002
Alternate Journal: Bioinformatics
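
The three approaches named in the abstract can be illustrated for a single gene with a minimal Python sketch. This is not the authors' implementation. It assumes that significance for the nonparametric t-test and the ideal discriminator method is assessed by permuting group labels, that the t-statistic uses unequal variances, that the ideal discriminator is a 1/0 pattern over the two groups, and that the rank sum test uses a normal approximation without tie or continuity correction. All function names and the example values are hypothetical.

```python
import math
import random
from statistics import mean, variance


def t_statistic(group_a, group_b):
    """Unequal-variance t-statistic (an assumption; the abstract does not specify)."""
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    diff = mean(group_a) - mean(group_b)
    if se == 0.0:
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se


def ideal_correlation(group_a, group_b):
    """Pearson correlation between a gene and an assumed ideal 1/0 group pattern."""
    x = list(group_a) + list(group_b)
    y = [1.0] * len(group_a) + [0.0] * len(group_b)
    mx, my = mean(x), mean(y)
    cov = math.fsum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(math.fsum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(math.fsum((yi - my) ** 2 for yi in y))
    return 0.0 if sx == 0.0 or sy == 0.0 else cov / (sx * sy)


def permutation_p_value(group_a, group_b, statistic, n_perm=2000, seed=0):
    """Two-sided p-value obtained by shuffling group labels n_perm times."""
    rng = random.Random(seed)
    observed = abs(statistic(group_a, group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(statistic(pooled[:n_a], pooled[n_a:])) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def rank_sum_p_value(group_a, group_b):
    """Two-sided Wilcoxon rank sum p-value via the normal approximation."""
    pooled = sorted([(v, 0) for v in group_a] + [(v, 1) for v in group_b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average rank for tied values
        i = j + 1
    w = sum(r for r, (_, label) in zip(ranks, pooled) if label == 0)
    n_a, n_b = len(group_a), len(group_b)
    mu = n_a * (n_a + n_b + 1) / 2
    sigma = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (w - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical expression values for one gene in two groups of experiments.
high = [5.1, 4.8, 5.5, 5.0, 4.9, 5.3]
low = [1.0, 1.2, 0.8, 1.1, 0.9, 1.3]
print("permutation t-test p:", permutation_p_value(high, low, t_statistic))
print("rank sum p:", rank_sum_p_value(high, low))
print("ideal discriminator r:", ideal_correlation(high, low))
print("ideal discriminator p:", permutation_p_value(high, low, ideal_correlation))
```

In practice each test would be applied to every gene on the array and the resulting p-values compared against a chosen cutoff, which is the setting in which the abstract describes the rank sum test as most conservative.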