Multidimensional Visualization Tools for Analysis of Expression Data

Expression data analysis relies mostly on statistical approaches, which are indispensable for the study of biological systems. However, the large amounts of multidimensional data produced by high-throughput technologies are not completely served by biostatistical techniques, and these are usually complemented with visualization, knowledge discovery, and other computational tools. In many cases we can only speculate about the processes causing the observed changes in a biological system, and it is during visual exploratory analysis of the data that a hypothesis is formed. We aim to show the usability of multidimensional visualization tools and to promote their use in the life sciences. We survey and demonstrate several multidimensional visualization tools used in data exploration, such as parallel coordinates and RadViz, and we extend them by combining them with the self-organizing map algorithm. Our examples use a time-course data set of transitional cell carcinoma of the bladder. Analysis of data with these tools has the potential to uncover additional relationships and non-trivial structures.



