Welcome to the website of supplementary material for SHI and CHEN's

"Exploring LLE for Feature Dimension Reduction in Gene Expression Data Analysis",

a submission to Bioinformatics

Shi Chao, Chen Lihui*

Division of Information Engineering, School of EEE, Nanyang Technological University,

Singapore, Republic of Singapore

 

Summary: In this paper, we report the application of Locally Linear Embedding (LLE), a relatively new and powerful but largely unexplored algorithm, to feature reduction in gene expression data analysis. The algorithm is implemented in Matlab and evaluated with a Support Vector Machine (SVM) classifier on six publicly available microarray datasets. The results suggest that LLE is a promising tool for fast, unsupervised feature extraction from microarray data.

 

Locally linear embedding (LLE) was proposed by Roweis & Saul (2000) as an unsupervised learning algorithm that computes low-dimensional, neighborhood-preserving embeddings of high-dimensional inputs. Unlike most classical linear dimensionality reduction methods, LLE makes no linearity assumption about the distribution of the data points. Instead, it assumes that the data points lie on curved and/or distorted manifolds in the high-dimensional space, so that dimensionality reduction can be achieved by properly unfolding these manifolds. The unfolding is carried out in a 'local-to-global' way: by carefully preserving the neighborhood geometry of each local patch of the manifold, the global geometry of the data points can be recovered in a lower-dimensional space through the connections between overlapping local patches. The original authors first applied the algorithm to face recognition problems, where it proved both simple and effective.

In this paper, we report the results of applying LLE to fast, unsupervised feature extraction in microarray data analysis. Our implementation differs slightly from the original one and has been tested on six publicly available microarray datasets.
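To make the 'local-to-global' idea concrete, here is a minimal Python/NumPy sketch of the standard three-step LLE algorithm as published by Roweis & Saul (2000): (1) find the k nearest neighbors of each point; (2) compute weights W that best reconstruct each point from its neighbors, minimizing sum_i |x_i - sum_j W_ij x_j|^2 with each row of W summing to 1; (3) find low-dimensional points Y minimizing the same cost with W held fixed, sum_i |y_i - sum_j W_ij y_j|^2. This is an illustration only, not the authors' Matlab implementation, which the text above says differs slightly from the original. The function names `lle_weights` and `lle`, the parameter values, and the regularization constant `reg` are hypothetical choices. Treating each microarray sample as one row of X and each gene as one column is also an assumption.

```python
import numpy as np


def lle_weights(X, n_neighbors=5, reg=1e-3):
    """Hypothetical helper: LLE reconstruction weights for the rows of X.

    Returns an (n, n) matrix W whose row i holds the weights that best
    reconstruct x_i from its n_neighbors nearest neighbors (rows sum to 1).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Step 1: squared Euclidean distances; exclude each point from its own neighbor list.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)
    neighbors = np.argsort(d2, axis=1)[:, :n_neighbors]

    # Step 2: for each point, solve the regularized local Gram system C w = 1,
    # then rescale w so that it sums to 1 (the sum-to-one constraint).
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[neighbors[i]] - X[i]                 # neighbors centred on x_i
        C = Z @ Z.T                                # local Gram matrix, k x k
        # Regularization keeps C invertible when k exceeds the local dimension.
        C += np.eye(n_neighbors) * reg * max(np.trace(C), 1e-12)
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, neighbors[i]] = w / w.sum()
    return W


def lle(X, n_neighbors=5, n_components=2, reg=1e-3):
    """Hypothetical sketch: embed the rows of X into n_components dimensions."""
    W = lle_weights(X, n_neighbors, reg)
    n = W.shape[0]

    # Step 3: minimize sum_i |y_i - sum_j W_ij y_j|^2 by taking the bottom
    # eigenvectors of M = (I - W)^T (I - W).
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    _, eigvecs = np.linalg.eigh(M)                 # eigenvalues in ascending order
    # The smallest eigenvector is the constant vector (eigenvalue 0), so skip it.
    return eigvecs[:, 1:n_components + 1]


if __name__ == "__main__":
    # Toy stand-in for expression data: 40 samples x 100 "genes" (random, not real data).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 100))
    Y = lle(X, n_neighbors=10, n_components=3)
    print(Y.shape)
```

The returned columns are unit-norm eigenvectors orthogonal to the constant vector. Some implementations rescale them, which changes only the overall scale of the embedding. In a pipeline like the one described in the Summary, the low-dimensional features Y would then be passed to a classifier such as an SVM.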

 

Acknowledgements

Thanks to the original authors of LLE, Roweis and Saul, for publishing their source code on the web; to T. Joachims for his SVMlight software; to Anton Schwaighofer for his Matlab interface to SVMlight; and to M. Chen for running the simulations.

 

References

S. T. Roweis and L. K. Saul (2000), Nonlinear dimensionality reduction by locally linear embedding, Science, vol. 290, no. 5500, pp. 2323-2326.

T. Joachims (2002), SVMlight Support Vector Machine. [Online]. Available at: http://svmlight.joachims.org/

A. Schwaighofer, Matlab interface to SVMlight. [Online]. Available at: http://www.cis.tugraz.at/igi/aschwaig/software.html

Y. Lu and J. Han (2003), Cancer classification using gene expression data, Information Systems, pp. 243-268.

S.-B. Cho and H.-H. Won (2003), Machine learning in DNA microarray analysis for cancer classification, Proceedings of the First Asia-Pacific Bioinformatics Conference on Bioinformatics, vol. 19, pp. 189-198.

 

ADDITIONAL TABLES, SUPPLEMENTAL INFORMATION AND SOFTWARE

•        More test results and additional information can be found here.

•        Descriptions of and references for the datasets we used can be found here.

•        The Matlab scripts used can be downloaded here. (To unzip the code, use WinZip or WinRAR. Running the scripts requires Matlab version 6 or higher.)