Gene Expression Data Analysis


This research aims to classify gene expression data sets into categories such as normal vs. cancer. The main challenge is that microarray data measure thousands of genes, while only a small subset of those genes is believed to be relevant for disease classification. We have developed a novel approach called "predictive automatic relevance determination"; this method brings Bayesian tools to bear on the problem of selecting which genes are relevant, and it extends our earlier work on the "expectation propagation" algorithm. In our simulations, the new method outperforms several state-of-the-art methods, including support vector machines with feature selection and relevance vector machines.
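
For readers unfamiliar with automatic relevance determination (ARD), the idea behind relevance-based gene selection can be sketched with the standard ARD prior on the weights of a linear classifier. This is the textbook formulation, shown only as background, and not necessarily the exact form used in the predictive variant described above. The symbols are assumed notation for illustration: w_i is the classifier weight for gene i, alpha_i is a per-gene precision hyperparameter, and d is the number of genes measured.

```latex
p(\mathbf{w} \mid \boldsymbol{\alpha})
  = \prod_{i=1}^{d} \mathcal{N}\!\left(w_i \mid 0,\ \alpha_i^{-1}\right)
```

When the fitted alpha_i becomes very large, the prior concentrates w_i at zero, so gene i is effectively pruned from the classifier; genes whose alpha_i stays finite are the ones treated as relevant.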