 
 
 
 
 
 
 
 
   
  Research Details
 
Integrated Large Scale Data Analysis for Functional Discovery
 
High-throughput technologies now generate many kinds of large-scale data: MPSS, SAGE, and microarrays for global gene expression profiling; ChIP-chip and ChIP-seq for DNA-protein binding; and two-hybrid assays for protein-protein interaction. These different kinds of data need to be integrated for biological discovery. We approach this problem from three aspects: 1) data analysis, 2) data integration, and 3) an integration platform.

Data analysis and development of spike-in benchmark data
Microarray data analysis methods for identifying differential expression have not been adequately validated, because the means of validating them have been insufficient. As a result, guidelines for selecting an appropriate method are also lacking. Most statistical methods are suited only to datasets with larger sample sizes, whereas gene expression profiling with microarrays typically suffers from two problems: a limited number of replicates and potentially large variation between replicate arrays.

We have developed several spike-in microarray datasets with large numbers of both spike-ins and replicates. Using these benchmark data and statistical analysis of subsets of replicates, different data analysis methods can be evaluated (Figure 1) and novel data analysis methods can be developed (Figure 2). Development of experimental benchmarking data is carried out in collaboration with Dr. Peter Morin at BTI.
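
The idea of evaluating a method on subsets of replicates from a spike-in benchmark can be illustrated with a minimal Python sketch. It is not the group's actual pipeline: the data are simulated, Welch's t statistic stands in for whichever analysis method is under evaluation, and the function names (`welch_t`, `auc`, `evaluate_on_subsets`, `simulate`) and all parameter values are assumptions made for illustration only. Because the spiked-in probes are known, each method can be scored by how well it ranks them above the background probes (here, by the area under the ROC curve).

```python
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic for two samples (each needs at least 2 values)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = (va / len(a) + vb / len(b)) ** 0.5
    if se == 0:
        return 0.0
    return (statistics.mean(a) - statistics.mean(b)) / se


def auc(scores, is_spike):
    """Chance that a random spike-in outscores a random background probe (ties count half)."""
    pos = [s for s, t in zip(scores, is_spike) if t]
    neg = [s for s, t in zip(scores, is_spike) if not t]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate_on_subsets(control, treated, is_spike, n_reps, n_trials, rng):
    """Mean AUC of the |t| ranking over random subsets of n_reps replicates per condition."""
    aucs = []
    for _ in range(n_trials):
        cols_c = rng.sample(range(len(control[0])), n_reps)
        cols_t = rng.sample(range(len(treated[0])), n_reps)
        scores = [
            abs(welch_t([t[j] for j in cols_t], [c[j] for j in cols_c]))
            for c, t in zip(control, treated)
        ]
        aucs.append(auc(scores, is_spike))
    return statistics.mean(aucs)


def simulate(n_probes, n_spikes, n_reps, shift, sd, rng):
    """Simulated log2 intensities; the first n_spikes probes are shifted in the treated arrays."""
    is_spike = [i < n_spikes for i in range(n_probes)]
    control = [[rng.gauss(8.0, sd) for _ in range(n_reps)] for _ in range(n_probes)]
    treated = [
        [rng.gauss(8.0 + (shift if s else 0.0), sd) for _ in range(n_reps)]
        for s in is_spike
    ]
    return control, treated, is_spike


# Simulated benchmark (assumed values): 200 probes, 20 spike-ins, 10 replicates per condition.
rng = random.Random(0)
control, treated, is_spike = simulate(200, 20, 10, shift=1.0, sd=0.5, rng=rng)
for n in (2, 3, 5, 10):
    print(n, round(evaluate_on_subsets(control, treated, is_spike, n, 20, rng), 3))
```

Swapping `welch_t` for another scoring function and comparing the resulting AUC values at each subset size is one way to see how a method degrades as the number of replicates shrinks.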
 


Figure 1
 


Figure 2
 
Integrated approach for revealing novel growth factors
Identification of all growth factors relevant for proliferation and maintenance of embryonic stem cells (ESCs) would considerably improve ex vivo expansion of ESCs in conditioned media and greatly facilitate many experimental studies. Traditionally, such growth factors have been discovered experimentally by trial and error. Because numerous large-scale gene expression datasets for ESCs are available, the aim of this study was to derive a general framework for systematically identifying these growth factors by integrating various gene expression data with other datasets and databases.

We have developed a novel integrated approach to identify growth factors that may be used to augment and optimize propagation of human and mouse ESCs (Figure 3). By integrating transcriptome profiles of murine and human feeders, ESCs, and embryoid bodies (EBs) with protein-protein interaction data and the HomoloGene database, and by applying biological filters, we generated a detailed list of growth factors and complementary receptors for mouse and human ESCs that are relevant for ESC proliferation and maintenance. Supplementing a top-ranked human growth factor (Pleiotrophin) enhanced proliferation of human ESCs, while knockdown of the corresponding receptor (PTPRZ1) led to elevated apoptosis and reduced colony formation of human ESCs (Figure 4). By integrating gene expression profiling of the PTPRZ1 knockdown, we further revealed possible regulatory mechanisms of PTPRZ1 in human ESC proliferation and apoptosis control. This work is carried out in collaboration with Dr. Bing Lim at the Genome Institute of Singapore.
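
As a rough illustration of how such an integration might be organized, the Python sketch below pairs growth factors with receptors through an interaction list and applies expression-based filters. The specific filters are assumptions, since the text does not spell them out: the ligand must be expressed in feeders, the receptor must be expressed in ESCs, and the receptor must be higher in ESCs than in EBs. Gene names, expression values, the `min_expr` threshold, and the function name `candidate_pairs` are all hypothetical, and ortholog mapping (done with HomoloGene in the study) is assumed to have been applied beforehand.

```python
def candidate_pairs(feeder_expr, esc_expr, eb_expr, interactions, growth_factors, min_expr=6.0):
    """Return (ligand, receptor, score) tuples, best first.

    Expression values are assumed to be log2 intensities keyed by gene symbol.
    score = receptor expression in ESCs minus receptor expression in EBs.
    """
    hits = []
    for ligand, receptor in interactions:
        if ligand not in growth_factors:
            continue  # keep only annotated growth factors
        if feeder_expr.get(ligand, 0.0) < min_expr:
            continue  # assumed filter: ligand must be expressed in feeders
        esc_level = esc_expr.get(receptor, 0.0)
        if esc_level < min_expr:
            continue  # assumed filter: receptor must be expressed in ESCs
        score = esc_level - eb_expr.get(receptor, 0.0)
        if score <= 0:
            continue  # assumed filter: receptor should drop on differentiation to EBs
        hits.append((ligand, receptor, score))
    return sorted(hits, key=lambda h: h[2], reverse=True)


# Toy inputs with made-up symbols and values (not real measurements).
feeder_expr = {"LIG1": 9.0, "LIG2": 0.2, "LIG3": 7.5}
esc_expr = {"REC1": 8.0, "REC2": 9.0, "REC3": 6.0, "REC4": 10.0}
eb_expr = {"REC1": 5.0, "REC2": 3.0, "REC3": 6.5, "REC4": 4.0}
interactions = [
    ("LIG1", "REC1"),
    ("LIG2", "REC2"),
    ("LIG3", "REC3"),
    ("LIG3", "REC4"),
    ("OTHER", "REC1"),
]
growth_factors = {"LIG1", "LIG2", "LIG3"}

ranked = candidate_pairs(feeder_expr, esc_expr, eb_expr, interactions, growth_factors)
for ligand, receptor, score in ranked:
    print(ligand, receptor, score)
```

The ranked list is only a set of candidates; in the study, top-ranked factors were then tested experimentally.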
 


Figure 3
 


Figure 4
 
Integrated platform to facilitate stem cell research
The variety of experimental technologies used to characterize stem cells from different aspects has generated vast amounts of data. The Database for Integrated Stem Cell Research Online (DISO) is an integrated platform that consolidates heterogeneous data from different experiments, integrates information from various sources, and presents specific results in response to user queries (Figure 5). DISO can be used to facilitate novel discoveries that might be overlooked when datasets are studied separately, such as possible c-Myc partners and targets for cell cycle control in embryonic stem cells (ESCs), alternatives to c-Myc in reprogramming, and ESC-enriched or ESC-relevant pathways and biological processes. This initiative to aid effective mining and integration of diverse datasets and information, such as gene expression, DNA-protein binding, protein-protein binding, and pathway information, will benefit researchers by providing an integrated view of heterogeneous data and consequently lead to more concrete hypotheses (Figure 6). This work is carried out in collaboration with the BII software engineering team.
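
The kind of cross-dataset query described above can be sketched with a small in-memory index in Python. This is not DISO's schema or interface, which are not described here: the class `IntegratedIndex`, its methods, and the placeholder names (`TF_X`, `GENE_A`, and so on) are hypothetical. The sketch only shows why keying heterogeneous records by gene makes combined questions, such as "which targets of a factor belong to a given pathway", straightforward to answer.

```python
from collections import defaultdict


class IntegratedIndex:
    """Toy gene-centric store combining binding, interaction, and pathway records."""

    def __init__(self):
        self.bound_by = defaultdict(set)   # target gene -> factors that bind it
        self.partners = defaultdict(set)   # gene -> protein interaction partners
        self.pathways = defaultdict(set)   # gene -> pathway names

    def add_binding(self, factor, target):
        self.bound_by[target].add(factor)

    def add_interaction(self, a, b):
        # Protein-protein interactions are stored symmetrically.
        self.partners[a].add(b)
        self.partners[b].add(a)

    def add_pathway(self, gene, pathway):
        self.pathways[gene].add(pathway)

    def partners_of(self, gene):
        return sorted(self.partners.get(gene, set()))

    def targets_in_pathway(self, factor, pathway):
        """Genes bound by `factor` that are also annotated to `pathway`."""
        return sorted(
            gene
            for gene, factors in self.bound_by.items()
            if factor in factors and pathway in self.pathways.get(gene, set())
        )


# Placeholder records (hypothetical names, not real findings).
index = IntegratedIndex()
index.add_binding("TF_X", "GENE_A")
index.add_binding("TF_X", "GENE_B")
index.add_binding("TF_Y", "GENE_C")
index.add_pathway("GENE_A", "cell cycle")
index.add_pathway("GENE_C", "cell cycle")
index.add_interaction("TF_X", "PROT_1")
index.add_interaction("PROT_2", "TF_X")

print(index.targets_in_pathway("TF_X", "cell cycle"))
print(index.partners_of("TF_X"))
```

A production platform would sit on a database with curated identifiers, but the join-by-gene principle is the same.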

 


Figure 5
 


Figure 6
 
 