Clean Transcriptomics Database
DbNP Clean Transcriptomics Database
Overview
The Clean Transcriptomics Database (CTD) is a module of the Nutritional Phenotype Database (dbNP). Its development is funded by NuGO and carried out by Wageningen University & Research Centre (WUR). It aims to provide centralized storage of transcriptomics data that can be queried via the dbNP query module, with the accompanying study metadata stored in the dbNP study capture module. It complies with the 'dbNP omics submodule' standard by serving clean data via the dbNP clean data layer.
Planning
The module is built by Robert Kerkhoven and Philip de Groot from WUR. User specifications were provided by Guido Hooiveld from the same group (led by Michael Müller). Test versions are deployed on the WUR NBX. The ultimate goal is to deploy the database on all NBXes, so that each NuGO member organization can store its own transcriptomics assay data on its own NBX. The CTD was released at the end of 2010.
Goals
The clean transcriptomics database has two goals:
- The implementation of a uniform normalization for (initially) Affymetrix microarray data, which results in dbNP 'clean data'
- The implementation of the clean data layer of dbNP, to integrate the transcriptomics data with study capture and other omics data
The realization of these goals is described in the next two sections.
Providing clean data
Normalization is implemented by the GenePattern package 'NuGOMakeCleanData', which is installed on the GenePattern instances on the NuGO NBXes. It is the responsibility of the scientist to convert their raw data (CEL files) into normalized data (GCT file + CHIP file, the so-called CustomCDF format). The package automatically downloads the latest annotations for the Affymetrix arrays. For documentation, see the following PDF file.
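To illustrate what the normalized output looks like to a consumer, the sketch below reads a GCT file in the GenePattern GCT v1.2 layout (a version line, a dimensions line, a header line, then one tab-separated row per probe set). It is not part of 'NuGOMakeCleanData'; the function name parse_gct, the array names and the probe set IDs are made up for the example.

```python
def parse_gct(text):
    """Parse GCT v1.2 text into (array_names, rows).

    rows maps each probe set ID to (description, [values per array]).
    Hypothetical helper for illustration; not part of the CTD code base.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if lines[0].strip() != "#1.2":
        raise ValueError("expected a GCT v1.2 file (first line '#1.2')")

    # Second line: number of data rows and number of arrays (columns).
    n_rows, n_cols = (int(field) for field in lines[1].split("\t")[:2])

    # Third line: 'Name', 'Description', then one column per array.
    header = lines[2].split("\t")
    array_names = header[2:]
    if len(array_names) != n_cols:
        raise ValueError("array count does not match the dimensions line")

    rows = {}
    for line in lines[3:]:
        fields = line.split("\t")
        name, description = fields[0], fields[1]
        rows[name] = (description, [float(value) for value in fields[2:]])
    if len(rows) != n_rows:
        raise ValueError("row count does not match the dimensions line")
    return array_names, rows


# Made-up example content: two probe sets measured on three arrays.
EXAMPLE_GCT = (
    "#1.2\n"
    "2\t3\n"
    "Name\tDescription\tarray_A\tarray_B\tarray_C\n"
    "1000_at\tgene one\t7.1\t7.4\t6.9\n"
    "1001_at\tgene two\t3.2\t3.0\t3.5\n"
)

array_names, rows = parse_gct(EXAMPLE_GCT)
```

The dimension checks catch truncated uploads early, before any values are stored.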
Integration with other dbNP modules
To serve the clean data, a web application has been built that lets users upload their normalized transcriptomics expression data (GCT file and CHIP file). This data is stored in a local MySQL database. The web application then lets the user link the arrays in the data to assays and samples in the dbNP study capture module. Finally, the application implements the clean data layer of the dbNP query module so that it can serve the clean data for dbNP queries (as specified here).
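The store, link and serve steps described above can be sketched as follows. This is an assumption-laden illustration, not the CTD implementation: the table and column names are hypothetical, SQLite (from the Python standard library) stands in for the MySQL database so that the example runs without a server, and the sample token is a placeholder string rather than a real dbNP token.

```python
import sqlite3

# Hypothetical schema: one row per uploaded array, one row per measured value.
SCHEMA = """
CREATE TABLE microarray (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    sample_token TEXT            -- filled in when the array is linked
);
CREATE TABLE expression (
    array_id INTEGER NOT NULL REFERENCES microarray(id),
    probe_set TEXT NOT NULL,
    value REAL NOT NULL,
    PRIMARY KEY (array_id, probe_set)
);
"""


def create_store():
    """Open an in-memory database (a stand-in for the local MySQL database)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def store_matrix(conn, array_names, matrix):
    """Store normalized values; matrix maps probe set ID to a list per array."""
    for column, name in enumerate(array_names):
        cursor = conn.execute("INSERT INTO microarray (name) VALUES (?)", (name,))
        array_id = cursor.lastrowid
        conn.executemany(
            "INSERT INTO expression (array_id, probe_set, value) VALUES (?, ?, ?)",
            [(array_id, probe, values[column]) for probe, values in matrix.items()],
        )
    conn.commit()


def link_array(conn, array_name, sample_token):
    """Link an uploaded array to a sample from the study capture module."""
    cursor = conn.execute(
        "UPDATE microarray SET sample_token = ? WHERE name = ?",
        (sample_token, array_name),
    )
    if cursor.rowcount != 1:
        raise KeyError(array_name)
    conn.commit()


def clean_data_for_sample(conn, sample_token):
    """Return {probe set ID: value} for the array linked to a sample token."""
    result = conn.execute(
        "SELECT e.probe_set, e.value FROM expression e "
        "JOIN microarray a ON a.id = e.array_id "
        "WHERE a.sample_token = ?",
        (sample_token,),
    )
    return dict(result.fetchall())


conn = create_store()
store_matrix(conn, ["array_A", "array_B"],
             {"1000_at": [7.1, 7.4], "1001_at": [3.2, 3.0]})
link_array(conn, "array_A", "sample-token-1")
linked_values = clean_data_for_sample(conn, "sample-token-1")
```

Keeping the sample token on the array row means a query that arrives with a sample token needs only one join to reach the expression values.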
Current development status
- The GenePattern module for normalization has been developed by Philip and has been installed on the WUR NBX. It will be deployed to the other NBXes soon.
- The web application for upload and storage of the clean transcriptomics data has been developed by Robert. It can be found at http://nbx13.nugo.org/ctd.
- As of February 2011, the way study and sample tokens are handled in the REST communication is changing; these changes are being implemented by the GSCF team.





