Modules overview
Concept

A DbNP 'clean data module' is a module which contains measurement data for assays described in the study capture module (GSCF). The idea here is that a dedicated metabolomics module or application knows much better how to handle, import and pre-process metabolomics data than a big multi-omics application would. Also, you can easily plug in multiple different metabolomics modules, or even different versions of the same module. At this moment, there exist DbNP modules for transcriptomics, metabolomics and clinical chemistry data.
The module is supposed to deliver 'clean data' to DbNP query modules for statistical evaluation. Ideally, this clean data has been corrected for measurement-specific variation, so that only biologically relevant variation remains. Sometimes this also has to be done with respect to axes defined in the study design. For example, the raw data of Affymetrix mRNA gene expression chips contain absolute probe intensity values. To render this kind of data in a biologically meaningful way, it is common practice to take log ratios of two different chips, e.g. one run on a sample taken before a treatment and one run on a sample taken after it. However, the relation of the sample to the treatment is defined in the study capture module. So when querying the transcriptomics module for data, the study capture module has to supply information about which samples should be compared with each other.
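As a rough illustration of the log-ratio example, the sketch below computes log2 ratios between paired after/before probe intensities. The sample tokens, probe names and intensity values are invented, and the `pairs` dictionary only stands in for the pairing information that the study capture module would supply; none of this is taken from an actual DbNP module.

```python
import math

# Hypothetical absolute probe intensities per sample token (made-up values).
intensities = {
    "sample_before": {"probe_a": 100.0, "probe_b": 400.0},
    "sample_after": {"probe_a": 200.0, "probe_b": 100.0},
}

# Hypothetical pairing, standing in for study design information from GSCF:
# each "after" sample is compared against its "before" sample.
pairs = {"sample_after": "sample_before"}


def log2_ratios(intensities, pairs):
    """Return log2(after / before) per probe for each sample pair."""
    result = {}
    for after, before in pairs.items():
        result[after] = {
            probe: math.log2(value / intensities[before][probe])
            for probe, value in intensities[after].items()
        }
    return result


ratios = log2_ratios(intensities, pairs)
```

The point of the sketch is that the module holds the intensities, while the pairing has to come from outside the module.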
Requirements for a DbNP clean data module
A DbNP clean data module has to adhere to the following:
- It should store the measurement data, linked to assays (via the assay token) and samples (via sample tokens) in the associated GSCF instance
- It should have a web service backend with a stable URL, in order to serve out the required REST calls
- It should have a web interface which is able to display some information about a particular assay
The first requirement is the only hard link between the study design as defined in GSCF and the measurement data in the module. This data needs to be linked via some kind of stable identifier. Those identifiers are defined in GSCF, because that is the hub module which links the many assay modules together. The word 'assay' can be a bit confusing in this sense, but it was chosen because it is also used for this concept in standards such as ISA-TAB.
An assay is defined in GSCF, and always belongs to one study. The following information is known about an assay in GSCF:
- The identifier (assay token, which is unique over the whole GSCF instance) and descriptive name of the assay
- The samples on which the assay was performed
- The URL of the clean data module in which the actual measurement data for this assay is stored
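The three items above could be mirrored on the module side by a small record type. This is only a sketch: the class and field names are hypothetical and do not reflect the actual GSCF data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assay:
    """What a module might keep about an assay defined in GSCF (hypothetical)."""

    token: str                 # assay token, unique over the whole GSCF instance
    name: str                  # descriptive name of the assay
    module_url: str            # URL of the clean data module holding the data
    sample_tokens: tuple = ()  # tokens of the samples the assay was performed on


# Example values are invented for illustration.
assay = Assay(
    token="assay_1",
    name="Plasma metabolites",
    module_url="https://metabolomics.example.org",
    sample_tokens=("sample_1", "sample_2"),
)
```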
The module is obviously free to store any platform-specific metadata about an assay (such as the DNA labeling protocol in transcriptomics). In the first requirements specification for DbNP it was mentioned that a module should also provide this metadata to GSCF via e.g. a REST interface in a uniform way, but this is not yet implemented because it didn't turn up in the actual use cases.
Communication overview
DbNP 'clean data modules' communicate with GSCF in several ways:
- GSCF provides RESTful services for the module, so the module can find out for which assays it is supposed to contain data
- The module provides RESTful services to GSCF (and possibly query/statistics modules) to output the 'clean data'
- The module serves a web page containing a summary of the measurements for each assay, and the URL of this page is made known to GSCF
The last way of communication is probably the least obvious, but it provides an easy way to integrate potentially very different web applications with each other at the user interface level. In practice, the overview of the assays in a study in GSCF contains, for each assay, a link to the assay information page in the respective module. When the user clicks that link, he or she is redirected to the URL of the assay information page in that module. The module is supposed to provide a back link to the study in GSCF (which is supplied as a callback link property in the URL).
TODO: maybe it's better to have GSCF make an iframe, just like google images?
RESTful services provided by GSCF
For the RESTful services provided by GSCF, only the GET HTTP method is implemented (because the underlying data is supposed to be changed in GSCF only), and the calls always return the information packaged as a JSON object. The following is an overview of the REST calls that GSCF provides. All of these calls also need an OAuth access token to identify which user asks for the information (see Authentication below). For more details on the arguments and return type of the REST calls, see the GSCF JavaDoc.
| URL | Description |
|---|---|
| getStudies | Provide a list of all the studies contained in the GSCF instance to which this user has at least read access |
| getStudyVersion | Provides the version number of a specific study (or studies) |
| getAssays(studyToken, moduleURL) | Provide a list of all assays in a particular study which are supposed to be in the calling module |
| getSamples(assayToken) | Provide a list of all the samples in a particular assay (with some metadata attached, such as the sampling time) |
| getStudy(studyToken) | Provide detailed information about a certain study |
| getAssay(assayToken) | |
| getSample(assayToken, sampleToken) | |
| getAuthorizationLevel(studyToken) | Return the authorization level of the supplied user with respect to the study (canRead, canWrite, isOwner) |
| isUser() | Returns true when the supplied user credentials are valid. Only there for Authentication Phase 1. |
| getUser() | Returns the username and id of the user that is logged in to GSCF. |
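To make the shape of such a call concrete, here is a minimal sketch of how a module might build the GET URL for one of these calls. It assumes the `consumer` and `token` query parameters used in the Phase 1 examples of the Authentication section, and it assumes that a call argument such as assayToken is passed as a query field of the same name. The base URL, consumer id and session id are made up, and `gscf_rest_url` is a hypothetical helper, not part of GSCF.

```python
from urllib.parse import urlencode


def gscf_rest_url(gscf_base, call, consumer, token, **params):
    """Build a GET URL for a GSCF REST call with Phase 1 style credentials."""
    query = {"consumer": consumer, "token": token}
    query.update(params)
    # doseq=True repeats the field name for list-valued parameters.
    return f"{gscf_base}/rest/{call}?{urlencode(query, doseq=True)}"


url = gscf_rest_url(
    "https://gscf.example.org",      # hypothetical GSCF base URL
    "getSamples",
    consumer="metabolomics-module",  # hypothetical consumer id
    token="abc123",                  # hypothetical session id
    assayToken="assay_1",
)
```

The response body would then be parsed as JSON; its exact layout is described in the GSCF JavaDoc and is not assumed here.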
RESTful services provided by a clean data module
The module also only has to implement the GET HTTP method for the following REST calls, and the return type should be JSON objects. A general convention is that when a parameter is a list (such as measurementTokens) and an empty list is given as argument, it means all available measurements.
Mandatory methods
| URL | Description |
| getMeasurements(assayToken) | Return a list of measurements that are available for a particular assay |
| getMeasurementMetaData(assayToken, measurementTokens) | Returns metadata for specific measurements, such as a metabolite ontology reference |
| getMeasurementData(assayToken, measurementTokens, sampleTokens) | Returns a table with the values for a number of specific measurements for specific samples in the specified assay |
| getAssayURL(assayToken) | Get the URL of the (module) web interface frontend page for the specified assay |
A detailed description of the mandatory methods with example REST calls is available here.
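The sketch below shows one way a module could back getMeasurements and getMeasurementData with an in-memory store, including the convention that an empty list selects everything. The store layout, the token values and the shape of the returned JSON are assumptions made for illustration. Applying the empty-list convention to sampleTokens as well is also an assumption.

```python
import json

# Hypothetical store: assay token -> measurement token -> sample token -> value.
STORE = {
    "assay_1": {
        "glucose": {"sample_1": 5.1, "sample_2": 4.8},
        "lactate": {"sample_1": 1.2, "sample_2": 1.9},
    }
}


def get_measurements(assay_token):
    """List the measurement tokens available for one assay."""
    return sorted(STORE[assay_token])


def get_measurement_data(assay_token, measurement_tokens=(), sample_tokens=()):
    """Return values per measurement and sample; an empty list selects all."""
    assay = STORE[assay_token]
    measurements = list(measurement_tokens) or sorted(assay)
    table = {}
    for measurement in measurements:
        samples = list(sample_tokens) or sorted(assay[measurement])
        table[measurement] = {s: assay[measurement][s] for s in samples}
    return table


# JSON body a module might return for one measurement and all samples.
body = json.dumps(get_measurement_data("assay_1", ["glucose"], []))
```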
Methods needed to be able to query on assay properties
| URL | Description |
| getQueryableFields(entity) | Return a list of metadata fields that can be searched for a specific entity. It is also possible to search queryable fields with multiple entities at once. |
| getQueryableFieldData(entity, tokens, fields) | Returns the value of specific fields for the objects identified by tokens. |
| getPossibleActions(entity) | Returns a list of possible actions that can be performed on objects of a specific entity. This might be a method to export data or show details about an object. |
A detailed description of these methods with example REST calls is available here.
Methods needed to be notified of changes in a study
| URL | Description |
|---|---|
| notifyStudyChange(studyToken) | Notifies the module of a change in study properties or associations. |
Methods needed to be able to insert assays (e.g. needed in transcriptomics module to be able to import MAGETAB data)
Still investigating whether this is really needed.
URLs provided by a clean data module
| URL | Description |
|---|---|
| | Jump to the view of the specified assay, or to the import function for that specific assay if it does not yet exist in the module |
Authentication
Authentication is in general taken care of by GSCF. The implementation of authentication is done in two phases. The end goal is to have GSCF act as an OAuth authentication and authorization provider, but since this is hard to set up, we currently work with a simpler model which only uses username and password.
Phase 1
In the current phase, username and password details are stored in GSCF. The calling module should supply 'credentials' by means of the browser session id in the HTTP URL of each call. The information that is given back is also based on these credentials, e.g. getStudies only returns the studies to which the supplied user has access. Because there is also an isUser call (only there for Phase 1), the module doesn't need to implement security on its own and doesn't have to maintain a user database: it can just redirect to GSCF if the isUser call fails, meaning that no user has logged in to GSCF in this browser session yet. Below are some example scenarios which show how a module could implement authentication and authorization by relying entirely on GSCF, even in Phase 1.
Add a new assay within the module
Module places an isUser call at [gscf]/rest/isUser and receives a negative answer.
Redirects to the GSCF login at [gscf]/login/auth_remote?consumer=<consumer>&token=<session_id>&returnUrl=<url to return when successfully logged in>
[user logs in at GSCF and is redirected to the returnUrl]
Module gets studies to which this user has access at [gscf]/rest/getStudies?consumer=<consumer>&token=<session_id>
User chooses a study
Module gets the assays it should hold for the chosen study: [gscf]/rest/getAssays?consumer=<consumer>&token=<session_id>
User picks an assay to add data to
Module checks authorization level for the study in question (should be OWNER or WRITER): [gscf]/rest/getAuthorizationLevel?consumer=<consumer>&token=<session_id>
If authorization level is sufficient, the user can add assay data
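The decision logic of this scenario can be sketched as follows. `FakeGscf` is a stand-in so the flow runs without a network. It is assumed here that getAuthorizationLevel yields the canRead, canWrite and isOwner flags as a dictionary of booleans, which the source does not confirm. All names and URLs are hypothetical.

```python
from urllib.parse import urlencode


class FakeGscf:
    """Stand-in for the GSCF REST calls used in this scenario (hypothetical)."""

    def __init__(self, logged_in, level):
        self.logged_in = logged_in
        self.level = level

    def is_user(self, token):
        return self.logged_in

    def get_authorization_level(self, token, study_token):
        return self.level


def login_redirect(gscf_base, consumer, token, return_url):
    """Build the GSCF login URL the module redirects to when isUser fails."""
    query = urlencode(
        {"consumer": consumer, "token": token, "returnUrl": return_url}
    )
    return f"{gscf_base}/login/auth_remote?{query}"


def may_add_assay_data(gscf, token, study_token):
    """Return 'login', 'allowed' or 'denied' for an attempt to add assay data."""
    if not gscf.is_user(token):
        return "login"  # caller should redirect to login_redirect(...)
    level = gscf.get_authorization_level(token, study_token)
    # Assumed response shape: {"canRead": bool, "canWrite": bool, "isOwner": bool}
    allowed = level.get("canWrite") or level.get("isOwner")
    return "allowed" if allowed else "denied"
```

Note that the return URL is percent-encoded when it is embedded in the login URL, so that its own query string does not get mixed up with the parameters of the login call.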
Retrieve data from an assay within the module
Module places an isUser call at [gscf]/rest/isUser and receives a negative answer.
Redirects to the GSCF login at [gscf]/login/auth_remote?consumer=<consumer>&token=<session_id>&returnUrl=<url to return when successfully logged in>
[user logs in at GSCF and is redirected to the returnUrl]
Module gets studies to which this user has access at [gscf]/rest/getStudies?consumer=<consumer>&token=<session_id>
User chooses a study
Module gets the assays it should hold for the chosen study: [gscf]/rest/getAssays?consumer=<consumer>&token=<session_id>
User picks an assay to retrieve data from
Module checks authorization level for the study in question (should be OWNER or WRITER): [gscf]/rest/getAuthorizationLevel?consumer=<consumer>&token=<session_id>
If authorization level is sufficient, the user can view the assay data
Show data for an assay from within GSCF
GSCF presents user/password login
User logs in
User browses to a certain study, displays its assays and clicks on the 'show assay in module' link
GSCF gets the URL location of the assay data: getAssayURL
GSCF jumps to the assay display in the module, in an iframe, with a top bar which contains the GSCF navigation (like Google Images search)
From there on, proceed as in 'Retrieve data from an assay within the module' above. Because the user is already logged in to GSCF in that browser session, the module can immediately place a getAuthorizationLevel call to check whether the user really has access to the study that contains the assay data.
Authentication of a REST call to a dbNP module
A user (which in this case is probably a machine, e.g. GSCF itself) poses a REST call to the module, e.g. [module]/rest/getMeasurements/query?sessionToken=<session token>&assayToken=<assay token>
The module checks with its parent GSCF (probably also the calling GSCF) if the token is valid: [gscf]/rest/isUser/query?consumer=<module URL>&token=<the provided session token>
If this token is indeed valid, the module checks whether the token implies the necessary authorization level for the study in question (should be any of OWNER, READER or WRITER): [gscf]/rest/getAuthorizationLevel?consumer=<module URL>&token=<the provided session token>
If the authorization level is sufficient for all of the implied studies, the module returns the requested assay data
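A minimal sketch of the checks in this scenario is given below. The two callables stand in for the isUser and getAuthorizationLevel calls to GSCF. Their signatures, and the assumption that the authorization level arrives as a dictionary of canRead, canWrite and isOwner flags, are hypothetical.

```python
def authorize_rest_call(session_token, study_tokens, is_user, get_authorization_level):
    """Return True only if the token is valid and grants access to every study."""
    if not is_user(session_token):
        return False
    for study_token in study_tokens:
        level = get_authorization_level(session_token, study_token)
        # Any of reader, writer or owner is enough to read assay data.
        if not (level.get("canRead") or level.get("canWrite") or level.get("isOwner")):
            return False
    return True


# Stubs standing in for GSCF; tokens and levels are invented for illustration.
LEVELS = {
    "study_1": {"canRead": True, "canWrite": False, "isOwner": False},
    "study_2": {"canRead": False, "canWrite": False, "isOwner": False},
}


def stub_is_user(token):
    return token == "valid-session"


def stub_level(token, study_token):
    return LEVELS[study_token]


ok = authorize_rest_call("valid-session", ["study_1"], stub_is_user, stub_level)
```

Passing the GSCF calls in as parameters keeps the check itself free of network code, so it can be exercised with stubs as above.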
Phase 2
The end goal, Phase 2, is to have GSCF act as a full-blown OAuth authentication and authorization provider (like Facebook or Flickr). In that way, user accounts only have to be stored once (in GSCF), and all modules can profit from that. The main reason why we don't just stick with Phase 1 is that the mechanisms in Phase 1 are less secure and require additional hand-configured security (such as originating IP checks). OAuth is a secure implementation of the Single Sign On scenarios we want to implement. We would like to implement the new OAuth 2 protocol, but since there seems to be hardly any Java code for it, we might in the end need to go for OAuth 1.
Conventions for the RESTful service
The RESTful services implemented in GSCF and the clean data modules follow conventions with respect to the data objects passed. These conventions are listed below.
(1) The REST calls are located at a URL that ends in "/rest" followed by the name of the REST call.
This follows from the way that RESTful services are provided by Grails.
Example: demo.dbnp.org/gscf/rest/getStudies
(2) Return values are passed as JSON objects.
JSON is used as a container format to return values for REST calls.
(3) Parameters are passed in URL query strings.
Standard URL query strings with ampersand separators are used.
(4) Identifier strings are marked as "tokens"
IDs that refer to objects in the data model of GSCF or the data modules of dbNP have names ending in the substring 'Token', e.g. "studyToken" or "assayToken".
(5) List arguments are passed by repeating the field name in the URL query string.
If an argument is a list, each of its values is attached to its own occurrence of the field name. E.g., if two assay tokens are passed, the assayToken field occurs twice in the URL query string.
Example: .../rest/getAssays/query?assayToken=assay_1&assayToken=assay_2
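The list convention in (5) matches what Python's standard library does with repeated fields, which gives a quick way to try it out. This is a sketch on the client side only and says nothing about how GSCF itself parses the query string.

```python
from urllib.parse import parse_qs, urlencode

# Encoding: doseq=True emits one field=value pair per list element.
query = urlencode({"assayToken": ["assay_1", "assay_2"]}, doseq=True)

# Decoding: repeated fields are collected back into a list.
parsed = parse_qs(query)
```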
(6) Invalid requests are replied to with HTTP 400, with the error message in the body of the response (without markup)
(7) If the user doesn't have access to a certain resource, an HTTP 401 error is given. If multiple resources are retrieved, and the user has access to one or more resources, those resources are returned without error.
(8) If a specified resource can't be found, an HTTP 404 error is given. If multiple resources are retrieved, and some resources exist and some don't, the existing resources are returned without an error.
(9) If no user is logged in to GSCF, an HTTP 403 error is given.
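Conventions (6) to (9) can be sketched as a single decision function. The order in which the checks are applied (login first, then request validity, then existence, then access) is an assumption, as the conventions do not state a precedence. The function name and its parameters are hypothetical.

```python
def resolve(logged_in, valid_request, requested, existing, accessible):
    """Return (HTTP status, tokens to return) for a request for several resources.

    requested:  tokens asked for, in request order
    existing:   set of tokens that exist
    accessible: set of tokens the user may read
    """
    if not logged_in:
        return 403, []      # (9) no user logged in
    if not valid_request:
        return 400, []      # (6) invalid request
    found = [t for t in requested if t in existing]
    if not found:
        return 404, []      # (8) none of the resources exist
    readable = [t for t in found if t in accessible]
    if not readable:
        return 401, []      # (7) no access to any existing resource
    return 200, readable    # partial results are returned without error


status, tokens = resolve(True, True, ["a", "b"], {"a"}, {"a"})
```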





