data_analysis [Wiki]


1. GSRI

1. Short description

An approach for “gene set analysis”, i.e. for assessing whether a group of functionally related features (genes, RNAs, proteins or …) is regulated.
The GSRI estimates the fraction of significantly regulated features.
For the estimation, the empirical cumulative distribution function (ecdf) of the p-values is analyzed. An iterative estimation procedure quantifies the deviation from a uniform distribution of p-values (for which the ecdf is a diagonal line). It also yields standard errors for the estimated fraction and allows significance statements.
In contrast to similar approaches, no reference gene set that is NOT regulated (e.g. “all genes”) is required.
The most prominent similar approach is GSEA (gene set enrichment analysis).
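
The idea of reading the regulated fraction off the ecdf can be sketched in a few lines of Python. This is a simplified, non-iterative illustration and not the algorithm of the R package: it assumes that p-values above a threshold `lam` stem almost exclusively from unregulated features, so that their share fixes the slope of the uniform part of the ecdf. The function name and the threshold are hypothetical choices for this sketch.

```python
def estimate_regulated_fraction(pvalues, lam=0.5):
    """Estimate the fraction of p-values that deviate from a uniform distribution.

    Simplified sketch (assumption, not the GSRI implementation):
    p-values above `lam` are attributed to unregulated features only.
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("at least one p-value is required")
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    # For uniform p-values, a share of (1 - lam) is expected above lam.
    n_above = sum(1 for p in pvalues if p > lam)
    unregulated = min(1.0, n_above / (n * (1.0 - lam)))
    return 1.0 - unregulated


# Toy gene set: 100 evenly spread (uniform-like) p-values plus 100 small ones.
uniform_like = [(i + 0.5) / 100 for i in range(100)]
gene_set = uniform_like + [0.001] * 100
print(estimate_regulated_fraction(uniform_like))  # ecdf follows the diagonal
print(estimate_regulated_fraction(gene_set))      # half of the set is regulated
```

A fixed threshold keeps the sketch short. The iterative procedure described above avoids having to choose such a threshold by hand.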

1.2. Applicability/restrictions/pitfalls

The approach has been applied several times in application projects. It works.
Drawback: Collaborators show a slight preference for more prominent approaches.

1.3. Code availability

R-package “les” on Bioconductor

1.4. Publications from the Timmer group

https://doi.org/10.1089/cmb.2008.0226

1.5. Side remark

Weighting of the individual p-values leads to LES.

2. LES

1. Short description

Local estimate of the fraction of significant p-values.
The approach was developed for tiling arrays.
It can be applied whenever p-values from statistical tests are available in a spatial order (e.g. along the genome).
As in the GSRI, the fraction of significantly regulated features is estimated, here locally by combining the estimate with a smoothing window.
It allows significance statements about whether, at a given position, a significant fraction of p-values deviates from the uniform distribution.
The outcome can be used to rank genomic regions, i.e. for finding regions of interest.
In contrast to similar approaches, no reference gene set that is NOT regulated (e.g. “all genes”) is required.
The most prominent similar approach is GSEA (gene set enrichment analysis).
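
The combination of a fraction estimate with a smoothing window can be illustrated as follows. This is a minimal sketch under stated assumptions, not the algorithm of the “les” package: it uses triangular weights, a fixed threshold `lam`, and equally spaced positions. All names are hypothetical.

```python
def local_regulated_fraction(pvalues, half_width=5, lam=0.5):
    """Windowed estimate of the fraction of non-uniform p-values at each position.

    Simplified sketch (assumption, not the algorithm of the "les" package):
    triangular weights and a fixed threshold `lam`.
    """
    n = len(pvalues)
    estimates = []
    for center in range(n):
        total = 0.0
        above = 0.0
        start = max(0, center - half_width)
        stop = min(n, center + half_width + 1)
        for i in range(start, stop):
            weight = half_width + 1 - abs(i - center)  # largest at the centre
            total += weight
            if pvalues[i] > lam:
                above += weight
        # Weighted share of p-values above lam, relative to the uniform case.
        unregulated = min(1.0, above / (total * (1.0 - lam)))
        estimates.append(1.0 - unregulated)
    return estimates


# Toy data along a "genome": alternating unremarkable p-values,
# with a region of small p-values at positions 20..39.
pvalues = [0.25 if i % 2 == 0 else 0.75 for i in range(60)]
for i in range(20, 40):
    pvalues[i] = 0.001
estimates = local_regulated_fraction(pvalues)
print(round(estimates[8], 2), round(estimates[30], 2))
```

Ranking positions by such a local estimate is one way to arrive at candidate regions of interest.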

3. Applicability/restrictions/pitfalls

The R-package was implemented by Julian Gehring. He was one of the most experienced R programmers in our group and later became a group member in Wolfgang Huber’s lab (a major Bioconductor group). He utilized Bioconductor classes.
There were no other projects with similar data, i.e. to which the approach could be applied.

4. Code availability

R-package “les” on Bioconductor

5. Publications from the Timmer group

Julian Gehring’s Master’s thesis

3. TSSi

1. Short description

Transcription Start Site Identification (TSSi) based on sequencing reads.
The data did not uniquely indicate TSSs.
The approach has been applied to predict TSSs for the Physcomitrella patens genome. The results were available in the standard genome browser for this organism.

2. Applicability/restrictions/pitfalls

I guess that similar data is not produced any more. Therefore, the approach might be obsolete.

3. Availability

R-package TSSi

4. Publication from the Timmer group

https://doi.org/10.1093/bioinformatics/bts189

4. Optimal Transformations

1. Short description

The Mean Optimal Transformation Approach (MOTA) was suggested for investigating non-identifiabilities.
It is based on the alternating conditional expectation (ACE) algorithm.
A non-parametric method based on kernel estimation that uncovers arbitrary dependencies in data.
It also works for relations that are not functions, e.g. a circle.
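
The alternation at the core of ACE can be sketched for two variables. This is an illustration under assumptions, not the code of “acepack” or MOTA: conditional expectations are estimated by a simple running mean over neighbours (a rectangular kernel in rank order), and the toy data follow a noise-free parabola. All function names are hypothetical.

```python
import math


def standardize(values):
    """Shift to zero mean and scale to unit variance."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    scale = math.sqrt(sum(c * c for c in centered) / n)
    return [c / scale for c in centered]


def conditional_mean(target, given, half_width):
    """Estimate E[target | given] by a running mean over neighbours in `given`."""
    n = len(given)
    order = sorted(range(n), key=lambda i: given[i])
    smoothed = [0.0] * n
    for rank, index in enumerate(order):
        lo = max(0, rank - half_width)
        hi = min(n, rank + half_width + 1)
        window = [target[order[r]] for r in range(lo, hi)]
        smoothed[index] = sum(window) / len(window)
    return smoothed


def ace_bivariate(x, y, half_width=10, iterations=20):
    """Alternate between phi(x) = E[theta(y) | x] and theta(y) = E[phi(x) | y]."""
    theta = standardize(y)
    for _ in range(iterations):
        phi = conditional_mean(theta, x, half_width)
        theta = standardize(conditional_mean(phi, y, half_width))
    phi = conditional_mean(theta, x, half_width)
    return theta, phi


def correlation(a, b):
    """Pearson correlation of two equally long sequences."""
    return sum(p * q for p, q in zip(standardize(a), standardize(b))) / len(a)


# Toy example: y depends on x, but the linear correlation vanishes.
x = [-1.0 + 2.0 * i / 199 for i in range(200)]
y = [v * v for v in x]
theta, phi = ace_bivariate(x, y)
print(round(correlation(x, y), 3), round(correlation(theta, phi), 3))
```

The transformed variables are strongly correlated although the raw variables are not, which is the kind of dependency the approach is meant to reveal.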

2. Applicability/restrictions/pitfalls

Since it is based on kernel estimation, the approach is restricted to low-dimensional problems.

3. Code availability

R-package MOTA (no longer maintained, see the CRAN archive)
ACE is available as the R-package “acepack”
Matlab code for ACE is available internally (ask Clemens)

4. Other related methods

ACE

5. Publications from the Timmer group

Hengl S et al. Data-based identifiability analysis of nonlinear dynamical models (2007)

6. Publication from external groups

Breiman & Friedman. Estimating optimal transformations for multiple regression and correlation (1985)

6. Retarded Transient Function

1. Under development

2. Short description

An explicit function whose shape is very similar to ODE solutions of signalling pathways.
If only small amounts of data (observables) are available, the approach might serve as an alternative to traditional ODE modelling.
The approach provides self-explanatory parameters (amplitudes, response times, time scales).
It can be fitted directly to data in order to obtain an explicit function describing the time dependency (like a smoothing spline).
It can be fitted to ODE solutions in order to obtain an approximation of the dynamics as an explicit function (e.g. for multiscale models).
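
The functional form is not stated on this page. Purely as an assumed illustration of what an explicit response curve with amplitudes and time constants can look like, the following sketch adds a sustained and a transient part. The formula and all parameter names are assumptions of this sketch and not the definition used in the D2D implementation. Any retardation of the response implied by the name is not modelled here.

```python
import math


def transient_function(t, a_sus, a_trans, tau_rise, tau_decay, offset=0.0):
    """Illustrative response curve: sustained part plus transient peak.

    Assumed form for this sketch (not taken from the source):
    offset + a_sus * (1 - exp(-t/tau_rise))
           + a_trans * (1 - exp(-t/tau_rise)) * exp(-t/tau_decay)
    """
    if tau_rise <= 0 or tau_decay <= 0:
        raise ValueError("time constants must be positive")
    rise = 1.0 - math.exp(-t / tau_rise)
    decay = math.exp(-t / tau_decay)
    return offset + a_sus * rise + a_trans * rise * decay


# Evaluate on a coarse time grid: fast rise, slow decay to the sustained level.
times = [0.0, 1.0, 2.0, 5.0, 10.0, 50.0, 500.0]
curve = [transient_function(t, a_sus=1.0, a_trans=3.0, tau_rise=1.0,
                            tau_decay=10.0, offset=0.5) for t in times]
for t, value in zip(times, curve):
    print(f"t = {t:6.1f}  f(t) = {value:.3f}")
```

In this assumed form the curve starts at `offset`, overshoots, and settles at `offset + a_sus`, so each parameter can be read directly from a plot of the response.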

3. Availability

D2D is used for fitting.
See the D2D example folder (ToyModels/TransientFunction).

4. Applicability

Fitting is very robust.
For data, the outcome is great in 90% of cases.
For approximating ODEs, the performance depends on the model. The approximation error is smaller than the uncertainties of the data.

5. Publication

Submitted


1. GSRI

1. Short description

1.2. Applicability/restrictions/pitfalls

1.3. Code availability

1.4. Publications from the Timmer group

1.5. Side remark

2. LES

1. Short description

3. Applicability/restrictions/pitfalls

4. Code availability

5. Publications from the Timmer group

3. TSSi

1. Short description

2. Applicability/restrictions/pitfalls

3. Availability

4. Publication from the Timmer group

4. Optimal Transformations

1. Short description

2. Applicability/restrictions/pitfalls

3. Code availability

4. Other related methods

5. Publications from the Timmer group

6. Publication from external groups

5. Error Models

6. Retarded Transient Function

1. Under development

2. Short description

3. Availability

4. Applicability

5. Publication
