| Title: | Shrinkage Covariance Incorporating Prior Knowledge |
|---|---|
| Description: | The SHIP-package allows the estimation of various types of shrinkage covariance matrices. These types differ in terms of the so-called covariance target (to be chosen by the user), the highly structured matrix which the standard unbiased sample covariance matrix is shrunken towards and which optionally incorporates prior knowledge. The shrinkage intensity is obtained via an analytical procedure. |
| Authors: | Vincent Guillemot [aut, cre], Monika Jelizarow [aut] |
| Maintainer: | Vincent Guillemot <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 2.0.2 |
| Built: | 2026-06-07 08:52:55 UTC |
| Source: | https://github.com/vguillemot/ship |
The function build.target() is a wrapper that builds the various types
of covariance targets: diagonal ("D"), constant correlation ("F"),
knowledge-based ("G", "Gpos", and "Gstar"), and correlation ("cor").

    build.target(x, genegroups = NULL, type)


| Argument | Description |
|---|---|
| x | An $n \times p$ data matrix. |
| genegroups | List of the groups each gene belongs to: each entry of the list is dedicated to a gene (identified the same way as in x). |
| type | Character string specifying the desired target: "D" for a diagonal target, "cor" for a correlation target, "G", "Gpos" and "Gstar" for a G-type target (see Jelizarow et al., 2010) and "F" for an F-target. |

A $p \times p$ target covariance matrix of the requested type.
Vincent Guillemot
M. Jelizarow, V. Guillemot, A. Tenenhaus, K. Strimmer, A.-L. Boulesteix, 2010. Over-optimism in bioinformatics: an illustration. Bioinformatics. Accepted.
targetCor, targetD, targetF, targetG, targetGpos, targetGstar.

    # Simulate dataset
    x <- matrix(rnorm(20*30), 20, 30)
    # Try different targets
    build.target(x, type = "D")

The microarray data set comes from the prostate cancer study by Singh et al. The microarray platform is hgu95av2, so the gene groups are taken from the hgu95av2.db Bioconductor annotation package (see Carlson et al.).

    data("expl")

The dataset is a list containing:
a matrix of 100 genes randomly chosen from the data set of
Singh et al.,
a list ‘genegroups’ containing 100 vectors of KEGG pathway IDs (which each gene belongs to).
M. Carlson, S. Falcon, H. Pages, N. Li. hgu95av2.db: Affymetrix Human Genome U95 Set annotation data (chip hgu95av2). R package version 2.2.12.
D. Singh, P. G. Febbo, K. Ross, D. G. Jackson, J. Manola, C. Ladd, P. Tamayo, A. A. Renshaw, A. V. D'Amico, J. P. Richie, E. S. Lander, M. Loda, P. W. Kantoff, T. R. Golub, W. R. Sellers, 2002. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1, 203-209.
The shrinkage estimator is computed independently of the target's nature.
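As a rough illustration of what a shrinkage estimator combines, the sketch below forms a convex combination of the unbiased sample covariance and a diagonal target in base R, following the general form described by Schaefer and Strimmer (2005). The intensity `lambda = 0.3` is an arbitrary value chosen for illustration, whereas shrink.estim() estimates it analytically. The names `S`, `target`, and `sigma_shrunk` are assumptions of this sketch, not part of the package.

```r
set.seed(1)
x <- matrix(rnorm(20 * 30), 20, 30)  # n = 20 samples, p = 30 variables

S <- cov(x)               # unbiased sample covariance (singular here, since n < p)
target <- diag(diag(S))   # diagonal target: variances only
lambda <- 0.3             # arbitrary intensity, for illustration only

# Shrink the sample covariance towards the target
sigma_shrunk <- lambda * target + (1 - lambda) * S

# The smallest eigenvalue is positive, so the shrunken matrix is invertible
min(eigen(sigma_shrunk, symmetric = TRUE, only.values = TRUE)$values)
```

With a diagonal target the variances are left unchanged and only the off-diagonal entries are pulled towards zero.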

    shrink.estim(x, tar)

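
| Argument | Description |
|---|---|
| x | An $n \times p$ data matrix. |
| tar | A $p \times p$ target matrix. |

A $p \times p$ shrinkage covariance matrix and the estimated shrinkage intensity $\lambda$.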
Monika Jelizarow and Vincent Guillemot
J. Schaefer and K. Strimmer, 2005. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32.

    # Simulate dataset
    x <- matrix(rnorm(20*30), 20, 30)
    # Try different targets
    shrink.estim(x, tar = build.target(x, type = "D"))
    shrink.estim(x, tar = build.target(x, type = "F"))

The target Cor is computed from the data matrix. It is a modified version of target G. In particular,
it tests the correlations (with a significance level of 0.05) and sets the
non-significant correlations to zero before the mean correlation
is computed.
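The thresholding step can be sketched in base R as follows. This is only an illustration under stated assumptions: it uses stats::cor.test() (Pearson, two-sided) as the test, ignores the gene groups for brevity, and includes the zeroed entries in the mean. The package's own implementation may differ on each of these points, and the names `R_sig` and `r_bar_sig` are hypothetical.

```r
set.seed(1)
x <- matrix(rnorm(20 * 4), 20, 4)  # n = 20 samples, p = 4 variables
p <- ncol(x)
R <- cor(x)

# Zero out correlations whose test p-value is not below 0.05
R_sig <- R
for (i in seq_len(p - 1)) {
  for (j in (i + 1):p) {
    p_value <- cor.test(x[, i], x[, j])$p.value
    if (p_value >= 0.05) R_sig[i, j] <- R_sig[j, i] <- 0
  }
}

# Mean correlation computed after thresholding (zeros included, by assumption)
r_bar_sig <- mean(R_sig[upper.tri(R_sig)])
```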

    targetCor(x, genegroups)


| Argument | Description |
|---|---|
| x | An $n \times p$ data matrix. |
| genegroups | A list with one entry per gene, obtained from the KEGG database, where each entry is itself a list of the pathway names this gene belongs to. If a gene does not belong to any gene functional group, its entry is NA. |

A $p \times p$ matrix.
Monika Jelizarow and Vincent Guillemot
J. Schaefer and K. Strimmer, 2005. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32.
targetCor, targetF,
targetG, targetGstar, targetGpos.

    # A short example on a toy dataset
    # require(SHIP)
    data(expl)
    attach(expl)
    tar <- targetCor(x, genegroups)
    which(tar[upper.tri(tar)] != 0) # not many non-zero coefficients!

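The diagonal target D is computed from the data matrix. It is defined as follows ($i, j = 1, \ldots, p$):

$$t_{ij} = \begin{cases} s_{ii} & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$$

where $s_{ij}$ denotes the entry of the unbiased covariance matrix in row $i$, column $j$.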
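The definition of target D can be sketched in base R as follows. This is a minimal illustration that keeps the sample variances and sets every covariance to zero; the name `target_d` is an assumption of this sketch and the code is not taken from the package.

```r
set.seed(1)
x <- matrix(rnorm(10 * 30), 10, 30)  # n = 10 samples, p = 30 variables

S <- cov(x)                 # unbiased sample covariance (divides by n - 1)
target_d <- diag(diag(S))   # keep the variances, zero out the covariances
```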

    targetD(x, genegroups)


| Argument | Description |
|---|---|
| x | An $n \times p$ data matrix. |
| genegroups | The genegroups are not used for this target. |

A $p \times p$ diagonal matrix.
Monika Jelizarow and Vincent Guillemot
J. Schaefer and K. Strimmer, 2005. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32.
targetCor, targetF,
targetG, targetGstar, targetGpos.

    x <- matrix(rnorm(10*30), 10, 30)
    tar <- targetD(x, NULL)

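The target F is computed from the data matrix. It is defined as follows ($i, j = 1, \ldots, p$):

$$t_{ij} = \begin{cases} s_{ii} & \text{if } i = j \\ \bar{r}\sqrt{s_{ii} s_{jj}} & \text{otherwise} \end{cases}$$

where $\bar{r}$ is the average of
sample correlations and $s_{ij}$ denotes the entry of the unbiased
covariance matrix in row $i$, column $j$.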
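A constant correlation target of this kind can be sketched in base R as follows. The sketch assumes that the average is taken over all distinct off-diagonal pairs; the names `r_bar`, `sd_vec`, and `target_f` are hypothetical and the code is not the package's implementation.

```r
set.seed(1)
x <- matrix(rnorm(20 * 30), 20, 30)  # n = 20 samples, p = 30 variables

S <- cov(x)                          # unbiased sample covariance
R <- cov2cor(S)                      # sample correlations
r_bar <- mean(R[upper.tri(R)])       # average over distinct pairs (assumption)

sd_vec <- sqrt(diag(S))
target_f <- r_bar * outer(sd_vec, sd_vec)  # r_bar * sqrt(s_ii * s_jj)
diag(target_f) <- diag(S)                  # variances stay on the diagonal
```

Every off-diagonal correlation implied by `target_f` equals `r_bar`, which is why the resulting matrix has many non-zero entries.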

    targetF(x, genegroups)


| Argument | Description |
|---|---|
| x | An $n \times p$ data matrix. |
| genegroups | The genegroups are not used for this target. |

A $p \times p$ matrix.
Monika Jelizarow and Vincent Guillemot
J. Schaefer and K. Strimmer, 2005. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32.
targetCor, targetF,
targetG, targetGstar, targetGpos.

    # A short example on a toy dataset
    # require(SHIP)
    data(expl)
    attach(expl)
    tar <- targetF(x, NULL)
    which(tar[upper.tri(tar)] != 0) # many non-zero coefficients!

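The target G is computed from the data matrix. It is defined as follows ($i, j = 1, \ldots, p$):

$$t_{ij} = \begin{cases} s_{ii} & \text{if } i = j \\ \bar{r}\sqrt{s_{ii} s_{jj}} & \text{if } i \neq j,\ i \sim j \\ 0 & \text{otherwise} \end{cases}$$

where $\bar{r}$ is the average of sample correlations and $s_{ij}$ denotes the
entry of the unbiased covariance matrix in row $i$, column $j$.
The notation $i \sim j$ means that genes $i$ and $j$ are connected,
i.e. genes $i$ and $j$ are in the same gene functional group.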
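The role of the gene groups can be sketched in base R on a toy example. The pathway IDs below are invented, two genes count as connected when they share at least one group, and the average correlation is taken over connected pairs only; that last choice is an assumption of this sketch. The names `connected`, `r_bar`, and `target_g` are hypothetical and the code is not the package's implementation.

```r
set.seed(1)
p <- 6
x <- matrix(rnorm(20 * p), 20, p)  # n = 20 samples, p = 6 genes

# Hypothetical pathway IDs; NA marks a gene without any known group
genegroups <- list("path1", c("path1", "path2"), "path2", NA, "path3", "path3")

# connected[i, j] is TRUE when genes i and j share at least one group
connected <- matrix(FALSE, p, p)
for (i in seq_len(p - 1)) {
  for (j in (i + 1):p) {
    shared <- intersect(genegroups[[i]], genegroups[[j]])
    connected[i, j] <- connected[j, i] <- length(shared[!is.na(shared)]) > 0
  }
}

S <- cov(x)
R <- cov2cor(S)
r_bar <- mean(R[upper.tri(R) & connected])  # connected pairs only (assumption)

sd_vec <- sqrt(diag(S))
target_g <- diag(diag(S))                   # variances on the diagonal
target_g[connected] <- (r_bar * outer(sd_vec, sd_vec))[connected]
```

Here only the pairs (1, 2), (2, 3), and (5, 6) are connected, so all other off-diagonal entries of `target_g` stay at zero.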

    targetG(x, genegroups)


| Argument | Description |
|---|---|
| x | An $n \times p$ data matrix. |
| genegroups | A list with one entry per gene, obtained from the KEGG database, where each entry is itself a list of the pathway names this gene belongs to. If a gene does not belong to any gene functional group, its entry is NA. |

A $p \times p$ matrix.
Monika Jelizarow and Vincent Guillemot
J. Schaefer and K. Strimmer, 2005. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32.
M. Jelizarow, V. Guillemot, A. Tenenhaus, K. Strimmer, A.-L. Boulesteix, 2010. Over-optimism in bioinformatics: an illustration. Bioinformatics. Accepted.
targetCor, targetF,
targetG, targetGstar, targetGpos.

    # A short example on a toy dataset
    # require(SHIP)
    data(expl)
    attach(expl)
    tar <- targetG(x, genegroups)
    which(tar[upper.tri(tar)] != 0) # not many non-zero coefficients!

The target Gpos is computed from the data matrix. It is a modified version of target G. In particular,
it completely ignores negative correlations and computes the mean
correlation using the positive ones only.
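The difference in the mean correlation can be sketched in base R. The connectivity matrix below is invented for illustration, and the sketch only contrasts the two averages; it does not reproduce how the package fills the target entries. The names `r_conn`, `r_bar`, and `r_bar_pos` are hypothetical.

```r
set.seed(1)
x <- matrix(rnorm(20 * 5), 20, 5)  # n = 20 samples, p = 5 genes
R <- cor(x)

# Hypothetical connectivity: genes 1-3 share a group, genes 4-5 share another
connected <- matrix(FALSE, 5, 5)
connected[1:3, 1:3] <- TRUE
connected[4:5, 4:5] <- TRUE
diag(connected) <- FALSE

r_conn <- R[upper.tri(R) & connected]   # correlations of the connected pairs
r_bar <- mean(r_conn)                   # plain mean over connected pairs
r_bar_pos <- mean(r_conn[r_conn > 0])   # positive correlations only (NaN if there are none)
```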

    targetGpos(x, genegroups)


| Argument | Description |
|---|---|
| x | An $n \times p$ data matrix. |
| genegroups | A list with one entry per gene, obtained from the KEGG database, where each entry is itself a list of the pathway names this gene belongs to. If a gene does not belong to any gene functional group, its entry is NA. |

A $p \times p$ matrix.
Monika Jelizarow and Vincent Guillemot
J. Schaefer and K. Strimmer, 2005. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32.
M. Jelizarow, V. Guillemot, A. Tenenhaus, K. Strimmer, A.-L. Boulesteix, 2010. Over-optimism in bioinformatics: an illustration. Bioinformatics. Accepted.
targetCor, targetF,
targetG, targetGstar, targetGpos.

    # A short example on a toy dataset
    # require(SHIP)
    data(expl)
    attach(expl)
    tar <- targetGpos(x, genegroups)
    which(tar[upper.tri(tar)] != 0) # not many non-zero coefficients!

The target Gstar is computed from the data matrix. It is a modified version of target G. In particular,
it involves two parameters for the correlation (a positive and a negative
one) instead of the single parameter $\bar{r}$ in order to account
for negatively correlated genes within the same pathway.
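The two-parameter idea can be sketched in base R. The sketch assumes that connected pairs with a positive sample correlation receive the mean positive correlation and connected pairs with a negative sample correlation receive the mean negative one; the connectivity matrix is invented and the names `r_bar_pos`, `r_bar_neg`, and `target_gstar` are hypothetical. It is an illustration, not the package's implementation.

```r
set.seed(1)
p <- 5
x <- matrix(rnorm(20 * p), 20, p)  # n = 20 samples, p = 5 genes
S <- cov(x)
R <- cov2cor(S)
sd_prod <- outer(sqrt(diag(S)), sqrt(diag(S)))  # sqrt(s_ii * s_jj)

# Hypothetical connectivity: genes 1-3 share a group, genes 4-5 share another
connected <- matrix(FALSE, p, p)
connected[1:3, 1:3] <- TRUE
connected[4:5, 4:5] <- TRUE
diag(connected) <- FALSE

pos <- connected & R > 0   # connected pairs with positive correlation
neg <- connected & R < 0   # connected pairs with negative correlation
r_bar_pos <- if (any(pos)) mean(R[pos]) else 0
r_bar_neg <- if (any(neg)) mean(R[neg]) else 0

target_gstar <- diag(diag(S))                  # variances on the diagonal
target_gstar[pos] <- r_bar_pos * sd_prod[pos]
target_gstar[neg] <- r_bar_neg * sd_prod[neg]
```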

    targetGstar(x, genegroups)


| Argument | Description |
|---|---|
| x | An $n \times p$ data matrix. |
| genegroups | A list with one entry per gene, obtained from the KEGG database, where each entry is itself a list of the pathway names this gene belongs to. If a gene does not belong to any gene functional group, its entry is NA. |

A $p \times p$ matrix.
Monika Jelizarow and Vincent Guillemot
J. Schaefer and K. Strimmer, 2005. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32.
M. Jelizarow, V. Guillemot, A. Tenenhaus, K. Strimmer, A.-L. Boulesteix, 2010. Over-optimism in bioinformatics: an illustration. Bioinformatics. Accepted.
targetCor, targetF,
targetG, targetGstar, targetGpos.

    # A short example on a toy dataset
    # require(SHIP)
    data(expl)
    attach(expl)
    tar <- targetGstar(x, genegroups)
    which(tar[upper.tri(tar)] != 0) # not many non-zero coefficients!
