Package 'SHIP'

Title: Shrinkage Covariance Incorporating Prior Knowledge
Description: The SHIP package estimates various types of shrinkage covariance matrices. These types differ in the covariance target (chosen by the user), that is, the highly structured matrix towards which the standard unbiased sample covariance matrix is shrunk and which optionally incorporates prior knowledge. The shrinkage intensity is obtained via an analytical procedure.
Authors: Vincent Guillemot [aut, cre], Monika Jelizarow [aut]
Maintainer: Vincent Guillemot <[email protected]>
License: GPL (>= 2)
Version: 2.0.2
Built: 2026-06-07 08:52:55 UTC
Source: https://github.com/vguillemot/ship

Help Index


Creating a covariance target, optionally by using information from KEGG pathways.

Description

The function build.target() is a wrapper function to build the various types of covariance targets: diagonal ("D"), constant correlation ("F"), knowledge-based ("G", "Gpos", and "Gstar"), and correlation ("cor").

Usage

build.target(x, genegroups = NULL, type)

Arguments

x

An $n \times p$ matrix.

genegroups

List of the groups each gene belongs to: each entry of the list corresponds to one gene (identified the same way as in x) and is thus a vector of pathway IDs.

type

Character string specifying the desired target: "D" for a diagonal target, "cor" for a correlation target, "G", "Gpos" and "Gstar" for a G-type target (see Jelizarow et al., 2010), and "F" for an F target.

Value

A $p \times p$ target covariance matrix of a certain type.

Author(s)

Vincent Guillemot

References

M. Jelizarow, V. Guillemot, A. Tenenhaus, K. Strimmer, A.-L. Boulesteix, 2010. Over-optimism in bioinformatics: an illustration. Bioinformatics. Accepted.

See Also

targetCor, targetD, targetF, targetG, targetGpos, targetGstar.

Examples

# Simulate dataset
x <- matrix(rnorm(20*30), 20, 30)
# Build a diagonal target
build.target(x, type = "D")

Small example extracted from a microarray data set.

Description

The microarray data set comes from the prostate cancer study by Singh et al. The microarray platform is hgu95av2, so the gene groups are taken from the hgu95av2.db Bioconductor annotation package (see Carlson et al.).

Usage

data("expl")

Format

The dataset is a list containing:

  • a $102 \times 100$ matrix x of 100 genes randomly chosen from the data set of Singh et al.,

  • a list genegroups containing 100 vectors of KEGG pathway IDs (the pathways each gene belongs to).
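To make this structure concrete, the following minimal sketch builds a small list with the same two components. Everything in it is hypothetical: the object name toy, the gene names and the pathway IDs are invented for illustration and are not taken from expl. The NA entry follows the convention described for the genegroups argument of the target functions (a gene without any functional group).

```r
# Hypothetical toy object mimicking the layout of expl (NOT the real data).
set.seed(1)
toy <- list(
  # 6 samples (rows) by 4 genes (columns)
  x = matrix(rnorm(6 * 4), nrow = 6, ncol = 4,
             dimnames = list(NULL, c("g1", "g2", "g3", "g4"))),
  # one entry per gene, in the same order as the columns of x
  genegroups = list(
    g1 = c("04110", "05215"),  # gene annotated with two (made-up) pathway IDs
    g2 = "04110",              # shares one pathway with g1
    g3 = NA,                   # gene without any pathway annotation
    g4 = "00010"
  )
)
str(toy, max.level = 1)
```

The only structural requirement illustrated here is that genegroups has one entry per column of x, identified the same way.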

Source

  • M. Carlson, S. Falcon, H. Pages, N. Li. hgu95av2.db: Affymetrix Human Genome U95 Set annotation data (chip hgu95av2). R package version 2.2.12.

  • D. Singh, P. G. Febbo, K. Ross, D. G. Jackson, J. Manola, C. Ladd, P. Tamayo, A. A. Renshaw, A. V. D'Amico, J. P. Richie, E. S. Lander, M. Loda, P. W. Kantoff, T. R. Golub, W. R. Sellers, 2002. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1, 203-209.


Shrinkage estimator of the covariance matrix, given a data set and a covariance target.

Description

The shrinkage estimator is computed independently of the target's nature.

Usage

shrink.estim(x, tar)

Arguments

x

An $n \times p$ matrix (the data set).

tar

A $p \times p$ matrix (the covariance target).

Value

A $p \times p$ shrinkage covariance matrix and the estimated shrinkage intensity $\lambda$.

Author(s)

Monika Jelizarow and Vincent Guillemot

References

J. Schaefer and K. Strimmer, 2005. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32.

Examples

# Simulate dataset
x <- matrix(rnorm(20*30),20,30)
# Shrink towards the diagonal target
shrink.estim(x, tar = build.target(x, type = "D"))
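For intuition, a shrinkage estimator in the sense of Schaefer and Strimmer is a convex combination of the target and the unbiased sample covariance matrix. The base R sketch below shows only that combination step. The value of lambda is a hypothetical fixed number chosen for illustration, whereas the package estimates it analytically; the variable names are assumptions and this is not the package's implementation.

```r
# Sketch of the combination step only; lambda is fixed by hand here,
# whereas shrink.estim() estimates it from the data.
set.seed(1)
x <- matrix(rnorm(20 * 30), 20, 30)   # n = 20 samples, p = 30 variables

S      <- cov(x)            # unbiased sample covariance (singular since n < p)
tar    <- diag(diag(S))     # diagonal target built by hand
lambda <- 0.3               # hypothetical shrinkage intensity in [0, 1]

# Convex combination: lambda = 0 returns S, lambda = 1 returns the target
sigma_hat <- lambda * tar + (1 - lambda) * S

# The shrunken matrix is positive definite even though S is not
min(eigen(sigma_hat, symmetric = TRUE, only.values = TRUE)$values) > 0
```

With a diagonal target the variances are left unchanged and only the covariances are pulled towards zero, which is why the result is invertible even when there are fewer samples than variables.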

Computation of the target Cor.

Description

The $p \times p$ target Cor is computed from the $n \times p$ data matrix. It is a modified version of target G. In particular, it tests the correlations (at a significance level of 0.05) and sets the non-significant correlations to zero before the mean correlation $\bar{r}$ is computed.
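The filtering step can be illustrated in base R. The sketch below assumes a two-sided Pearson correlation test (the same p-value that cor.test() reports) and assumes that the zeroed correlations stay in the average; the help page does not state which test or which averaging convention the package uses, so both choices are assumptions.

```r
# Sketch: zero out non-significant correlations, then average.
# The choice of test and of averaging convention are assumptions.
set.seed(1)
n <- 20; p <- 5
x <- matrix(rnorm(n * p), n, p)

r    <- cor(x)
r_up <- r[upper.tri(r)]                        # one value per gene pair (i < j)

# t statistic and two-sided p-value of the Pearson correlation test
t_up <- r_up * sqrt((n - 2) / (1 - r_up^2))
p_up <- 2 * pt(-abs(t_up), df = n - 2)

r_kept <- ifelse(p_up < 0.05, r_up, 0)         # non-significant pairs set to zero
rbar   <- mean(r_kept)                         # mean over all pairs, zeros included
```

Working on the upper triangle avoids testing the diagonal, where the correlation is 1 by construction.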

Usage

targetCor(x, genegroups)

Arguments

x

An $n \times p$ data matrix.

genegroups

A list of genes obtained using the KEGG database, where each entry is itself a list of the pathway names that the gene belongs to. If a gene does not belong to any gene functional group, the entry is NA.

Value

A $p \times p$ matrix.

Author(s)

Monika Jelizarow and Vincent Guillemot

References

J. Schaefer and K. Strimmer, 2005. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32.

See Also

targetCor, targetF, targetG, targetGstar, targetGpos.

Examples

# A short example on a toy dataset
# require(SHIP)
data(expl)
tar <- targetCor(expl$x, expl$genegroups)
which(tar[upper.tri(tar)] != 0)  # not many non-zero coefficients

Computation of the diagonal target D ('diagonal, unequal variances').

Description

The $p \times p$ diagonal target D is computed from the $n \times p$ data matrix. It is defined as follows ($i,j = 1,\ldots,p$):

$$t_{ij} = \begin{cases} s_{ii} & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$$

where $s_{ij}$ denotes the entry of the unbiased covariance matrix in row $i$, column $j$.
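The definition above translates directly into base R. The following sketch builds the diagonal target by hand from the unbiased sample covariance matrix; the variable names are illustrative and this is not the package's own code.

```r
# Sketch: diagonal target with unequal variances, built by hand.
set.seed(1)
x <- matrix(rnorm(10 * 30), 10, 30)   # n = 10 samples, p = 30 variables

S     <- cov(x)            # unbiased sample covariance, entries s_ij
tar_d <- diag(diag(S))     # keep s_ii on the diagonal, zero elsewhere
```

The inner diag() extracts the variances and the outer diag() places them on the diagonal of a new $p \times p$ matrix.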

Usage

targetD(x, genegroups)

Arguments

x

An $n \times p$ data matrix.

genegroups

The genegroups are not used for this target.

Value

A $p \times p$ diagonal matrix.

Author(s)

Monika Jelizarow and Vincent Guillemot

References

J. Schaefer and K. Strimmer, 2005. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32.

See Also

targetCor, targetF, targetG, targetGstar, targetGpos.

Examples

x <- matrix(rnorm(10*30),10,30)
tar <- targetD(x,NULL)

Computation of target F ('constant correlation model').

Description

The $p \times p$ target F is computed from the $n \times p$ data matrix. It is defined as follows ($i,j = 1,\ldots,p$):

$$t_{ij} = \begin{cases} s_{ii} & \text{if } i = j \\ \bar{r}\sqrt{s_{ii}s_{jj}} & \text{otherwise} \end{cases}$$

where $\bar{r}$ is the average of sample correlations and $s_{ij}$ denotes the entry of the unbiased covariance matrix in row $i$, column $j$.
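As an illustration of this definition, the base R sketch below builds a constant correlation target by hand. It takes $\bar{r}$ as the mean of the sample correlations over all pairs $i < j$; the variable names are illustrative and the code is not taken from the package.

```r
# Sketch: constant correlation target, built by hand.
set.seed(1)
x <- matrix(rnorm(20 * 6), 20, 6)     # n = 20 samples, p = 6 variables

S    <- cov(x)                        # unbiased sample covariance, entries s_ij
r    <- cov2cor(S)                    # sample correlation matrix
rbar <- mean(r[upper.tri(r)])         # average correlation over pairs i < j

# Off-diagonal entries: rbar * sqrt(s_ii * s_jj); diagonal entries: s_ii
tar_f <- rbar * sqrt(outer(diag(S), diag(S)))
diag(tar_f) <- diag(S)
```

Converting tar_f back with cov2cor() gives a matrix whose off-diagonal entries all equal rbar, which is what "constant correlation" refers to.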

Usage

targetF(x, genegroups)

Arguments

x

An $n \times p$ data matrix.

genegroups

The genegroups are not used for this target.

Value

A $p \times p$ matrix.

Author(s)

Monika Jelizarow and Vincent Guillemot

References

J. Schaefer and K. Strimmer, 2005. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32.

See Also

targetCor, targetF, targetG, targetGstar, targetGpos.

Examples

# A short example on a toy dataset
# require(SHIP)
data(expl)
tar <- targetF(expl$x, NULL)
which(tar[upper.tri(tar)] != 0)  # many non-zero coefficients

Computation of target G ('knowledge-based constant correlation model').

Description

The $p \times p$ target G is computed from the $n \times p$ data matrix. It is defined as follows ($i,j = 1,\ldots,p$):

$$t_{ij} = \begin{cases} s_{ii} & \text{if } i = j \\ \bar{r}\sqrt{s_{ii}s_{jj}} & \text{if } i \neq j,\ i \sim j \\ 0 & \text{otherwise} \end{cases}$$

where $\bar{r}$ is the average of sample correlations and $s_{ij}$ denotes the entry of the unbiased covariance matrix in row $i$, column $j$. The notation $i \sim j$ means that genes $i$ and $j$ are connected, i.e. genes $i$ and $j$ are in the same gene functional group.
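The role of the connectivity relation can be sketched in base R. The example below makes several assumptions that should be read as such: the pathway IDs are invented, two genes count as connected when they share at least one pathway, unconnected pairs are set to zero, and $\bar{r}$ is averaged over all pairs as for target F (the package may average over a different set of pairs). The helper share_pathway is hypothetical and not part of the package.

```r
# Sketch: knowledge-based constant correlation target, built by hand.
set.seed(1)
x <- matrix(rnorm(20 * 4), 20, 4)                # n = 20 samples, p = 4 genes
genegroups <- list("A", c("A", "B"), "B", NA)    # made-up pathway IDs, NA = no group
p <- ncol(x)

# Hypothetical helper: TRUE when genes i and j share at least one pathway
share_pathway <- function(i, j) {
  gi <- genegroups[[i]]
  gj <- genegroups[[j]]
  any(!is.na(gi) & gi %in% gj)
}
connected <- outer(seq_len(p), seq_len(p), Vectorize(share_pathway))

S    <- cov(x)
r    <- cov2cor(S)
rbar <- mean(r[upper.tri(r)])                    # assumption: averaged over all pairs

# Connected pairs get rbar * sqrt(s_ii * s_jj); all other pairs get 0
tar_g <- ifelse(connected, rbar * sqrt(outer(diag(S), diag(S))), 0)
diag(tar_g) <- diag(S)
```

In this toy setting genes 1 and 2 share pathway "A" and genes 2 and 3 share pathway "B", so only those two pairs receive a non-zero off-diagonal entry.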

Usage

targetG(x, genegroups)

Arguments

x

An $n \times p$ data matrix.

genegroups

A list of genes obtained using the KEGG database, where each entry is itself a list of the pathway names that the gene belongs to. If a gene does not belong to any gene functional group, the entry is NA.

Value

A $p \times p$ matrix.

Author(s)

Monika Jelizarow and Vincent Guillemot

References

  • J. Schaefer and K. Strimmer, 2005. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32.

  • M. Jelizarow, V. Guillemot, A. Tenenhaus, K. Strimmer, A.-L. Boulesteix, 2010. Over-optimism in bioinformatics: an illustration. Bioinformatics. Accepted.

See Also

targetCor, targetF, targetG, targetGstar, targetGpos.

Examples

# A short example on a toy dataset
# require(SHIP)
data(expl)
tar <- targetG(expl$x, expl$genegroups)
which(tar[upper.tri(tar)] != 0)  # not many non-zero coefficients

Computation of the target Gpos.

Description

The $p \times p$ target Gpos is computed from the $n \times p$ data matrix. It is a modified version of target G. In particular, it completely ignores negative correlations and computes the mean correlation $\bar{r}$ using the positive ones only.
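The difference in the averaging step can be seen in a small base R sketch. It only compares the mean over all pairwise correlations with the mean over the positive ones; how the package restricts the pairs (for example to connected genes) and how it fills the target entries is not shown, and the variable names are illustrative.

```r
# Sketch: mean correlation over all pairs versus over positive pairs only.
set.seed(1)
x <- matrix(rnorm(20 * 6), 20, 6)     # n = 20 samples, p = 6 genes

r    <- cor(x)
r_up <- r[upper.tri(r)]               # one correlation per pair i < j

rbar_all <- mean(r_up)                # as used for targets F and G
rbar_pos <- mean(r_up[r_up > 0])      # negative correlations are ignored
```

Since the discarded values are all non-positive, rbar_pos is never smaller than rbar_all.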

Usage

targetGpos(x, genegroups)

Arguments

x

An $n \times p$ data matrix.

genegroups

A list of genes obtained using the KEGG database, where each entry is itself a list of the pathway names that the gene belongs to. If a gene does not belong to any gene functional group, the entry is NA.

Value

A $p \times p$ matrix.

Author(s)

Monika Jelizarow and Vincent Guillemot

References

  • J. Schaefer and K. Strimmer, 2005. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32.

  • M. Jelizarow, V. Guillemot, A. Tenenhaus, K. Strimmer, A.-L. Boulesteix, 2010. Over-optimism in bioinformatics: an illustration. Bioinformatics. Accepted.

See Also

targetCor, targetF, targetG, targetGstar, targetGpos.

Examples

# A short example on a toy dataset
# require(SHIP)
data(expl)
tar <- targetGpos(expl$x, expl$genegroups)
which(tar[upper.tri(tar)] != 0)  # not many non-zero coefficients

Computation of the target Gstar.

Description

The $p \times p$ target Gstar is computed from the $n \times p$ data matrix. It is a modified version of target G. In particular, it involves two parameters for the correlation (a positive and a negative one) instead of the single parameter $\bar{r}$ in order to account for negatively correlated genes within the same pathway.
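One way to picture the two-parameter idea is the base R sketch below. It is an assumption-laden illustration, not the package's definition: it treats every pair of genes as connected, takes the positive parameter as the mean of the positive sample correlations and the negative parameter as the mean of the negative ones, and assigns each pair the parameter matching the sign of its sample correlation. All variable names are hypothetical.

```r
# Sketch: a signed variant of the constant correlation target.
# Assumptions: all genes connected; parameter chosen by the sign of r_ij.
set.seed(1)
x <- matrix(rnorm(20 * 6), 20, 6)     # n = 20 samples, p = 6 genes

S    <- cov(x)
r    <- cov2cor(S)
r_up <- r[upper.tri(r)]

rbar_pos <- mean(r_up[r_up > 0])      # parameter for positively correlated pairs
rbar_neg <- mean(r_up[r_up < 0])      # parameter for negatively correlated pairs

sd_prod  <- sqrt(outer(diag(S), diag(S)))          # sqrt(s_ii * s_jj)
tar_sign <- ifelse(r > 0, rbar_pos, rbar_neg) * sd_prod
diag(tar_sign) <- diag(S)
```

Compared with a single $\bar{r}$, this keeps the sign of each pairwise relationship in the target instead of averaging positive and negative correlations together.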

Usage

targetGstar(x, genegroups)

Arguments

x

An $n \times p$ data matrix.

genegroups

A list of genes obtained using the KEGG database, where each entry is itself a list of the pathway names that the gene belongs to. If a gene does not belong to any gene functional group, the entry is NA.

Value

A $p \times p$ matrix.

Author(s)

Monika Jelizarow and Vincent Guillemot

References

  • J. Schaefer and K. Strimmer, 2005. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32.

  • M. Jelizarow, V. Guillemot, A. Tenenhaus, K. Strimmer, A.-L. Boulesteix, 2010. Over-optimism in bioinformatics: an illustration. Bioinformatics. Accepted.

See Also

targetCor, targetF, targetG, targetGstar, targetGpos.

Examples

# A short example on a toy dataset
# require(SHIP)
data(expl)
tar <- targetGstar(expl$x, expl$genegroups)
which(tar[upper.tri(tar)] != 0)  # not many non-zero coefficients