% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/stats.R
\name{modelGeneExpression}
\alias{modelGeneExpression}
\title{Gene expression modeling pipeline}
\usage{
modelGeneExpression(
  mae,
  yname = "Y",
  uname = "U",
  xnames,
  design = NULL,
  standardize = TRUE,
  parallel = FALSE,
  pvalues = TRUE,
  precalcmodels = NULL,
  ...
)
}
\arguments{
\item{mae}{\code{MultiAssayExperiment} object, such as one produced by
\code{\link{prepareCountsForRegression}}.}

\item{yname}{string naming the experiment in \code{mae} to use as the
expression input.}

\item{uname}{string naming the experiment in \code{mae} to use as the basal
expression level.}

\item{xnames}{character vector naming the experiments in \code{mae} to use as
molecular signatures.}

\item{design}{design matrix for the samples. The default
(\code{NULL}) uses the design stored in the \code{mae} metadata. Columns
correspond to sample groups and rows to sample names. Only samples
included in the design are processed.}

\item{standardize}{logical flag indicating whether the molecular signatures should
be scaled. Setting it to \code{TRUE} is advised.}

\item{parallel}{\code{parallel} argument passed to the internally used
\code{\link[glmnet]{cv.glmnet}} function. Setting it to \code{FALSE} is advised,
as it might interfere with the parallelization used by \code{modelGeneExpression}.}

\item{pvalues}{logical flag indicating whether significance testing of the
estimated molecular signature activities should be performed.}

\item{precalcmodels}{optional list of precomputed \code{'cv.glmnet'} objects
for each molecular signature and sample. The elements of this list should
match the \code{xnames} vector, and each element should be a
named list holding \code{'cv.glmnet'} objects for each sample. If provided,
these models are used instead of fitting the regressions from scratch.}

\item{...}{further arguments passed to \code{\link[glmnet]{cv.glmnet}}, such as \code{nfolds}.}
}
\value{
Nested list with the following elements:
\describe{
\item{regression_models}{Named list with elements corresponding to
signatures specified in \code{xnames}. Each of these is a list holding
\code{'cv.glmnet'} objects corresponding to each sample.}
\item{pvalues}{Named list with elements corresponding to
signatures specified in \code{xnames}. Each of these is a list holding,
for each sample, a \code{data.frame} of the signature's p-values and
test statistics.}
\item{zscore_avg}{Named list with elements corresponding to
signatures specified in \code{xnames}. Each of these is a \code{matrix}
holding replicate-averaged Z-scores with columns corresponding to groups
in the design.}
\item{coef_avg}{Named list with elements corresponding to
signatures specified in \code{xnames}. Each of these is a \code{matrix}
holding replicate-averaged signature activities with columns
corresponding to groups in the design.}
\item{results}{Named list of \code{data.frame}s holding replicate-averaged
molecular signature activities, together with overall molecular signature
Z-scores and p-values calculated over groups using Stouffer's and Fisher's
methods, respectively.}
}
}
\description{
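\code{modelGeneExpression} estimates molecular signature activities for each
sample by fitting ridge regression models of the change in expression relative
to the basal level. It uses parallelization if a parallel backend is
registered; for that reason, we advise against passing the \code{parallel}
argument to the internally called \code{\link[glmnet]{cv.glmnet}} routine.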
}
\details{
To speed up the calculations, consider lowering the number of folds used by
the internally run \code{\link[glmnet]{cv.glmnet}} via the \code{nfolds}
argument. By default, 10-fold cross-validation is used.

The relationship between the expression (Y) and the molecular signatures (X)
is described using a linear model formulation. The pipeline models the change
in expression between the basal expression level (U) and each sample, with
the goal of estimating the unknown molecular signature activities. The linear
models are fitted with the ridge regression implementation of the popular
\code{\link[glmnet]{glmnet}} package (Friedman, Hastie, and Tibshirani 2010).
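
As a minimal sketch of this formulation, assuming that genes act as
observations and using notation introduced here only for illustration
(\eqn{y_{gs}} is the expression of gene \eqn{g} in sample \eqn{s}, \eqn{u_g}
its basal level, \eqn{x_g} the vector of molecular signatures for gene
\eqn{g}, \eqn{\beta_s} the unknown activities of sample \eqn{s}, \eqn{n} the
number of genes and \eqn{\lambda} the penalty selected by \code{cv.glmnet}),
the fit for one sample solves the Gaussian \code{glmnet} objective with
\code{alpha = 0}, which corresponds to ridge regression:
\deqn{\hat{\beta}_s = \arg\min_{\beta} \frac{1}{2n} \sum_{g=1}^{n} \left(y_{gs} - u_g - x_g^{\top}\beta\right)^2 + \frac{\lambda}{2} \sum_{k} \beta_k^2.}{beta_s = argmin_beta (1/(2n)) sum_g (y_gs - u_g - x_g' beta)^2 + (lambda/2) sum_k beta_k^2.}
The intercept and any observation weights are omitted here for brevity.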

If \code{pvalues} is set to \code{TRUE}, the significance of the estimated
molecular signature activities is tested using the methodology introduced by
Cule, Vineis, and De Iorio (2011), whose original implementation can be found
in \link[ridge]{ridge-package}.
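
In that approach, summarized here as a hedged sketch rather than this
package's exact code, each coefficient \eqn{j} is tested with a Wald-type
statistic built from the ridge estimate \eqn{\hat{\beta}}, the signature
matrix \eqn{X}, the residual variance estimate \eqn{\hat{\sigma}^2} and the
ridge penalty \eqn{\kappa} expressed on the scale of \eqn{X^{\top}X}:
\deqn{T_j = \frac{\hat{\beta}_j}{\sqrt{\widehat{V}_{jj}}}, \qquad \widehat{V} = \hat{\sigma}^2 \left(X^{\top}X + \kappa I\right)^{-1} X^{\top}X \left(X^{\top}X + \kappa I\right)^{-1}.}{T_j = beta_j / sqrt(V_jj),  V = sigma^2 (X'X + kappa I)^-1 X'X (X'X + kappa I)^-1.}
Under the null hypothesis \eqn{\beta_j = 0}, \eqn{T_j} is compared against a
standard normal distribution to obtain a p-value.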

If replicates are available, the signature activity estimates and their
standard error estimates can be combined. This is done by averaging the
activity estimates and pooling their significance estimates, using Stouffer's
method for the Z-scores and Fisher's method for the p-values.
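
Written out for \eqn{r} replicates with activity estimates
\eqn{\hat{\beta}_i}, Z-scores \eqn{z_i} and p-values \eqn{p_i} (notation
introduced here for illustration, assuming unweighted combinations and
independent replicates), these rules are
\deqn{\bar{\beta} = \frac{1}{r}\sum_{i=1}^{r} \hat{\beta}_i, \qquad Z = \frac{1}{\sqrt{r}}\sum_{i=1}^{r} z_i, \qquad S = -2\sum_{i=1}^{r} \ln p_i,}{beta_avg = (1/r) sum_i beta_i,  Z = sum_i z_i / sqrt(r),  S = -2 sum_i log(p_i),}
where, under the null hypothesis, \eqn{Z} follows a standard normal
distribution and \eqn{S} a chi-squared distribution with \eqn{2r} degrees of
freedom. For example, with \eqn{r = 4} replicates each having \eqn{z_i = 1},
Stouffer's method gives \eqn{Z = 4/2 = 2}.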

For a detailed description of the pipeline, we refer the interested user to
the paper accompanying this package.
}
\examples{
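data("rinderpest_mini", "remap_mini")
# the "00hr" group serves as the basal expression level
base_lvl <- "00hr"
# design matrix: one row per sample, one column per group (3 replicates each)
design <- matrix(
  data = c(1, 0, 0,
           1, 0, 0,
           1, 0, 0,
           0, 1, 0,
           0, 1, 0,
           0, 1, 0,
           0, 0, 1,
           0, 0, 1,
           0, 0, 1),
  ncol = 3,
  nrow = 9,
  byrow = TRUE,
  dimnames = list(colnames(rinderpest_mini), c("00hr", "12hr", "24hr")))
# build the MultiAssayExperiment holding the counts
mae <- prepareCountsForRegression(
  counts = rinderpest_mini,
  design = design,
  base_lvl = base_lvl)
# add the remap_mini molecular signatures and filter them with default settings
mae <- addSignatures(mae, remap = remap_mini)
mae <- filterSignatures(mae)
# fit the models using 5-fold instead of the default 10-fold cross-validation
res <- modelGeneExpression(
  mae = mae,
  xnames = "remap",
  nfolds = 5)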

}
