% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ea.R, R/io.R, R/visualize.R
\name{runDE}
\alias{runDE}
\alias{metaFC}
\alias{writeDE}
\alias{plotDEDistribution}
\alias{plotNrSamples}
\title{Differential expression analysis for datasets of a compendium}
\usage{
runDE(
  exp.list,
  de.method = c("limma", "edgeR", "DESeq2"),
  padj.method = "flexible",
  parallel = NULL,
  ...
)

metaFC(exp.list, max.na = round(length(exp.list)/3))

writeDE(exp.list, out.dir = NULL)

plotDEDistribution(exp.list, alpha = 0.05, beta = 1)

plotNrSamples(exp.list)
}
\arguments{
\item{exp.list}{Experiment list.  A \code{list} of datasets, each being of
class \code{\linkS4class{SummarizedExperiment}}.}

\item{de.method}{Differential expression method.  See documentation of
\code{\link{deAna}}.}

\item{padj.method}{Method for adjusting p-values for multiple testing.  For
available methods see the man page of the stats function \code{p.adjust}.
Defaults to 'flexible', which applies a dataset-specific correction
strategy.  See details.}

\item{parallel}{Parallel computation mode.  An instance of class
\code{\linkS4class{BiocParallelParam}}.  See the vignette of the
\code{BiocParallel} package for switching between serial, multi-core, and
grid execution.  Defaults to \code{NULL}, in which case the first element
of \code{BiocParallel::registered()} is used for execution.  Unless changed
by the user, this accordingly defaults to multi-core execution on the local
host.}

\item{...}{Additional arguments passed to \code{EnrichmentBrowser::deAna}.}

\item{max.na}{Integer.  Maximum number of datasets in which the fold change
of a gene may be missing (\code{NA}) for a meta fold change to still be
computed for that gene.  Defaults to one third of the number of datasets in
\code{exp.list}, rounded to the nearest integer.}

\item{out.dir}{Character.  Output directory to which the DE results of each
dataset are written.  Defaults to \code{NULL}, in which case the results are
written to a subdirectory named 'de' in
\code{tools::R_user_dir("GSEABenchmarkeR")}.}

\item{alpha}{Statistical significance level. Defaults to 0.05.}

\item{beta}{Absolute log2 fold change cut-off. Defaults to 1 (2-fold).}
}
\value{
\code{runDE} returns \code{exp.list} with DE measures annotated to
the \code{\link{rowData}} slot of each dataset, \code{writeDE} writes to file,
and \code{plotDEDistribution} plots to a graphics device.
}
\description{
\code{runDE} applies a selected method for differential expression (DE)
analysis to the datasets of an expression data compendium.
}
\details{
DE studies typically report a gene as differentially expressed if the
corresponding DE p-value, corrected for multiple testing, is below the
chosen significance level.  Enrichment methods that work directly on the
list of DE genes are therefore substantially influenced by the choice of
multiple testing correction.

An example is the frequently used over-representation analysis (ORA), which
assesses the overlap between the DE genes and a gene set under study based
on the hypergeometric distribution (see Appendix A of the
\code{EnrichmentBrowser} vignette for an introduction).
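
As an illustration, the ORA p-value for a single gene set can be obtained
from the upper tail of the hypergeometric distribution with \code{phyper}
from the stats package.  This is a minimal sketch; the counts below are
hypothetical toy numbers chosen for illustration, not values from any dataset.

\preformatted{
## hypothetical toy counts, not taken from any dataset
n.genes <- 10000  # all measured genes
n.de    <- 500    # DE genes
n.set   <- 100    # genes in the gene set under study
n.both  <- 15     # DE genes that are also in the gene set

## P(X >= n.both) under the hypergeometric null of no association;
## phyper(q, m, n, k, lower.tail = FALSE) gives P(X > q)
p.ora <- phyper(n.both - 1, n.set, n.genes - n.set, n.de,
                lower.tail = FALSE)
}

With these numbers about 5 DE genes would be expected in the set by chance,
so observing 15 yields a small p-value.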

ORA is inapplicable if there are few genes satisfying the significance
threshold, or if almost all genes are DE.

Using \code{padj.method="flexible"} accounts for these cases by choosing the
multiple testing correction according to the degree of differential
expression:

\itemize{
\item the correction method from Benjamini and Hochberg (BH) is applied if it
renders >= 1\% and <= 25\% of all measured genes as DE,
\item the p-values are left unadjusted if the BH correction results in < 1\%
DE genes, and
\item the more stringent Bonferroni correction is applied if the BH
correction results in > 25\% DE genes.
}
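
The rule above can be sketched in base R with \code{p.adjust}.  This is a
minimal illustration, not the internal implementation of the package: the
helper name \code{flexibleAdjust}, the default \code{alpha = 0.05}, and the
assumption that a gene counts as DE if its BH-adjusted p-value is below
\code{alpha} are hypothetical choices made for this sketch.

\preformatted{
## hypothetical helper, not exported by GSEABenchmarkeR
flexibleAdjust <- function(p, alpha = 0.05)
{
    ## fraction of genes called DE after BH correction
    ## (assumption: DE means BH-adjusted p-value below alpha)
    p.bh <- p.adjust(p, method = "BH")
    frac.de <- mean(p.bh < alpha, na.rm = TRUE)

    ## too few DE genes: leave p-values unadjusted
    if(frac.de < 0.01) return(p)

    ## too many DE genes: use the more stringent Bonferroni correction
    if(frac.de > 0.25) return(p.adjust(p, method = "bonferroni"))

    ## otherwise keep the BH correction
    p.bh
}
}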

Note that the resulting p-values should not be used for assessing the
statistical significance of DE genes within or between datasets.  They are
used solely to determine which genes are included in the ORA, where the
flexible correction ensures that the fraction of included genes is roughly
of the same order of magnitude across datasets.

Alternative strategies could also be applied, such as taking a constant
number of genes for each dataset or excluding ORA methods from the
assessment altogether.
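
A constant number of genes per dataset could, for example, be obtained by
ranking genes by their DE p-values and keeping the top \code{n}.  The helper
name \code{topDEGenes}, the input (a named vector of p-values), and the
default \code{n = 500} are hypothetical choices for this sketch.

\preformatted{
## hypothetical helper: names of the n genes with the smallest p-values
topDEGenes <- function(p, n = 500)
{
    ## sort() drops NA values by default and keeps the names
    p <- sort(p)
    names(p)[seq_len(min(n, length(p)))]
}
}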
}
\examples{

    # reading user-defined expression data from file
    data.dir <- system.file("extdata/myEData", package="GSEABenchmarkeR")
    edat <- loadEData(data.dir)

    # differential expression analysis
    edat <- runDE(edat)
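
    # (sketch) the same analysis with serial instead of multi-core
    # execution, selected via the 'parallel' argument
    edat.serial <- runDE(loadEData(data.dir),
                         parallel = BiocParallel::SerialParam())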

    # visualization of per-dataset DE distribution
    plotDEDistribution(edat)
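
    # the same plot with stricter DE thresholds (values chosen for
    # illustration): significance level 0.01, absolute log2 FC cut-off 0.5
    plotDEDistribution(edat, alpha = 0.01, beta = 0.5)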

    # calculating meta fold changes across datasets
    mfcs <- metaFC(edat, max.na=0)

    # writing DE results to file
    out.dir <- file.path(tempdir(), "de")
    if(!dir.exists(out.dir)) dir.create(out.dir)

    writeDE(edat, out.dir)

}
\seealso{
\code{loadEData} to load a specified expression data compendium.
}
\author{
Ludwig Geistlinger <Ludwig.Geistlinger@sph.cuny.edu>
}
