% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/generateRmdCodeDiffExpPhylo.R
\name{lengthNorm.limma.createRmd}
\alias{lengthNorm.limma.createRmd}
\title{Generate a \code{.Rmd} file containing code to perform differential expression analysis with length normalized counts + limma}
\usage{
lengthNorm.limma.createRmd(
  data.path,
  result.path,
  codefile,
  norm.method,
  extra.design.covariates = NULL,
  length.normalization = "RPKM",
  data.transformation = "log2",
  trend = FALSE,
  block.factor = NULL
)
}
\arguments{
\item{data.path}{The path to a .rds file containing the \code{phyloCompData} object that will be used for the differential expression analysis.}

\item{result.path}{The path to the file where the result object will be saved.}

\item{codefile}{The path to the file where the code will be written.}

\item{norm.method}{The between-sample normalization method used to compensate for varying library sizes and composition in the differential expression analysis. The normalization factors are calculated using the \code{calcNormFactors} function of the \code{edgeR} package. Possible values are \code{"TMM"}, \code{"RLE"}, \code{"upperquartile"} and \code{"none"}.}

\item{extra.design.covariates}{A vector containing the names of extra control variables to be passed to the design matrix of \code{limma}. Each covariate needs to be a column of the \code{sample.annotations} data frame from the \code{\link{phyloCompData}} object, with a matching column name. A covariate can be a numeric vector or a factor. Note that the "condition" factor column is always included and should not be added here. See Details.}

\item{length.normalization}{One of "none" (no length correction), "TPM", or "RPKM" (default). See Details.}

\item{data.transformation}{One of "log2" (default), "asin(sqrt)" or "sqrt". The data transformation to apply to the normalized data.}

\item{trend}{Should an intensity trend be allowed for the prior variance? Defaults to \code{FALSE}.}

\item{block.factor}{Name of the factor specifying a blocking variable, to be passed to the \code{\link[limma]{duplicateCorrelation}} function of the \code{limma} package. The factor needs to be a column of the \code{sample.annotations} data frame from the \code{\link{phyloCompData}} object. Defaults to \code{NULL} (no block structure).}
}
\value{
The function generates a \code{.Rmd} file containing the code for performing the differential expression analysis. This file can be executed using e.g. the \code{knitr} package.
}
\description{
A function to generate code that can be run to perform differential expression analysis of RNA-seq data (comparing two conditions) by applying a length-normalizing transformation followed by differential expression analysis with limma. The code is written to a \code{.Rmd} file. This function is generally not called by the user; the main interface for performing differential expression analysis is the \code{\link{runDiffExp}} function.
}
\details{
For more information about the methods and the interpretation of the parameters, see the \code{limma} package and the corresponding publications.

The \code{length.matrix} field of the \code{phyloCompData} object
is used to normalize the counts, using one of the following formulas:
\itemize{
\item \code{length.normalization="none"} : \eqn{CPM_{gi} = \frac{N_{gi} + 0.5}{NF_i \times \sum_{g} N_{gi} + 1} \times 10^6}
\item \code{length.normalization="TPM"} : \eqn{TPM_{gi} = \frac{(N_{gi} + 0.5) / L_{gi}}{NF_i \times \sum_{g} N_{gi}/L_{gi} + 1} \times 10^6}
\item \code{length.normalization="RPKM"} : \eqn{RPKM_{gi} = \frac{(N_{gi} + 0.5) / L_{gi}}{NF_i \times \sum_{g} N_{gi} + 1} \times 10^9}
}

where \eqn{N_{gi}} is the count for gene g in sample i,
\eqn{L_{gi}} is the length of gene g in sample i,
and \eqn{NF_i} is the normalization factor for sample i,
computed using the \code{calcNormFactors} function of the \code{edgeR} package.

The function specified by \code{data.transformation} is then applied
to the normalized count matrix.

The "\eqn{+0.5}" and "\eqn{+1}" offsets are taken from Law et al. (2014),
and are dropped from the normalization
when the transformation is anything other than \code{log2}.

The "\eqn{\times 10^6}" and "\eqn{\times 10^9}" factors are omitted when
the \code{asin(sqrt)} transformation is used, as \eqn{asin} is only
defined for arguments no larger than 1 in absolute value.
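
As a worked illustration of the RPKM formula with the \code{log2} transformation (so that the offsets are kept), take the made-up values \eqn{N_{gi} = 99}, \eqn{L_{gi} = 2000}, \eqn{NF_i = 1} and \eqn{\sum_{g} N_{gi} = 999999}. These numbers are assumptions chosen for easy arithmetic, not taken from any data set. Then
\deqn{RPKM_{gi} = \frac{(99 + 0.5) / 2000}{1 \times 999999 + 1} \times 10^9 = \frac{0.04975}{10^6} \times 10^9 = 49.75,}
and the transformed value would be \eqn{\log_2(49.75)}, approximately 5.64.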

The \code{design} model used in \code{\link[limma]{lmFit}}
uses the "condition" column of the \code{sample.annotations} data frame from the \code{\link{phyloCompData}} object
as well as all the covariates named in \code{extra.design.covariates}.
For example, if \code{extra.design.covariates = c("var1", "var2")}, then
\code{sample.annotations} must have two columns named "var1" and "var2", and the design formula
in the \code{\link[limma]{lmFit}} function will be:
\code{~ condition + var1 + var2}.
}
\examples{
try(
if (require(limma)) {
tmpdir <- normalizePath(tempdir(), winslash = "/")
## Simulate data
mydata.obj <- generateSyntheticData(dataset = "mydata", n.vars = 1000, 
                                    samples.per.cond = 5, n.diffexp = 100, 
                                    id.species = factor(1:10),
                                    lengths.relmeans = rpois(1000, 1000),
                                    lengths.dispersions = rgamma(1000, 1, 1),
                                    output.file = file.path(tmpdir, "mydata.rds"))
## Add covariates
## Model fitted is count.matrix ~ condition + test_factor + test_reg
sample.annotations(mydata.obj)$test_factor <- factor(rep(1:2, each = 5))
sample.annotations(mydata.obj)$test_reg <- rnorm(10, 0, 1)
saveRDS(mydata.obj, file.path(tmpdir, "mydata.rds"))
## Diff Exp
runDiffExp(data.file = file.path(tmpdir, "mydata.rds"), result.extent = "length.limma", 
           Rmdfunction = "lengthNorm.limma.createRmd", 
           output.directory = tmpdir, norm.method = "TMM",
           extra.design.covariates = c("test_factor", "test_reg"))
})
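
## Illustrative sketch, separate from the code written to the .Rmd file:
## the design matrix implied by a formula of the form
## ~ condition + test_factor + test_reg, built with base R only.
## The annotation data frame 'ann' below is made up for this illustration;
## it is an assumption, not the sample.annotations of a real object.
ann <- data.frame(condition = factor(rep(1:2, each = 5)),
                  test_factor = factor(rep(1:2, times = 5)),
                  test_reg = seq(-1, 1, length.out = 10))
design <- model.matrix(~ condition + test_factor + test_reg, data = ann)
## One row per sample, one column per model coefficient
colnames(design)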

}
\references{
Smyth GK (2005): Limma: linear models for microarray data. In: 'Bioinformatics and Computational Biology Solutions using R and Bioconductor'. R. Gentleman, V. Carey, S. Dudoit, R. Irizarry, W. Huber (eds), Springer, New York, pages 397-420.

Smyth, G. K., Michaud, J., and Scott, H. (2005). The use of within-array replicate spots for assessing differential expression in microarray experiments. Bioinformatics 21(9), 2067-2075.

Law, C.W., Chen, Y., Shi, W. et al. (2014). voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol 15, R29.

Musser, JM, Wagner, GP. (2015): Character trees from transcriptome data: Origin and individuation of morphological characters and the so‐called “species signal”. J. Exp. Zool. (Mol. Dev. Evol.) 324B: 588–604.
}
\author{
Charlotte Soneson, Paul Bastide, Mélina Gallopin
}
