% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/buildGraphFromKEGGREST.R,
%   R/buildDataFromGraph.R, R/loadKEGGdata.R
\name{data-funs}
\alias{data-funs}
\alias{buildGraphFromKEGGREST}
\alias{buildDataFromGraph}
\alias{loadKEGGdata}
\title{Parse, build and load the KEGG knowledge model}
\usage{
buildGraphFromKEGGREST(organism = "hsa", filter.path = NULL)

buildDataFromGraph(keggdata.graph = NULL, databaseDir = NULL,
    internalDir = TRUE, matrices = c("hypergeom", "diffusion",
    "pagerank"), normality = c("diffusion", "pagerank"),
    dampingFactor = 0.85, niter = 100)

loadKEGGdata(databaseDir = tail(listInternalDatabases(), 1),
    internalDir = TRUE, loadMatrix = NULL)
}
\arguments{
\item{organism}{Character, KEGG code for the organism of interest}

\item{filter.path}{Character vector, patterns of pathways to filter out. 
Each pattern is matched as a regular expression. 
E.g: \code{"01100"} filters out 
the overview metabolic pathway in any species}

\item{keggdata.graph}{An \pkg{igraph} 
object generated by the function 
\code{buildGraphFromKEGGREST}}

\item{databaseDir}{Character containing the directory to save KEGG files. 
It is a relative directory inside the library location
if \code{internalDir = TRUE}. If left to \code{NULL}, 
an automatic name containing the date, organism and 
the KEGG release is generated.}

\item{internalDir}{Logical, should the directory be internal 
in the package directory?}

\item{matrices}{A character vector, containing any of these: 
\code{"hypergeom"}, \code{"diffusion"}, \code{"pagerank"}}

\item{normality}{A character vector, containing any of these: 
\code{"diffusion"}, \code{"pagerank"}}

\item{dampingFactor}{Numeric value between 0 and 1 (both exclusive), 
damping factor \code{d} for 
PageRank (\code{\link[igraph:page_rank]{page.rank}})}

\item{niter}{Numeric value, number of iterations to estimate the p-values 
for the connected component (CC) size. Between 10 and 1e3.}

\item{loadMatrix}{Character vector to choose if 
heavy matrices should be loaded. 
Can contain: \code{"diffusion"}, \code{"pagerank"}}
}
\value{
\code{buildGraphFromKEGGREST} returns the 
curated KEGG graph (class \pkg{igraph})

\code{buildDataFromGraph} returns 
\code{invisible(TRUE)} if successful. 
As a side effect, the
directory specified by \code{databaseDir} is created, containing 
the internal data.

\code{loadKEGGdata} returns the 
\code{\link{FELLA.DATA}} object 
that contains the KEGG knowledge representation.
}
\description{
Function \code{buildGraphFromKEGGREST} makes use of the KEGG 
REST API (requires internet connection) 
to build and return the curated KEGG graph.

Function \code{buildDataFromGraph} takes as input the KEGG graph 
generated by \code{buildGraphFromKEGGREST} 
and writes the KEGG knowledge model in the desired permanent directory.

Function \code{loadKEGGdata} loads the internal files 
containing the KEGG knowledge model into a 
\code{\link{FELLA.DATA}} object. 

In general, \code{buildGraphFromKEGGREST} and 
\code{buildDataFromGraph} are one-time executions 
for a given organism and knowledge model, 
in this precise order. 
On the other hand, the user needs to run \code{loadKEGGdata} 
in every new R session to load such a model into a 
\code{\link{FELLA.DATA}} object.
}
\details{
In function \code{buildGraphFromKEGGREST}, 
the user specifies (i) an organism, and (ii) patterns matching 
pathways that should not be included as nodes. 
A graph object, as described in [Picart-Armada, 2017], 
is built from the comprehensive 
KEGG database [Kanehisa, 2017]. 
As described in the main vignette, accessible through 
\code{browseVignettes("FELLA")}, this graph has five levels that 
represent categories of KEGG nodes. 
From top to bottom: pathways, modules, enzymes, reactions and compounds.
This knowledge representation resembles the one formerly 
used by MetScape [Karnovsky, 2011], in which enzymes connect 
to genes instead of modules and pathways.
The necessary KEGG annotations 
are retrieved through KEGGREST R package [Tenenbaum, 2013]. 
Connections between pathways/modules and enzymes are inferred through 
organism-specific genes, i.e. an edge is added if a gene 
connects both entries. 
However, in order to enrich metabolomics data, the user has to 
pass the graph object to \code{buildDataFromGraph} 
to build the database that \code{loadKEGGdata} loads into 
a \code{\link{FELLA.DATA}} object. 
All the networks are handled with the igraph R package [Csardi, 2006].

Using \code{buildDataFromGraph} is the second step 
to use the \code{\link[=FELLA]{FELLA}} package. 
The knowledge graph is used to compute other internal variables that are 
required to run any enrichment. 
The main point behind the enrichment is to provide a small 
part of the knowledge graph relevant to the supplied metabolites. 
This is accomplished through diffusion processes and random walks, 
followed by a statistical normalisation, 
as described in [Picart-Armada, 2017].
When building the internal files, 
the user can choose whether to store (i) matrices for each 
provided method, and (ii) vectors derived from such matrices 
to use the parametric approaches. 
These are optional but enable (i) faster permutations and custom 
metabolite backgrounds, and (ii) parametric approaches. 
WARNING: diffusion and PageRank matrices in (i) 
can allocate up to 250MB each. 
On the other hand, the \code{niter} parameter 
controls the number of trials used to approximate the 
distribution of the connected component size under 
uniform node sampling. 
For further info, see the option \code{thresholdConnectedComponent} 
in the details from \code{?generateResultsGraph}. 
Regarding the destination, the user can specify 
the name of the directory. 
Otherwise a name containing the creation date, the organism 
and the KEGG release will be used. 
The database can be stored within the library path or in a 
custom location.

Function \code{loadKEGGdata} returns a 
\code{\link{FELLA.DATA}} object from any of the 
databases generated by \code{buildDataFromGraph}.
This object is the starting point of any enrichment 
using \code{\link{FELLA}}.
If the user built the matrices for "diffusion" and "pagerank", 
they can choose to load them. 
Further detail on the methods can be found in [Picart-Armada, 2017].
The matrices allow a faster computation and the definition 
of a custom background, but use up to 250MB of memory each.
}
\examples{
## Toy example
## In this case, the graph is not built from current KEGG. 
## It is loaded from sample data in FELLA
data("FELLA.sample")
## Graph to build the database (this example is a bit hacky)
g.sample <- FELLA:::getGraph(FELLA.sample)
dir.tmp <- file.path(tempdir(), paste(sample(letters), collapse = ""))
## Build internal files in a temporary directory
buildDataFromGraph(
keggdata.graph = g.sample, 
databaseDir = dir.tmp, 
internalDir = FALSE, 
matrices = NULL, 
normality = NULL, 
dampingFactor = 0.85,
niter = 10)
## Load database
myFELLA.DATA <- loadKEGGdata(
dir.tmp, 
internalDir = FALSE)
myFELLA.DATA
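## A minimal sketch of how the loaded object might be inspected. 
## It reuses the internal accessor FELLA:::getGraph shown above; 
## the name g.loaded is only illustrative.
g.loaded <- FELLA:::getGraph(myFELLA.DATA)
## Number of nodes and edges in the loaded knowledge graph
igraph::vcount(g.loaded)
igraph::ecount(g.loaded)
## The temporary database can be removed once the object is in memory
unlink(dir.tmp, recursive = TRUE)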

######################

\dontrun{
## Full example

## First step: graph for Mus musculus discarding the mmu01100 pathway
## (an analogous example can be built from human using organism = "hsa")
g.mmu <- buildGraphFromKEGGREST(
organism = "mmu", 
filter.path = "mmu01100")
summary(g.mmu)
cat(comment(g.mmu))
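
## A hedged illustration of how a filter.path pattern behaves as a 
## regular expression, using base R only. This is not FELLA's internal 
## code, and the vector path.ids is a made-up example.
path.ids <- c("hsa01100", "mmu01100", "mmu00010")
## The pattern "01100" matches the overview pathway in any organism
grepl("01100", path.ids)
## The pattern "mmu01100" matches only the mouse entry
grepl("mmu01100", path.ids)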

## Second step: build internal files for this graph
## (consumes some time and memory, especially if we compute 
## "diffusion" and "pagerank" matrices)
buildDataFromGraph(
keggdata.graph = g.mmu, 
databaseDir = "example_db_mmu", 
internalDir = TRUE, 
matrices = c("hypergeom", "diffusion", "pagerank"), 
normality = c("diffusion", "pagerank"), 
dampingFactor = 0.85,
niter = 1e3)
## Third step: load the internal files into a FELLA.DATA object
FELLA.DATA.mmu <- loadKEGGdata(
"example_db_mmu", 
internalDir = TRUE, 
loadMatrix = c("diffusion", "pagerank"))
FELLA.DATA.mmu
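
## A rough sketch of what niter controls: the distribution of the 
## largest connected component size when nodes are sampled uniformly. 
## This is an illustration with igraph, not FELLA's implementation. 
## The sample size k, the number of trials n.trials and the object 
## names are assumptions chosen for the example.
g.full <- FELLA:::getGraph(FELLA.DATA.mmu)
k <- 50
n.trials <- 100
cc.size <- replicate(n.trials, {
    ## Draw k nodes uniformly at random, without replacement
    nodes <- sample(igraph::vcount(g.full), k)
    ## Keep only the edges between the sampled nodes
    g.sub <- igraph::induced_subgraph(g.full, nodes)
    ## Size of the largest connected component in the subgraph
    max(igraph::components(g.sub)$csize)
})
summary(cc.size)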
}

}
\references{
Kanehisa, M., Furumichi, M., Tanabe, 
M., Sato, Y., & Morishima, K. (2017). 
KEGG: new perspectives on genomes, pathways, diseases and drugs. 
Nucleic acids research, 45(D1), D353-D361.

Karnovsky, A., Weymouth, T., Hull, T., Tarcea, V. G., 
Scardoni, G., Laudanna, C., ... & Athey, B. (2011). 
Metscape 2 bioinformatics tool for the analysis 
and visualization of metabolomics and gene expression data. 
Bioinformatics, 28(3), 373-380.

Tenenbaum, D. (2013). KEGGREST: Client-side REST access 
to KEGG. R package version, 1(1).

Chang, W., Cheng, J., Allaire, JJ., 
Xie, Y., & McPherson, J. (2017).
shiny: Web Application Framework for R. R package version 1.0.5.
https://CRAN.R-project.org/package=shiny

Picart-Armada, S., Fernandez-Albert, F., Vinaixa, 
M., Rodriguez, M. A., Aivio, S., Stracker, 
T. H., Yanes, O., & Perera-Lluna, A. (2017). 
Null diffusion-based enrichment for metabolomics data. 
PLOS ONE, 12(12), e0189012.
}
\seealso{
class \code{\link{FELLA.DATA}}
}
