% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/createCompDbPackage.R
\name{compound_tbl_sdf}
\alias{compound_tbl_sdf}
\title{Extract compound data from a file in SDF format}
\usage{
compound_tbl_sdf(file, collapse, onlyValid = TRUE, nonStop = TRUE)
}
\arguments{
\item{file}{\code{character(1)} with the name of the SDF file.}

\item{collapse}{optional \code{character(1)} used to collapse multiple
values in the column \code{"synonyms"}. See examples for details.}

\item{onlyValid}{\code{logical(1)} whether only valid elements should be
imported (\code{onlyValid = TRUE}, the default) or all elements.}

\item{nonStop}{\code{logical(1)} whether errors specific to the file content
should only be reported as warnings and not break the full import process. The
value of this parameter is passed to parameter \code{skipErrors} of the
\code{\link[ChemmineR:read.SDFset]{ChemmineR::read.SDFset()}} function.}
}
\value{
A \link[tibble:tibble]{tibble::tibble} with general compound information (one row per
compound):
\itemize{
\item \code{compound_id}: the ID of the compound.
\item \code{name}: the compound's name.
\item \code{inchi}: the InChI of the compound.
\item \code{inchikey}: the InChI key.
\item \code{formula}: the chemical formula.
\item \code{exactmass}: the compound's (monoisotopic exact) mass.
\item \code{synonyms}: the compound's synonyms (aliases). The type of this column is
by default a \code{list} to support multiple aliases per compound, unless
argument \code{collapse} is provided, in which case multiple synonyms are pasted
into a single element separated by the value of \code{collapse}.
\item \code{smiles}: the compound's SMILES (if provided).
}
}
\description{
\code{compound_tbl_sdf()} extracts basic compound annotations from a file in SDF
format (structure-data file). The function currently supports SDF files from:
\itemize{
\item HMDB (Human Metabolome Database): http://www.hmdb.ca
\item ChEBI (Chemical Entities of Biological Interest): http://ebi.ac.uk/chebi
\item LMSD (LIPID MAPS Structure Database): http://www.lipidmaps.org
\item PubChem: https://pubchem.ncbi.nlm.nih.gov/
\item MoNa: http://mona.fiehnlab.ucdavis.edu/ (see notes below!)
}
}
\details{
For HMDB files, column \code{"name"} reports the \code{"GENERIC_NAME"}, for
ChEBI the \code{"ChEBI Name"}, and for PubChem the
\code{"PUBCHEM_IUPAC_TRADITIONAL_NAME"}. For LIPID MAPS it reports the
\code{"COMMON_NAME"}; if that is not available, the first of the compound's
synonyms is used and, if that is also not provided, the \code{"SYSTEMATIC_NAME"}.
}
\note{
\code{compound_tbl_sdf()} can also read gzipped files.

MoNa SDF files organize the data by individual spectra (i.e., each element
is one spectrum), and individual compounds cannot easily and consistently
be defined (i.e., not all entries have an InChI ID or other means to uniquely
identify compounds). Thus, the function returns a highly redundant compound
table. Feedback on how to reduce this redundancy would be highly welcome!

Import of LIPID MAPS files was tested in August 2020. Older SDF files might
not work because the field names were changed.
}
\examples{

## Read compound information from a subset of HMDB
fl <- system.file("sdf/HMDB_sub.sdf.gz", package = "CompoundDb")
cmps <- compound_tbl_sdf(fl)
cmps

## Column synonyms contains a list
cmps$synonyms

## If we provide the optional argument collapse, multiple entries will be
## collapsed.
cmps <- compound_tbl_sdf(fl, collapse = "|")
cmps
cmps$synonyms
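
## A minimal sketch (an illustration only, not a documented workflow of the
## package): the list-type column synonyms can also be collapsed by hand
## after the import, for example if a different separator is needed later.
## The separator "; " and the variable name syns are arbitrary choices.
cmps <- compound_tbl_sdf(fl)
syns <- vapply(cmps$synonyms, paste, character(1), collapse = "; ")
head(syns)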
}
\seealso{
\code{\link[=createCompDb]{createCompDb()}} for a function to create a SQLite-based compound
database.

Other compound table creation functions: 
\code{\link{compound_tbl_lipidblast}()}
}
\author{
Johannes Rainer and Jan Stanstrup
}
\concept{compound table creation functions}
