% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/id_mapping.R
\name{translate_ids}
\alias{translate_ids}
\title{Translate gene, protein and small molecule identifiers}
\usage{
translate_ids(
  d,
  ...,
  uploadlists = FALSE,
  ensembl = FALSE,
  hmdb = FALSE,
  ramp = FALSE,
  chalmers = FALSE,
  entity_type = NULL,
  keep_untranslated = TRUE,
  return_df = FALSE,
  organism = 9606,
  reviewed = TRUE,
  complexes = NULL,
  complexes_one_to_many = NULL,
  track = FALSE,
  quantify_ambiguity = FALSE,
  qualify_ambiguity = FALSE,
  ambiguity_groups = NULL,
  ambiguity_global = FALSE,
  ambiguity_summary = FALSE,
  expand = TRUE
)
}
\arguments{
\item{d}{Character vector or data frame.}

\item{...}{At least two arguments, with or without names. The first
of these arguments describes the source identifier; the rest
describe the target identifier(s). The values of all these
arguments must be valid identifier types as shown in Details. The
names of the arguments are column names. For the first (source) ID
the column must already exist; for the other IDs, new columns will be
created with the given names. For ID types provided as arguments
without names, the name of the ID type will be used as the column
name.}

\item{uploadlists}{Force using the \code{uploadlists} service from UniProt.
By default the plain query interface is used (implemented in
\code{\link{uniprot_full_id_mapping_table}} in this package).
If any of the provided ID types is only available in the uploadlists
service, it will be automatically selected. The plain query interface
is preferred because in the long term, with caching, it requires
less download and data storage.}

\item{ensembl}{Logical: use data from Ensembl BioMart instead of UniProt.}

\item{hmdb}{Logical: use HMDB ID translation data.}

\item{ramp}{Logical: use RaMP ID translation data.}

\item{chalmers}{Logical: use ID translation data from Chalmers Sysbio GEM.}

\item{entity_type}{Character: "protein", "gene" and "smol" are short symbols
for proteins, genes and small molecules, respectively. Several other
synonyms are also accepted.}

\item{keep_untranslated}{If the output is a data frame, keep the
records where the source identifier could not be translated. In
these records the target identifier will be NA.}

\item{return_df}{Return a data frame even if the input is a vector.}

\item{organism}{Character or integer, name or NCBI Taxonomy ID of the
organism (by default 9606 for human). Matters only if
\code{uploadlists} is \code{FALSE}.}

\item{reviewed}{Translate only reviewed (\code{TRUE}), only unreviewed
(\code{FALSE}) or both (\code{NULL}) UniProt records. Matters only
if \code{uploadlists} is \code{FALSE}.}

\item{complexes}{Logical: translate complexes by their members. Only
complexes where all members can be translated will be included in the
result. If \code{NULL}, the option \code{omnipathr.complex_translation} will
be used.}

\item{complexes_one_to_many}{Logical: allow combinatorial expansion or
use only the first target identifier for each member of each complex.
If \code{NULL}, the option \code{omnipathr.complex_translation_one_to_many}
will be used.}

\item{track}{Logical: Track the records (rows) in the input data frame by
adding a column \code{record_id} with the original row numbers.}

\item{quantify_ambiguity}{Logical or character: inspect the mappings for each
ID for ambiguity. If TRUE, for each translated column, two new columns
will be created with numeric values, representing the ambiguity of the
mapping on the "from" and "to" side of the translation, respectively.
If a character value is provided, it will be used as a column name suffix
for the new columns.}

\item{qualify_ambiguity}{Logical or character: inspect the mappings for each
ID for ambiguity. If TRUE, for each translated column, a new column
will be included with values \code{one-to-one}, \code{one-to-many}, \code{many-to-one}
or \code{many-to-many}. If a character value is provided, it will be used as a
column name suffix for the new column.}

\item{ambiguity_groups}{Character vector: additional column names to group by
when inspecting ambiguity. By default, the identifier columns (from
and to) will be used to determine the ambiguity of mappings.}

\item{ambiguity_global}{Logical or character: if \code{ambiguity_groups} are
provided, analyse ambiguity also globally, across the whole data frame.
Character value provides a custom suffix for the columns quantifying
and qualifying global ambiguity.}

\item{ambiguity_summary}{Logical: generate a summary about the ambiguity of the
translation and make it available as an attribute.}

\item{expand}{Logical: if \code{TRUE}, ambiguous (to-many) mappings will be
expanded to multiple rows, resulting in character type columns; if
\code{FALSE}, the original rows will be kept intact, and the target
columns will be lists of character vectors.}
}
\value{
\itemize{
\item{Data frame: if the input is a data frame or the input is a
vector and \code{return_df} is \code{TRUE}.}
\item{Vector: if the input is a vector, there is only one target
ID type and \code{return_df} is \code{FALSE}.}
\item{List of vectors: if the input is a vector, there is more than
one target ID type and \code{return_df} is \code{FALSE}. The names
of the list will be ID types (as if they were column names, see
the description of the \code{...} argument), and the list will also
include the source IDs.}
}
}
\description{
Translates a vector of identifiers, resulting in a new vector, or a column
of identifiers in a data frame, creating another column with the target
identifiers.
}
\details{
This function, depending on the \code{uploadlists} parameter, uses either
the uploadlists service of UniProt or plain UniProt queries to obtain
identifier translation tables. The possible values for the source and
target ID types are the identifier type abbreviations used in the UniProt
API; please refer to the table here:
\url{https://www.uniprot.org/help/api_idmapping}.
In addition, simple synonyms are available which provide a uniform API
for the uploadlists and UniProt query based backends. These are the
following:\tabular{llll}{
   \strong{OmnipathR} \tab \strong{Uploadlists} \tab \strong{UniProt query} \tab \strong{Ensembl BioMart} \cr
   uniprot \tab ACC \tab id \tab uniprotswissprot \cr
   uniprot_entry \tab ID \tab entry name \tab  \cr
   trembl \tab \emph{reviewed = FALSE} \tab \emph{reviewed = FALSE} \tab uniprotsptrembl \cr
   genesymbol \tab GENENAME \tab genes(PREFERRED) \tab external_gene_name \cr
   genesymbol_syn \tab  \tab genes(ALTERNATIVE) \tab external_synonym \cr
   hgnc \tab HGNC_ID \tab database(HGNC) \tab hgnc_symbol \cr
   entrez \tab P_ENTREZGENEID \tab database(GeneID) \tab  \cr
   ensembl \tab ENSEMBL_ID \tab  \tab ensembl_gene_id \cr
   ensg \tab ENSEMBL_ID \tab  \tab ensembl_gene_id \cr
   enst \tab ENSEMBL_TRS_ID \tab database(Ensembl) \tab ensembl_transcript_id \cr
   ensp \tab ENSEMBL_PRO_ID \tab  \tab ensembl_peptide_id \cr
   ensgg \tab ENSEMBLGENOME_ID \tab  \tab  \cr
   ensgt \tab ENSEMBLGENOME_TRS_ID \tab  \tab  \cr
   ensgp \tab ENSEMBLGENOME_PRO_ID \tab  \tab  \cr
   protein_name \tab  \tab protein names \tab  \cr
   pir \tab PIR \tab database(PIR) \tab  \cr
   ccds \tab  \tab database(CCDS) \tab  \cr
   refseqp \tab P_REFSEQ_AC \tab database(refseq) \tab  \cr
   ipro \tab  \tab  \tab interpro \cr
   ipro_desc \tab  \tab  \tab interpro_description \cr
   ipro_sdesc \tab  \tab  \tab interpro_short_description \cr
   wikigene \tab  \tab  \tab wikigene_name \cr
   rnacentral \tab  \tab  \tab rnacentral \cr
   gene_desc \tab  \tab  \tab description \cr
   wormbase \tab  \tab database(WormBase) \tab  \cr
   flybase \tab  \tab database(FlyBase) \tab  \cr
   xenbase \tab  \tab database(Xenbase) \tab  \cr
   zfin \tab  \tab database(ZFIN) \tab  \cr
   pdb \tab PDB_ID \tab database(PDB) \tab pdb \cr
}


For a complete list of ID types and their synonyms, including metabolite and
chemical ID types which are not shown here, see \code{\link{id_types}}.

The mapping between identifiers can be ambiguous. In this case one row
in the original data frame yields multiple rows or elements in the
returned data frame or vector(s).
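
The ambiguity categories reported by \code{qualify_ambiguity} can be
illustrated on a toy table. The following is only a minimal sketch in base R:
the table \code{map}, its placeholder IDs and the helper variables are made up
for illustration, and this is not necessarily how the package computes these
labels internally.

\preformatted{
# Hypothetical toy translation table; the IDs are placeholders,
# not real identifier mappings.
map <- data.frame(
    from = c('A', 'A', 'B', 'C', 'D'),
    to   = c('x', 'y', 'z', 'z', 'w'),
    stringsAsFactors = FALSE
)

# Number of distinct targets per source ID, and of distinct sources
# per target ID
n_to   <- tapply(map$to, map$from, function(v) length(unique(v)))
n_from <- tapply(map$from, map$to, function(v) length(unique(v)))

# For each row: does its source map to several targets, and is its
# target reached from several sources?
to_many   <- as.vector(n_to[map$from] > 1)
from_many <- as.vector(n_from[map$to] > 1)

# Combine the two sides into a label for each row
map$ambiguity <- paste0(
    ifelse(from_many, 'many', 'one'),
    '-to-',
    ifelse(to_many, 'many', 'one')
)
map
}

In this sketch the two rows of \code{A} are labelled \code{one-to-many},
the rows of \code{B} and \code{C} \code{many-to-one}, and the row of \code{D}
\code{one-to-one}.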

The columns in the translation must be of character type. Some ID types are
numeric, such as the ones from NCBI, and are sometimes present in data
frames as double or integer type. This function will convert those columns
to character.
}
\examples{
d <- data.frame(
    uniprot_id = c(
        'P00533', 'Q9ULV1', 'P43897', 'Q9Y2P5',
        'P01258', 'P06881', 'P42771', 'Q8N726'
    )
)
d <- translate_ids(d, uniprot_id = uniprot, genesymbol)
d
#   uniprot_id genesymbol
# 1     P00533       EGFR
# 2     Q9ULV1       FZD4
# 3     P43897       TSFM
# 4     Q9Y2P5    SLC27A5
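
# Further sketches relying only on the arguments documented above.
# Like the example above, they need network access; the variable names
# ids and d2 are chosen here for illustration and outputs are not shown.

# A character vector as input with a single target ID type; according
# to the Value section this should return a vector
ids <- c('P00533', 'Q9ULV1')
translate_ids(ids, uniprot, genesymbol)

# Keep one row per input record (expand = FALSE) and add a column
# labelling the ambiguity of each mapping (qualify_ambiguity = TRUE)
d2 <- data.frame(uniprot_id = c('P00533', 'Q9ULV1'))
translate_ids(
    d2,
    uniprot_id = uniprot,
    genesymbol,
    qualify_ambiguity = TRUE,
    expand = FALSE
)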

}
\seealso{
\itemize{
\item{\code{\link{translate_ids_multi}}}
\item{\code{\link{uniprot_id_mapping_table}}}
\item{\code{\link{uniprot_full_id_mapping_table}}}
\item{\code{\link{ensembl_id_mapping_table}}}
\item{\code{\link{hmdb_id_mapping_table}}}
\item{\code{\link{id_types}}}
\item{\code{\link{ensembl_id_type}}}
\item{\code{\link{uniprot_id_type}}}
\item{\code{\link{uploadlists_id_type}}}
\item{\code{\link{hmdb_id_type}}}
\item{\code{\link{chalmers_gem_id_type}}}
}
}
