% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/plot_structure.R
\name{plot_structure}
\alias{plot_structure}
\title{Visualize DMS and Model Scores on 3D Protein Structures}
\usage{
plot_structure(
  assay_name,
  pdb_file,
  data_scores = "DMS",
  dms_data = NULL,
  start_pos = NULL,
  end_pos = NULL,
  full_structure = FALSE,
  aggregate_fun = mean,
  color_scheme = NULL
)
}
\arguments{
\item{assay_name}{\code{character()} a valid DMS assay name. For the full list of
available assays, run \code{names()} on the list object loaded with
\code{ProteinGymR::dms_substitutions()}. Alternatively, the name of a
user-defined assay supplied in \code{dms_data}.}

\item{pdb_file}{\code{character()} defaults to the corresponding PDB file path on
ExperimentHub. Alternatively, a file path to a user-defined PDB file.}

\item{data_scores}{\code{character()} specifies whether DMS, zero-shot, or
supervised model prediction scores are displayed. Pass either "DMS" for
experimental scores, or a model name from \code{available_models()} for
zero-shot models or from \code{supervised_available_models()} for
semi-supervised models. Defaults to "DMS".}

\item{dms_data}{\code{list()} object of DMS assays loaded with
\code{ProteinGymR::dms_substitutions()}.
Alternatively, a user-defined list of DMS assays with names corresponding
to the \code{assay_name} argument.}

\item{start_pos}{\code{integer()} first amino acid position to plot. If missing,
default start is the first position along the protein in the PDB file.}

\item{end_pos}{\code{integer()} last amino acid position to plot. If missing,
default end is the last position along the protein in the PDB file.}

\item{full_structure}{\code{logical()} defaults to FALSE, which plots only the
protein regions where DMS data is available in the assay. If
\code{start_pos} and \code{end_pos} coordinates are specified, plotting is
restricted to this defined region. Setting \code{full_structure} to TRUE
displays the full protein structure in the PDB file and greys out regions
where no DMS data is available.}

\item{aggregate_fun}{method for aggregating DMS scores for each residue.
For example, give \link{min}, \link{max}, or \link{var} to return the minimum, maximum,
or variance of scores for each position, respectively. \code{aggregate_fun} can
also take in a user-defined function with a numeric vector as input.
By default, the mean DMS score across mutations at each position is
calculated.}

\item{color_scheme}{\code{character()} defaults to blue, white, and red to
represent positive, neutral, and negative scores, respectively. Set the
argument to "EVE" to use the color scheme consistent with the popEVE portal.}
}
\value{
\code{plot_structure()} returns a \code{\link[r3dmol:init]{r3dmol::r3dmol}}
object showing the 3D protein structure, with each position colored by its
aggregated DMS or model score for the chosen assay. Higher and lower scores
indicate a more positive or more negative fitness effect of the mutation,
respectively.
}
\description{
\code{plot_structure()} plots DMS or model scores for amino acid
substitutions on a 3D protein structure for a chosen assay.
}
\details{
By default, \code{plot_structure()} plots the mean DMS score across all amino
acid substitutions at each protein position. If a model is chosen instead for
the \code{data_scores} argument, a helper function is invoked that normalizes
the model prediction scores using a rank-based normal quantile transformation.
The result is a set of normalized scores that preserve the rank order of the
model's scores while standardizing their distribution. Transformed values
typically fall between -3 and 3. This normalization makes the scores
approximately standard normal (mean = 0, SD = 1), allowing
comparisons across models.
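
As a rough illustration of a rank-based normal quantile transformation, the
sketch below ranks the scores, maps the ranks to probabilities strictly
between 0 and 1, and converts those to standard normal quantiles. The function
name \code{rank_normalize()} and the \code{offset} value are assumptions made
for this example. The helper used internally by the package may differ in its
details.

```r
# Hypothetical helper for illustration; not the package's internal function.
rank_normalize <- function(scores, offset = 0.5) {
  n_obs <- sum(!is.na(scores))
  # Tied scores share their average rank; missing scores stay missing.
  ranks <- rank(scores, na.last = "keep", ties.method = "average")
  # Shift the ranks so every probability lies strictly inside (0, 1),
  # then convert the probabilities to standard normal quantiles.
  qnorm((ranks - offset) / n_obs)
}

set.seed(1)
raw_scores <- rexp(1000)  # skewed toy scores, made up for this example
normalized <- rank_normalize(raw_scores)

# The transformation is monotone, so the ordering of scores is unchanged.
all(order(raw_scores) == order(normalized))
summary(normalized)
```

Because only the ranks enter the calculation, two models with very different
raw score scales end up on the same approximately standard normal scale.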

For \code{plot_structure()},
\code{dms_data} must be a named \code{list()} whose assay element names
correspond to the \code{assay_name} argument.

Each assay in \code{dms_data} must include the following columns:
\itemize{
\item \code{mutant}: Mutant identifier string.
Specifically, the set of substitutions to apply on the reference sequence
to obtain the mutated sequence (e.g., A1P:D2N implies the amino acid 'A'
at position 1 should be replaced by 'P', and 'D' at position 2 should be
replaced by 'N').
\item \code{DMS_score}: Experimental measurement in the DMS assay.
Higher values indicate higher fitness of the mutated protein.
}
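
The sketch below shows what a minimal user-defined \code{dms_data} list with
these two columns might look like, and one way to summarize scores per
position. The assay name \code{TOY_ASSAY} and all scores are made up, and the
parsing assumes single-substitution identifiers such as A1P. This is not
necessarily how \code{plot_structure()} processes the data internally.

```r
# Toy assay with the two required columns; values are illustrative only.
toy_assay <- data.frame(
  mutant = c("A1P", "A1G", "D2N", "D2E", "D2K"),
  DMS_score = c(-1.2, 0.3, 0.8, -0.4, 0.1)
)

# Element names of the list are what assay_name would be matched against.
dms_data <- list(TOY_ASSAY = toy_assay)

# Split each single-substitution identifier into reference residue,
# position, and alternate residue.
mutant <- dms_data[["TOY_ASSAY"]]$mutant
ref <- substr(mutant, 1, 1)
pos <- as.integer(substr(mutant, 2, nchar(mutant) - 1))
alt <- substr(mutant, nchar(mutant), nchar(mutant))

# Summarize the scores at each position with the mean, mirroring the
# default of the aggregate_fun argument.
per_position <- tapply(dms_data[["TOY_ASSAY"]]$DMS_score, pos, mean)
per_position
```

Swapping \code{mean} for \code{min}, \code{max}, or a user-defined function of
a numeric vector gives the other per-position summaries described for
\code{aggregate_fun}.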

Each PDB table in \code{pdb_file} must include the following columns:
}
\examples{

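# Plot the maximum DMS score per position for positions 20 to 50
plot_structure(assay_name = "C6KNH7_9INFA_Lee_2018",
    start_pos = 20,
    end_pos = 50,
    full_structure = FALSE,
    aggregate_fun = max)

# Plot GEMME model scores instead of DMS scores for the same region
plot_structure(assay_name = "C6KNH7_9INFA_Lee_2018",
    start_pos = 20,
    end_pos = 50,
    data_scores = "GEMME")

# Plot Kermut model scores using the popEVE-consistent color scheme
plot_structure(assay_name = "ACE2_HUMAN_Chan_2020",
    data_scores = "Kermut",
    color_scheme = "EVE")
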
}
