--- title: "Streamlining Molecular Intrinsic Subtyping with BreastSubtypeR" authors: "Qiao Yang & Emmanouil G. Sifakis" output: BiocStyle::html_document: toc: true number_sections: true vignette: > %\VignetteIndexEntry{BreastSubtypeR} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- # BreastSubtypeR ## Motivation Breast cancer (BC) is a highly heterogeneous disease characterized by distinct molecular intrinsic subtypes (IS) with unique clinical, biological, and prognostic profiles. These subtypes---such as Luminal A, Luminal B, HER2-Enriched, Basal-like, and Normal-like---are instrumental in guiding treatment strategies and prognostic evaluations. While clinical assays like **Prosigna®** provide standardized subtyping for patient care, the research community still lacks consensus due to fragmented methods and difficulties adapting them across diverse datasets. This inconsistency undermines the reproducibility and reliability of scientific findings. Current methods, such as the original **PAM50** (Parker et al., J Clin Oncol, 2009) and **AIMS** (Paquet et al., J Natl Cancer Inst, 2015), have significantly advanced BC subtyping but suffer from challenges such as limited adaptability to varying datasets. These limitations often lead to difficulties in reproducing results across independent studies, especially when datasets come from different platforms or research environments. Furthermore, there is no centralized, accessible framework that integrates multiple subtyping methods with a focus on consistency and reliability. Existing IS tools, like the `BiocStyle::Biocpkg("genefu")` package, are limited to a small subset of PAM50 variations and `BiocStyle::Biocpkg("AIMS")`, restricting their use to a narrow range of studies. High-performing methods, such as **subgroup-specific gene-centering (ssBC)**, perform well across various datasets but are not readily available as R packages; instead, they are distributed as standalone scripts. This makes it difficult for many researchers, particularly those without advanced computational skills, to implement these methods. Additionally, traditional IHC-based strategies, such as the conventional estrogen receptor (ER)-balancing via immunohistochemistry (**cIHC**), remain inaccessible to most, limiting adoption without specialized expertise. To address these challenges, **BreastSubtypeR** was developed as a comprehensive solution. This R package integrates multiple molecular subtyping methods into a single, cohesive framework. By doing so, it allows researchers to perform robust, reproducible subtyping analyses on BC datasets of various sizes and platforms. The inclusion of the **AUTO mode** enables the dynamic selection of the most appropriate method based on the dataset's characteristics, improving adaptability and accuracy. Furthermore, **BreastSubtypeR** incorporates optimized gene mapping techniques to overcome inconsistencies in gene sets, further enhancing reproducibility. Importantly, **BreastSubtypeR** is designed to be an accessible tool. The package includes an interactive **Shiny app** (**iBreastSubtypeR**), offering a user-friendly interface for both bioinformaticians and researchers with limited R programming experience. This makes subtyping analyses more accessible to researchers across diverse fields, from bioinformatics to clinical research, without requiring deep technical knowledge of the underlying methods. By bridging the gap between computational expertise and clinical application, **BreastSubtypeR** facilitates BC research and ultimately contributes to advancing our understanding of this complex disease. ## Features - **Comprehensive Intrinsic Subtyping for Breast Cancer**: Integrates multiple published intrinsic subtyping methods, including NC-based approaches like the original PAM50 (Parker et al., J Clin Oncol, 2009) and SSP-based methods like AIMS (Paquet et al., J Natl Cancer Inst, 2015). - **Multi-Method Subtyping Functionality**: Simultaneously predicts breast cancer intrinsic subtypes using a variety of validated methods for comparative analysis. - **AUTO Mode**: Automatically selects subtyping methods based on the ER/HER2 distribution of the test cohort, ensuring compatibility with the method-specific assumptions and improving accuracy. - **Optimized Gene Mapping**: Uses Entrez IDs for gene mapping to ensure the maximum inclusion of genes across subtyping methods. - **Streamlined Input/Output**: Standardized input/output formats to ensure smooth integration with other gene expression analysis tools. - **Shiny App Interface**: An intuitive web-based graphical user interface (GUI) for local, single-method subtyping analysis, ensuring privacy and data security. ## Implemented Approaches ### Single-Method Subtyping Approaches | **Approach** | **Description** | **Group** | **Citation** | |---------------|---------------------------|---------------|---------------| | `parker.original` | Original PAM50 by Parker et al., 2009 | NC-based | [Parker et al., 2009](https://doi.org/10.1200/JCO.2008.18.1370) | | `genefu.scale` | PAM50 implementation as in the genefu R package (scaled version) | NC-based | [Gendoo et al., 2016](https://doi.org/10.1093/bioinformatics/btv693) | | `genefu.robust` | PAM50 implementation as in the genefu R package (robust version) | NC-based | [Gendoo et al., 2016](https://doi.org/10.1093/bioinformatics/btv693) | | `cIHC` | Conventional estrogen receptor (ER)-balancing via immunohistochemistry (IHC) | NC-based | [Ciriello et al., 2015](https://doi.org/10.1016/j.cell.2015.09.033) | | `cIHC.itr` | Iterative version of cIHC | NC-based | [Curtis et al., 2012](https://doi.org/10.1038/nature10983) | | `PCAPAM50` | PCA-based iterative PAM50 (ER-balancing using ESR1 gene expression) | NC-based | [Raj-Kumar et al., 2019](https://doi.org/10.1038/s41598-019-44339-4) | | `ssBC` | Subgroup-specific gene-centering PAM50 | NC-based | [Zhao et al., 2015](https://doi.org/10.1186/s13058-015-0520-4) | | `ssBC.v2` | Updated subgroup-specific gene-centering PAM50 with refined quantiles | NC-based | [Fernandez-Martinez et al., 2020](https://doi.org/10.1200/JCO.20.01276) | | `AIMS` | Absolute Intrinsic Molecular Subtyping (AIMS) method | SSP-based | [Paquet & Hallett, 2015](https://doi.org/10.1093/jnci/dju357) | | `sspbc` | Single-Sample Predictors for Breast Cancer (AIMS adaptation) | SSP-based | [Staaf et al., 2022](https://doi.org/10.1038/s41523-022-00465-3) | ### Multi-Method Subtyping Functionality | **Approach** | **Description** | |--------------------|----------------------------------------------------| | **User-defined Multi-Method** | Allows users to select multiple subtyping methods for comparative analysis. | | **AUTO Mode Multi-Method** | Automatically selects subtyping methods based on the ER/HER2 distribution of the test cohort. | ## Installation To install **BreastSubtypeR** from Biocondunctor, run: ```{r, eval = FALSE} if (!require("BiocManager", quietly = TRUE)) { install.packages("BiocManager") } BiocManager::install("BreastSubtypeR") ``` To install **BreastSubtypeR** from GitHub, run: ```{r, eval = FALSE} # Install devtools package if you haven't already install.packages("devtools") # Install BreastSubtypeR from GitHub devtools::install_github("yqkiuo/BreastSubtypeR") ``` ## Getting Started Here's an example of how to use **BreastSubtypeR** for multi-method breast cancer subtyping. The user manually selects the methods to be used: ```{r} library(BreastSubtypeR) # Load example data data("BreastSubtypeRobj") data("OSLO2EMIT0obj") # Perform gene mapping before subtyping data_input <- Mapping(OSLO2EMIT0obj$se_obj, method = "max", impute = TRUE, verbose = FALSE) # Perform multi-method subtyping methods <- c("parker.original", "PCAPAM50", "sspbc") result <- BS_Multi( data_input = data_input, methods = methods, Subtype = FALSE, hasClinical = FALSE ) # View the results head(result$res_subtypes[, 1:min(5, ncol(result$res_subtypes))], 5) # Visualize results plot <- Vis_Multi(result$res_subtypes) plot(plot) ``` Here's how to use **BreastSubtypeR** for multi-method subtyping with **AUTO** mode: ```{r} library(BreastSubtypeR) # Load example data data("BreastSubtypeRobj") data("OSLO2EMIT0obj") # Perform gene mapping before subtyping data_input <- Mapping(OSLO2EMIT0obj$se_obj, method = "max", impute = TRUE, verbose = FALSE) # Run subtyping with AUTO mode result <- BS_Multi( data_input = data_input, methods = "AUTO", Subtype = FALSE, hasClinical = FALSE ) # View the results head(result$res_subtypes[, 1:min(5, ncol(result$res_subtypes))], 5) # Visualize results plot <- Vis_Multi(result$res_subtypes) plot(plot) ``` For using **BreastSubtypeR** with the `parker.original` method: ```{r} library(BreastSubtypeR) # Load example data data("BreastSubtypeRobj") data("OSLO2EMIT0obj") # Perform subtyping with the `parker.original` method res <- BS_parker( se_obj = OSLO2EMIT0obj$data_input$se_NC, calibration = "Internal", internal = "medianCtr", Subtype = FALSE, hasClinical = FALSE ) ``` For using **BreastSubtypeR** with the `AIMS` method: ```{r} library(BreastSubtypeR) # Load example data data("BreastSubtypeRobj") data("OSLO2EMIT0obj") # Perform subtyping with the `AIMS` method res <- BS_AIMS(OSLO2EMIT0obj$data_input$se_SSP) ``` ### Usage #### Single-Method Subtyping | **Approach** | **Usage** | |-------------------|-----------------------------------------------------| | `parker.original` | `BS_parker(calibration = "Internal", internal = "medianCtr", ...)` | | `genefu.scale` | `BS_parker(calibration = "Internal", internal = "meanCtr", ...)` | | `genefu.robust` | `BS_parker(calibration = "Internal", internal = "qCtr", ...)` | | `cIHC` | `BS_cIHC(...)` | | `cIHC.itr` | `BS_cIHC.itr(...)` | | `PCAPAM50` | `BS_PCAPAM50(...)` | | `ssBC` | `BS_ssBC(s = "ER", ...)` | | `ssBC.v2` | `BS_ssBC(s = "ER.v2", ...)` | | `AIMS` | `BS_AIMS(...)` | | `sspbc` | `BS_sspbc(...)` | #### Multi-Method Subtyping | **Mode** | **Usage** | |-------------------|-----------------------------------------------------| | User-defined | `BS_Multi(methods = c("parker.original", "ssBC.v2", "sspbc", ...), ...)` | | AUTO Mode | `BS_Multi(methods = "AUTO", ...)` | ## Shiny App For users new to R, we offer an intuitive Shiny app for interactive molecular subtyping. ### Launch the Shiny App: To run iBreastSubtypeR locally with your data, first install and load the package as described above. Afterward, you can interactively access the Shiny app to visualize and analyze your dataset. Here's an example of how to launch it: ```{r, eval = FALSE} # Launch iBreastSubtypeR for interactive analysis library(BreastSubtypeR) library(tidyverse) library(shiny) library(bslib) iBreastSubtypeR() ``` The Shiny app allows you to:\ - Upload gene expression, clinical, and annotation data.\ - Perform subtyping using a preferred method.\ - Visualize the results in real-time.\ - Download results directly to your local machine. ## Contributing We welcome contributions to the package. If you find any bugs or have feature requests, feel free to open an issue [here](https://github.com/yqkiuo/BreastSubtypeR/issues). ## Citation If you use **BreastSubtypeR** in your work, please cite: - Yang, Q. [aut] & Sifakis, E. G. [cre], *BreastSubtypeR: A Unified R Package for Comprehensive Intrinsic Molecular Subtyping in Breast Cancer Research*. Available at: . - Additional relevant citations based on the methods you use (refer to the specific methods section for details). ## Session Information ```{r sessionInfo} library(BreastSubtypeR) sessionInfo() ``` ## References