CytoMethIC-Oncology is a collection of machine learning models for oncology. It includes models for CNS tumor classification, pan-cancer classification, cell-of-origin classification, and cancer subtype classification, as well as tumor purity estimation.

MODELS

The available models are listed below:

CytoMethIC Oncology Models
EHID    ModelID                        PredictionLabel
EH8423  CancerCellOfOrigin21_rfc       Cell of origin defined in TCGA (N=21)
NA      CancerType33_InfHum3_20230807  TCGA cancer types (N=33)
EH8398  CancerType33_mlp               TCGA cancer types (N=33)
EH8395  CancerType33_rfc               TCGA cancer types (N=33)
NA      CancerType33_rfcTCGA_InfHum3   TCGA cancer types (N=33)
EH8396  CancerType33_svm               TCGA cancer types (N=33)
EH8397  CancerType33_xgb               TCGA cancer types (N=33)
NA      CancerType33_xgbTCGA_InfHum3   TCGA cancer types (N=33)
EH8402  CNSTumor66_mlp                 CNS Tumor Class (N=66)
EH8399  CNSTumor66_rfc                 CNS Tumor Class (N=66)
NA      CNSTumor66_rfcCapper_InfHum3   CNS Tumor Class (N=66)
EH8400  CNSTumor66_svm                 CNS Tumor Class (N=66)
EH8401  CNSTumor66_xgb                 CNS Tumor Class (N=66)
NA      CNSTumor66_xgbCapper_InfHum3   CNS Tumor Class (N=66)
EH8422  Subtype91_rfc                  Cancer subtypes defined in TCGA (N=91)
NA      TumorPurity_HM450              Tumor purity (%)
NA      TumorPurity_HM450_20240318     Tumor purity (%)

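Models that have an EHID can be loaded from ExperimentHub with ExperimentHub()[["EHID"]], replacing EHID with the identifier from the table; for example, ExperimentHub()[["EH8395"]] loads the CancerType33_rfc random forest model.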
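A model ID can also be mapped to its EHID programmatically. The sketch below is hypothetical: cmi_models holds a hand-copied subset of the table above, and cmi_ehid() is an illustrative helper, not part of CytoMethIC. An NA result signals that the model is not on ExperimentHub and has to be obtained from GitHub instead.

```r
# Hypothetical lookup table: a subset of the model table above, copied by hand.
# NA in EHID means the model is not distributed through ExperimentHub.
cmi_models <- data.frame(
  ModelID = c("CancerCellOfOrigin21_rfc", "CancerType33_rfc", "CancerType33_svm",
              "CancerType33_xgbTCGA_InfHum3", "Subtype91_rfc", "TumorPurity_HM450"),
  EHID    = c("EH8423", "EH8395", "EH8396", NA, "EH8422", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of CytoMethIC): return the EHID for a single
# model ID, NA if it must be downloaded from GitHub, or an error if unknown.
cmi_ehid <- function(model_id, table = cmi_models) {
  i <- match(model_id, table$ModelID)
  if (is.na(i)) stop("unknown model ID: ", model_id)
  table$EHID[i]
}

# Usage (loading requires ExperimentHub and network access or a cached copy):
# id <- cmi_ehid("CancerType33_rfc")
# if (!is.na(id)) model <- ExperimentHub::ExperimentHub()[[id]]
```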

Additional models (those whose EHID is NA) are available in the accompanying GitHub repository; download them directly and load them with readRDS(). The examples below load models from ExperimentHub.
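The two access routes can be wrapped in one function. load_cmi_model() below is a hypothetical helper, not a CytoMethIC function; it assumes the ExperimentHub package is installed for the EHID route, and that the .rds file has already been downloaded from GitHub for the readRDS() route. The file name in the usage comment is also hypothetical.

```r
# Hypothetical helper (not part of CytoMethIC): load a model from ExperimentHub
# when an EHID is given, otherwise from a local .rds file downloaded from GitHub.
load_cmi_model <- function(ehid = NA_character_, rds_path = NULL) {
  if (!is.na(ehid)) {
    # Requires the ExperimentHub package and network access (or a cached copy).
    return(ExperimentHub::ExperimentHub()[[ehid]])
  }
  if (is.null(rds_path) || !file.exists(rds_path)) {
    stop("EHID is NA: supply the path to a downloaded .rds model file")
  }
  readRDS(rds_path)
}

# Usage:
# model <- load_cmi_model(ehid = "EH8395")                                  # ExperimentHub
# model <- load_cmi_model(rds_path = "CancerType33_xgbTCGA_InfHum3.rds")    # hypothetical local file
```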

CANCER TYPE

The snippet below demonstrates the model abstraction: the same cmi_predict() call works with the random forest (EH8395) and support vector machine (EH8396) cancer type models, as well as the cancer subtype model (EH8422), all loaded from ExperimentHub.

library(sesame)
library(CytoMethIC)
library(ExperimentHub)
# impute missing beta values before prediction
betas = imputeBetas(sesameDataGet("HM450.1.TCGA.PAAD")$betas)
model = ExperimentHub()[["EH8395"]] # Random forest model
cmi_predict(betas, model)
## $response
## [1] "PAAD"
## 
## $prob
##  PAAD 
## 0.852
model = ExperimentHub()[["EH8396"]] # SVM model
cmi_predict(betas, model)
## $response
## [1] "PAAD"
## 
## $prob
## betas[, attr(model$terms, "term.labels")] 
##                                 0.9864795
model = ExperimentHub()[["EH8422"]] # Cancer subtype model
cmi_predict(sesameDataGet("HM450.1.TCGA.PAAD")$betas, model)
## $response
## [1] "GI.CIN"
## 
## $prob
## GI.CIN 
##  0.462
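When several models are applied to the same sample, their outputs can be collected into one table for comparison. tabulate_predictions() is a hypothetical helper, not part of CytoMethIC; it assumes, based on the printed output above, that cmi_predict() returns a list with a character $response and a named numeric $prob. Names are dropped from $prob because, as the SVM output shows, they are not always the class label.

```r
# Hypothetical helper (not part of CytoMethIC): combine a named list of
# cmi_predict() results into a data frame of predicted labels and probabilities.
tabulate_predictions <- function(preds) {
  data.frame(
    model    = names(preds),
    response = vapply(preds, function(p) as.character(p$response), character(1)),
    prob     = vapply(preds, function(p) unname(p$prob)[1], numeric(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Usage (requires network access or a cached ExperimentHub):
# ehids <- c(rfc = "EH8395", svm = "EH8396")
# preds <- lapply(ehids, function(id) cmi_predict(betas, ExperimentHub()[[id]]))
# tabulate_predictions(preds)
```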

CELL-OF-ORIGIN

The snippet below uses cmi_predict() with the cell-of-origin model (EH8423) to predict the cell of origin of the same PAAD sample.

model = ExperimentHub()[["EH8423"]]
cmi_predict(sesameDataGet("HM450.1.TCGA.PAAD")$betas, model)
## $response
## [1] "C20:Mixed (Stromal/Immune)"
## 
## $prob
## C20:Mixed (Stromal/Immune) 
##                      0.768
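The cell-of-origin labels appear to combine a cluster identifier and a description separated by a colon, as in C20:Mixed (Stromal/Immune). Assuming that format holds for every label, a hypothetical helper (not part of CytoMethIC) can split them for tabulation:

```r
# Hypothetical helper: split "Cxx:Description" labels into a cluster ID and a
# description. Assumes the first colon separates the two parts.
split_coo_label <- function(label) {
  data.frame(
    cluster     = sub(":.*$", "", label),
    description = sub("^[^:]*:", "", label),
    stringsAsFactors = FALSE
  )
}

split_coo_label("C20:Mixed (Stromal/Immune)")
# cluster "C20", description "Mixed (Stromal/Immune)"

# With a live prediction:
# split_coo_label(cmi_predict(betas, model)$response)
```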
sessionInfo()
## R Under development (unstable) (2024-10-21 r87258)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] sesame_1.25.2        sesameData_1.25.0    CytoMethIC_1.3.3    
##  [4] ExperimentHub_2.15.0 AnnotationHub_3.15.0 BiocFileCache_2.15.0
##  [7] dbplyr_2.5.0         BiocGenerics_0.53.3  generics_0.1.3      
## [10] knitr_1.49          
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.2.1            dplyr_1.1.4                
##  [3] blob_1.2.4                  filelock_1.0.3             
##  [5] Biostrings_2.75.3           fastmap_1.2.0              
##  [7] digest_0.6.37               mime_0.12                  
##  [9] lifecycle_1.0.4             KEGGREST_1.47.0            
## [11] RSQLite_2.3.9               magrittr_2.0.3             
## [13] compiler_4.5.0              rlang_1.1.4                
## [15] sass_0.4.9                  tools_4.5.0                
## [17] yaml_2.3.10                 S4Arrays_1.7.1             
## [19] bit_4.5.0.1                 curl_6.0.1                 
## [21] DelayedArray_0.33.3         plyr_1.8.9                 
## [23] RColorBrewer_1.1-3          abind_1.4-8                
## [25] BiocParallel_1.41.0         withr_3.0.2                
## [27] purrr_1.0.2                 grid_4.5.0                 
## [29] stats4_4.5.0                preprocessCore_1.69.0      
## [31] wheatmap_0.2.0              e1071_1.7-16               
## [33] colorspace_2.1-1            ggplot2_3.5.1              
## [35] scales_1.3.0                SummarizedExperiment_1.37.0
## [37] cli_3.6.3                   rmarkdown_2.29             
## [39] crayon_1.5.3                reshape2_1.4.4             
## [41] httr_1.4.7                  tzdb_0.4.0                 
## [43] proxy_0.4-27                DBI_1.2.3                  
## [45] cachem_1.1.0                stringr_1.5.1              
## [47] parallel_4.5.0              AnnotationDbi_1.69.0       
## [49] BiocManager_1.30.25         XVector_0.47.1             
## [51] matrixStats_1.4.1           vctrs_0.6.5                
## [53] Matrix_1.7-1                jsonlite_1.8.9             
## [55] IRanges_2.41.2              hms_1.1.3                  
## [57] S4Vectors_0.45.2            bit64_4.5.2                
## [59] fontawesome_0.5.3           jquerylib_0.1.4            
## [61] glue_1.8.0                  codetools_0.2-20           
## [63] stringi_1.8.4               gtable_0.3.6               
## [65] BiocVersion_3.21.1          GenomeInfoDb_1.43.2        
## [67] GenomicRanges_1.59.1        UCSC.utils_1.3.0           
## [69] munsell_0.5.1               tibble_3.2.1               
## [71] pillar_1.10.0               rappdirs_0.3.3             
## [73] htmltools_0.5.8.1           randomForest_4.7-1.2       
## [75] GenomeInfoDbData_1.2.13     R6_2.5.1                   
## [77] evaluate_1.0.1              Biobase_2.67.0             
## [79] lattice_0.22-6              readr_2.1.5                
## [81] png_0.1-8                   memoise_2.0.1              
## [83] BiocStyle_2.35.0            bslib_0.8.0                
## [85] class_7.3-22                Rcpp_1.0.13-1              
## [87] SparseArray_1.7.2           xfun_0.49                  
## [89] MatrixGenerics_1.19.0       pkgconfig_2.0.3