CytoMethIC-Oncology is a collection of machine learning models for oncology. It includes CNS tumor classification, pan-cancer classification, cell-of-origin classification, and subtype classification models.
Models available are listed below:
EHID | ModelID | PredictionLabel |
---|---|---|
EH8423 | CancerCellOfOrigin21_rfc | Cell of origin defined in TCGA (N=21) |
NA | CancerType33_InfHum3_20230807 | TCGA cancer types (N=33) |
EH8398 | CancerType33_mlp | TCGA cancer types (N=33) |
EH8395 | CancerType33_rfc | TCGA cancer types (N=33) |
NA | CancerType33_rfcTCGA_InfHum3 | TCGA cancer types (N=33) |
EH8396 | CancerType33_svm | TCGA cancer types (N=33) |
EH8397 | CancerType33_xgb | TCGA cancer types (N=33) |
NA | CancerType33_xgbTCGA_InfHum3 | TCGA cancer types (N=33) |
EH8402 | CNSTumor66_mlp | CNS Tumor Class (N=66) |
EH8399 | CNSTumor66_rfc | CNS Tumor Class (N=66) |
NA | CNSTumor66_rfcCapper_InfHum3 | CNS Tumor Class (N=66) |
EH8400 | CNSTumor66_svm | CNS Tumor Class (N=66) |
EH8401 | CNSTumor66_xgb | CNS Tumor Class (N=66) |
NA | CNSTumor66_xgbCapper_InfHum3 | CNS Tumor Class (N=66) |
EH8422 | Subtype91_rfc | Cancer subtypes defined in TCGA (N=91) |
NA | TumorPurity_HM450 | Tumor purity (%) |
NA | TumorPurity_HM450_20240318 | Tumor purity (%) |
One can access a model by passing its EHID from the table above to `ExperimentHub()[["EHID"]]`. Models whose EHID is NA are not on ExperimentHub. They are available in the accompanying GitHub repository, where you can download them directly and load them with `readRDS()`. Some examples using either approach are below.
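Both access routes can also be wrapped in a single function. The sketch below is hypothetical: `load_cmi_model()` and `cmi_models` are not part of CytoMethIC, the lookup table copies only a few rows of the model table above, and it assumes that a model downloaded from GitHub is saved locally as `<ModelID>.rds`, which may not match the repository's actual file names.

```r
## Hypothetical helper, NOT part of CytoMethIC: load a model by its ModelID.
## A few rows copied from the model table above; extend as needed.
cmi_models = data.frame(
  EHID    = c("EH8423", "EH8395", "EH8396", "EH8422", NA),
  ModelID = c("CancerCellOfOrigin21_rfc", "CancerType33_rfc",
              "CancerType33_svm", "Subtype91_rfc",
              "CNSTumor66_rfcCapper_InfHum3"),
  stringsAsFactors = FALSE
)

load_cmi_model = function(model_id, dir = ".") {
  hit = cmi_models$ModelID == model_id
  if (sum(hit) != 1) stop("unknown ModelID: ", model_id)
  ehid = cmi_models$EHID[hit]
  if (!is.na(ehid)) {
    ## Model is on ExperimentHub (downloaded and cached on first use).
    ExperimentHub::ExperimentHub()[[ehid]]
  } else {
    ## Model is only on GitHub: assume it was saved as <dir>/<ModelID>.rds.
    readRDS(file.path(dir, paste0(model_id, ".rds")))
  }
}
```

For example, `load_cmi_model("CancerType33_rfc")` returns the same object as `ExperimentHub()[["EH8395"]]`, while `load_cmi_model("CNSTumor66_rfcCapper_InfHum3", dir = "models")` reads a previously downloaded file from a local `models` directory.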
The snippet below demonstrates the model abstraction: the same `cmi_predict()` call works with both the random forest and the support vector machine (SVM) cancer-type models from ExperimentHub.
library(sesame)
library(CytoMethIC)
# fill in missing (NA) beta values before prediction
betas = imputeBetas(sesameDataGet("HM450.1.TCGA.PAAD")$betas)
model = ExperimentHub()[["EH8395"]] # random forest model (CancerType33_rfc)
cmi_predict(betas, model)
## $response
## [1] "PAAD"
##
## $prob
## PAAD
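## 0.852
model = ExperimentHub()[["EH8396"]] # support vector machine model (CancerType33_svm)
cmi_predict(betas, model)
## $response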
## [1] "PAAD"
##
## $prob
## betas[, attr(model$terms, "term.labels")]
## 0.9864795
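As the printed output shows, `cmi_predict()` returns a list with the predicted label in `$response` and its probability in `$prob`, so predictions from several models can be gathered into one table. The helper below, `summarize_predictions()`, is hypothetical and not part of CytoMethIC. It assumes `$prob` holds a single value, as in the outputs above, and drops that value's name, which differs between models (the second model above reports it under `betas[, attr(model$terms, "term.labels")]`).

```r
## Hypothetical helper, NOT part of CytoMethIC: tabulate cmi_predict() results.
## `preds` is a named list of cmi_predict() outputs, each assumed to hold a
## single label in $response and a single probability in $prob.
summarize_predictions = function(preds) {
  data.frame(
    model    = names(preds),
    response = vapply(preds, function(p) as.character(p$response), character(1)),
    prob     = vapply(preds, function(p) unname(as.numeric(p$prob)), numeric(1)),
    row.names = NULL,          # plain integer row names
    stringsAsFactors = FALSE
  )
}

## Example usage (needs the packages and network access used above):
## betas = imputeBetas(sesameDataGet("HM450.1.TCGA.PAAD")$betas)
## preds = lapply(c(rfc = "EH8395", svm = "EH8396"),
##                function(id) cmi_predict(betas, ExperimentHub()[[id]]))
## summarize_predictions(preds)
```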
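The same call predicts the TCGA-defined cancer subtype with the Subtype91_rfc model:
model = ExperimentHub()[["EH8422"]] # cancer subtype model (Subtype91_rfc)
cmi_predict(sesameDataGet("HM450.1.TCGA.PAAD")$betas, model)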
## $response
## [1] "GI.CIN"
##
## $prob
## GI.CIN
## 0.462
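The snippet below uses `cmi_predict()` to predict the tumor's cell of origin with the CancerCellOfOrigin21_rfc model:
model = ExperimentHub()[["EH8423"]] # cell of origin model (CancerCellOfOrigin21_rfc)
cmi_predict(betas, model)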
## $response
## [1] "C20:Mixed (Stromal/Immune)"
##
## $prob
## C20:Mixed (Stromal/Immune)
## 0.768
Session information for the build that produced the output above:
sessionInfo()
## R Under development (unstable) (2024-10-21 r87258)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] sesame_1.25.2 sesameData_1.25.0 CytoMethIC_1.3.3
## [4] ExperimentHub_2.15.0 AnnotationHub_3.15.0 BiocFileCache_2.15.0
## [7] dbplyr_2.5.0 BiocGenerics_0.53.3 generics_0.1.3
## [10] knitr_1.49
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.1 dplyr_1.1.4
## [3] blob_1.2.4 filelock_1.0.3
## [5] Biostrings_2.75.3 fastmap_1.2.0
## [7] digest_0.6.37 mime_0.12
## [9] lifecycle_1.0.4 KEGGREST_1.47.0
## [11] RSQLite_2.3.9 magrittr_2.0.3
## [13] compiler_4.5.0 rlang_1.1.4
## [15] sass_0.4.9 tools_4.5.0
## [17] yaml_2.3.10 S4Arrays_1.7.1
## [19] bit_4.5.0.1 curl_6.0.1
## [21] DelayedArray_0.33.3 plyr_1.8.9
## [23] RColorBrewer_1.1-3 abind_1.4-8
## [25] BiocParallel_1.41.0 withr_3.0.2
## [27] purrr_1.0.2 grid_4.5.0
## [29] stats4_4.5.0 preprocessCore_1.69.0
## [31] wheatmap_0.2.0 e1071_1.7-16
## [33] colorspace_2.1-1 ggplot2_3.5.1
## [35] scales_1.3.0 SummarizedExperiment_1.37.0
## [37] cli_3.6.3 rmarkdown_2.29
## [39] crayon_1.5.3 reshape2_1.4.4
## [41] httr_1.4.7 tzdb_0.4.0
## [43] proxy_0.4-27 DBI_1.2.3
## [45] cachem_1.1.0 stringr_1.5.1
## [47] parallel_4.5.0 AnnotationDbi_1.69.0
## [49] BiocManager_1.30.25 XVector_0.47.1
## [51] matrixStats_1.4.1 vctrs_0.6.5
## [53] Matrix_1.7-1 jsonlite_1.8.9
## [55] IRanges_2.41.2 hms_1.1.3
## [57] S4Vectors_0.45.2 bit64_4.5.2
## [59] fontawesome_0.5.3 jquerylib_0.1.4
## [61] glue_1.8.0 codetools_0.2-20
## [63] stringi_1.8.4 gtable_0.3.6
## [65] BiocVersion_3.21.1 GenomeInfoDb_1.43.2
## [67] GenomicRanges_1.59.1 UCSC.utils_1.3.0
## [69] munsell_0.5.1 tibble_3.2.1
## [71] pillar_1.10.0 rappdirs_0.3.3
## [73] htmltools_0.5.8.1 randomForest_4.7-1.2
## [75] GenomeInfoDbData_1.2.13 R6_2.5.1
## [77] evaluate_1.0.1 Biobase_2.67.0
## [79] lattice_0.22-6 readr_2.1.5
## [81] png_0.1-8 memoise_2.0.1
## [83] BiocStyle_2.35.0 bslib_0.8.0
## [85] class_7.3-22 Rcpp_1.0.13-1
## [87] SparseArray_1.7.2 xfun_0.49
## [89] MatrixGenerics_1.19.0 pkgconfig_2.0.3