MultimodalExperiment 1.2.0
To install MultimodalExperiment from Bioconductor, use BiocManager as follows.
BiocManager::install("MultimodalExperiment")
To install MultimodalExperiment from GitHub, use BiocManager as follows.
BiocManager::install("schifferl/MultimodalExperiment", dependencies = TRUE, build_vignettes = TRUE)
Most users should simply install MultimodalExperiment from Bioconductor.
MultimodalExperiment is an S4 class that integrates bulk and single-cell experiment data; it is optimally storage-efficient, and its methods are exceptionally fast. It effortlessly represents multimodal data of any nature and features normalized experiment, subject, sample, and cell annotations, which are related to underlying biological experiments through maps. Its coordination methods are opt-in and employ database-like join operations internally to deliver fast and flexible management of multimodal data.
To begin using MultimodalExperiment, users should familiarize themselves with its application programming interface (API) outlined in the table below. The names of slot methods are consistent with those shown in the MultimodalExperiment Schematic (Figure 1), except bulkExperiments
and singleCellExperiments
because these are not actually individual slots. Instead, the experiments
slot contains a single ExperimentList
object with both bulk and single-cell experiments as elements; the experimentMap
is used to distinguish between the two types of experiments. Also, note that the API of MultimodalExperiment is relatively sparse because it is a data structure, and further packages are needed to conduct analysis.
Constructors | |
MultimodalExperiment |
construct a MultimodalExperiment object |
ExperimentList |
construct an ExperimentList object |
Slots | |
experimentData |
get or set experimentData (experiment annotations) |
subjectData |
get or set subjectData (subject annotations) |
sampleData |
get or set sampleData (sample annotations) |
cellData |
get or set cellData (cell annotations) |
experimentMap |
get or set experimentMap (experiment -> type map) |
subjectMap |
get or set subjectMap (subject -> experiment map) |
sampleMap |
get or set sampleMap (sample -> subject map) |
cellMap |
get or set cellMap (cell -> sample map) |
experiments |
get or set experiments |
metadata |
get or set metadata |
Annotations | |
joinAnnotations |
join experimentData, subjectData, sampleData, and cellData |
Maps | |
joinMaps |
join experimentMap, subjectMap, sampleMap, and cellMap |
Experiments | |
experiment(ME, i) |
get or set experiments element by index |
experiment(ME, "name") |
get or set experiments element by name |
bulkExperiments |
get or set experiments element(s) where type == "bulk" |
singleCellExperiments |
get or set experiments element(s) where type == "single-cell" |
Names | |
rownames |
get or set rownames of experiments element(s) |
colnames |
get or set colnames of experiments element(s) |
experimentNames |
get or set names of experiments |
Subsetting | |
ME[i, j] |
subset rows and/or columns of experiments |
ME[i, ] |
i : list, List, LogicalList, IntegerList, CharacterList |
ME[, j] |
j : list, List, LogicalList, IntegerList, CharacterList |
Coordination | |
propagate |
propagate experiment, subject, sample, and cell indices across all tables |
harmonize |
harmonize experiment, subject, sample, and cell indices across all tables |
As the API table above might suggest, the coordination methods of MultimodalExperiment are opt-in, meaning the propagation and harmonization of indices are deferred until the user request them. This philosophy prevents computationally expensive operations from being called repetitively while changes to a MultimodalExperiment object are made.
To demonstrate the functionality of the MultimodalExperiment class, a subset of the PBMCs of a Healthy Donor - 5’ Gene Expression with a Panel of TotalSeq™-C Antibodies dataset from 10x Genomics has been included in the MultimodalExperiment package. Specifically, human peripheral blood mononuclear cells (PBMCs) from a single healthy donor were profiled by cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) to generate single-cell antibody-derived tag sequencing (scADTseq) and single-cell RNA sequencing (scRNAseq) data simultaneously; the scRNAseq data was summed into pseudo-bulk RNA sequencing (pbRNAseq) data using scuttle. The dimensions of resulting matrices were reduced to conserve storage because these data are only used for demonstration here.
pbRNAseq[1:4, 1:1, drop = FALSE]
## SAMPLE-1
## A1BG 1020
## A1CF 0
## AAAS 413
## AACS 117
scRNAseq[1:4, 1:4, drop = FALSE]
## AAACCTGAGAGCAATT AAACCTGAGGCTCTTA AAACCTGAGTGAACGC AAACCTGCAAACGCGA
## A1BG 1 0 0 0
## A1CF 0 0 0 0
## AAAS 0 0 0 0
## AACS 0 0 0 0
scADTseq[1:4, 1:4, drop = FALSE]
## AAACCTGAGAGCAATT AAACCTGAGGCTCTTA AAACCTGAGTGAACGC AAACCTGCAAACGCGA
## CD3 225 1064 1833 18
## CD4 0 0 0 1
## CD14 0 0 0 3
## CD15 6890 29 42 47
The dataset does not include annotations, and only limited information can be gathered from its citation1 PBMCs of a Healthy Donor - 5’ Gene Expression with a Panel of TotalSeq™-C Antibodies, Single Cell Immune Profiling Dataset by Cell Ranger 3.0.0, 10x Genomics, (2018, November 19)., as follows:
Where a MultimodalExperiment object is constructed from these data in the proceeding section, these facts will be used to create experiment, subject, and sample annotations.
To construct a MultimodalExperiment object from the example data, begin by assigning an empty MultimodalExperiment object to the variable ME
.
ME <-
MultimodalExperiment()
Then, use the bulkExperiments<-
method to assign a named ExperimentList containing the pbRNAseq matrix as the bulkExperiments of the ME
object.
bulkExperiments(ME) <-
ExperimentList(
pbRNAseq = pbRNAseq
)
Next, use the singleCellExperiments<-
method to assign a named ExperimentList containing the scADTseq and scRNAseq matrices as the singleCellExperiments of the ME
object.
singleCellExperiments(ME) <-
ExperimentList(
scADTseq = scADTseq,
scRNAseq = scRNAseq
)
The bulkExperiments<-
and singleCellExperiments<-
methods are the only exceptions to opt-in coordination; they automatically propagate experiment, sample, and cell indices into the relevant annotation (experimentData, sampleData, and cellData) and map (experimentMap, sampleMap, and cellMap) slots to simplify the process of construing a MultimodalExperiment object. Despite their automatic propagation, these methods remain computationally efficient because they do not call propagate
internally.
To establish that all experiments are related to a single subject, the value "SUBJECT-1"
is assigned to the "subject"
column of the subjectMap
.
subjectMap(ME)[["subject"]] <-
"SUBJECT-1"
To establish that all samples are related to a single subject, the value "SUBJECT-1"
is assigned to the "subject"
column of the sampleMap
.
sampleMap(ME)[["subject"]] <-
"SUBJECT-1"
To establish that all cells are related to a single sample, the value "SAMPLE-1"
is assigned to the "sample"
column of the cellMap
.
cellMap(ME)[["sample"]] <-
"SAMPLE-1"
To make the relationships established in the preceding steps clear to the reader, the joinMaps
method is used to display all maps joined into an unnormalized DataFrame object.
joinMaps(ME)
## DataFrame with 10002 rows and 5 columns
## type experiment subject sample cell
## <character> <character> <character> <character> <character>
## 1 NA NA NA NA NA
## 2 bulk pbRNAseq SUBJECT-1 SAMPLE-1 NA
## 3 single-cell scADTseq SUBJECT-1 SAMPLE-1 AAACCTGAGAGCAATT
## 4 single-cell scADTseq SUBJECT-1 SAMPLE-1 AAACCTGAGGCTCTTA
## 5 single-cell scADTseq SUBJECT-1 SAMPLE-1 AAACCTGAGTGAACGC
## ... ... ... ... ... ...
## 9998 single-cell scRNAseq SUBJECT-1 SAMPLE-1 TTTGTCAGTTGGACCC
## 9999 single-cell scRNAseq SUBJECT-1 SAMPLE-1 TTTGTCAGTTGGAGGT
## 10000 single-cell scRNAseq SUBJECT-1 SAMPLE-1 TTTGTCAGTTTAGCTG
## 10001 single-cell scRNAseq SUBJECT-1 SAMPLE-1 TTTGTCATCATGGTCA
## 10002 single-cell scRNAseq SUBJECT-1 SAMPLE-1 TTTGTCATCTCGTTTA
Although the relationships established should now be clear, it is important to note that the unnormalized representation is not storage efficient and is not how maps are stored in a MultimodalExperiment object. The design of MultimodalExperiment takes advantage of the structure of multimodal data where sample or cell indices are repeated across experiments by storing annotations and relationships only once.
In the MultimodalExperiment paradigm, cells belong to samples, samples belong to subjects, and subjects participate in experiments; these relationships were established above with modifications to the cell, sample, and subject maps. However, the subject indices created when the subject and sample maps were modified were not added to the row names of the subjectData
slot per the opt-in principle. The propagate
method inserts experiment, subject, sample, and cell indices into all relevant tables by taking their union and adding missing indices; it is used below to add the missing indices to the subjectData
slot.
ME <-
propagate(ME)
Experiment, subject, sample, and cell indices are now present across all annotation and map slots, and the order of row names across annotation slots is also known. The order of row names of experiment, sample, and cell annotations is consistent with their order of insertion; this means the experimentData
slot contains a DataFrame with three rows (pbRNAseq, scADTseq, and scRNAseq) and zero columns. To establish when the data were published, three dates are assigned to the "published"
column of experimentData
.
experimentData(ME)[["published"]] <-
c(NA_character_, "2018-11-19", "2018-11-19") |>
as.Date()
The data are known to come from a single, healthy subject; this is annotated by assigning the value "healthy"
to the "condition"
column of subjectData
.
subjectData(ME)[["condition"]] <-
as.character("healthy")
The data are also known to come from PBMCs; this is annotated by assigning the value "peripheral blood mononuclear cells"
to the "sampleType"
column of sampleData
.
sampleData(ME)[["sampleType"]] <-
as.character("peripheral blood mononuclear cells")
As no cell annotations are provided, a naive cell type classification function is implemented below for demonstration (i.e., do not use these classifications for research purposes).
cellType <- function(x) {
if (x[["CD4"]] > 0L) {
return("T Cell")
}
if (x[["CD14"]] > 0L) {
return("Monocyte")
}
if (x[["CD19"]] > 0L) {
return("B Cell")
}
if (x[["CD56"]] > 0L) {
return("NK Cell")
}
NA_character_
}
To annotate cell types, the "cellType"
column of cellData
is assigned by piping the scADTseq
experiment to the apply function, which applies the cellType
function over the columns of the matrix.
cellData(ME)[["cellType"]] <-
experiment(ME, "scADTseq") |>
apply(2L, cellType)
This completes the process of constructing a MultimodalExperiment object from the example data; now, when the ME
variable is called, the show
method is used to display essential information about the object.
ME
## MultimodalExperiment with 1 bulk and 2 single-cell experiment(s).
##
## experimentData: DataFrame with 3 row(s) and 1 column(s).
## published
## <Date>
## pbRNAseq NA
## scADTseq 2018-11-19
## scRNAseq 2018-11-19
##
## subjectData: DataFrame with 1 row(s) and 1 column(s).
## condition
## <character>
## SUBJECT-1 healthy
##
## sampleData: DataFrame with 1 row(s) and 1 column(s).
## sampleType
## <character>
## SAMPLE-1 peripheral blood mononuclear cells
##
## cellData: DataFrame with 5000 row(s) and 1 column(s).
## cellType
## <character>
## AAACCTGAGAGCAATT B Cell
## AAACCTGAGGCTCTTA NK Cell
## ... ...
## TTTGTCATCATGGTCA NK Cell
## TTTGTCATCTCGTTTA NK Cell
##
## bulkExperiments: ExperimentList with 1 bulk experiment(s).
## [1] pbRNAseq: matrix with 3000 row(s) and 1 column(s).
##
## singleCellExperiments: ExperimentList with 2 single-cell experiment(s).
## [1] scADTseq: matrix with 8 row(s) and 5000 column(s).
## [2] scRNAseq: matrix with 3000 row(s) and 5000 column(s).
##
## Need help? Try browseVignettes("MultimodalExperiment").
## Publishing? Cite with citation("MultimodalExperiment").
Notice that cellData
contains only 5,000 rows, while there are two singleCellExperiments
of 5,000 rows each; the annotations are stored just once because they apply to the same single cells.
To help users understand how to manipulate a MultimodalExperiment object, a brief example of how to filter out everything except monocytes in singleCellExperiments is shown here. First, a logical vector is created from the "cellType"
column of cellData
.
isMonocyte <-
cellData(ME)[["cellType"]] %in% "Monocyte"
Then, cellData
is assigned as cellData
subset to include only the rows which are annotated as monocytes.
cellData(ME) <-
cellData(ME)[isMonocyte, , drop = FALSE]
When the harmonize
method is called, the intersection of experiment, subject, sample, and cell indices from all relevant tables is taken, and extraneous indices are deleted. Notice the scADTseq
and scRNAseq
experiments only contain 685 columns each now.
harmonize(ME)
## MultimodalExperiment with 1 bulk and 2 single-cell experiment(s).
##
## experimentData: DataFrame with 3 row(s) and 1 column(s).
## published
## <Date>
## pbRNAseq NA
## scADTseq 2018-11-19
## scRNAseq 2018-11-19
##
## subjectData: DataFrame with 1 row(s) and 1 column(s).
## condition
## <character>
## SUBJECT-1 healthy
##
## sampleData: DataFrame with 1 row(s) and 1 column(s).
## sampleType
## <character>
## SAMPLE-1 peripheral blood mononuclear cells
##
## cellData: DataFrame with 685 row(s) and 1 column(s).
## cellType
## <character>
## AAACCTGTCTGTTGAG Monocyte
## AAACGGGCACCTCGTT Monocyte
## ... ...
## TTTGGTTTCGCCTGTT Monocyte
## TTTGTCAAGAAGGTTT Monocyte
##
## bulkExperiments: ExperimentList with 1 bulk experiment(s).
## [1] pbRNAseq: matrix with 3000 row(s) and 1 column(s).
##
## singleCellExperiments: ExperimentList with 2 single-cell experiment(s).
## [1] scADTseq: matrix with 8 row(s) and 685 column(s).
## [2] scRNAseq: matrix with 3000 row(s) and 685 column(s).
##
## Need help? Try browseVignettes("MultimodalExperiment").
## Publishing? Cite with citation("MultimodalExperiment").
Finally, while learning to use the MultimodalExperiment package, print out the Cheat Sheet and consult the documentation for specific methods for further usage examples.
sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] MultimodalExperiment_1.2.0 IRanges_2.36.0
## [3] S4Vectors_0.40.0 BiocGenerics_0.48.0
## [5] BiocStyle_2.30.0
##
## loaded via a namespace (and not attached):
## [1] Matrix_1.6-1.1 jsonlite_1.8.7
## [3] compiler_4.3.1 BiocManager_1.30.22
## [5] crayon_1.5.2 SummarizedExperiment_1.32.0
## [7] Biobase_2.62.0 GenomicRanges_1.54.0
## [9] bitops_1.0-7 jquerylib_0.1.4
## [11] yaml_2.3.7 fastmap_1.1.1
## [13] lattice_0.22-5 R6_2.5.1
## [15] XVector_0.42.0 S4Arrays_1.2.0
## [17] MultiAssayExperiment_1.28.0 GenomeInfoDb_1.38.0
## [19] knitr_1.44 DelayedArray_0.28.0
## [21] bookdown_0.36 MatrixGenerics_1.14.0
## [23] GenomeInfoDbData_1.2.11 bslib_0.5.1
## [25] rlang_1.1.1 cachem_1.0.8
## [27] xfun_0.40 sass_0.4.7
## [29] SparseArray_1.2.0 cli_3.6.1
## [31] zlibbioc_1.48.0 digest_0.6.33
## [33] grid_4.3.1 evaluate_0.22
## [35] abind_1.4-5 RCurl_1.98-1.12
## [37] rmarkdown_2.25 matrixStats_1.0.0
## [39] tools_4.3.1 htmltools_0.5.6.1