TileDBArray 1.16.0
TileDB implements a framework for local and remote storage of dense and sparse arrays.
We can use this as a DelayedArray backend to provide an array-level abstraction,
thus allowing the data to be used in many places where an ordinary array or matrix might be used.
The TileDBArray package implements the necessary wrappers around TileDB-R
to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.23549229 0.13814272 -0.49716659 . -0.74158876 1.19135123
## [2,] -0.03768341 -0.08833808 0.14232411 . -0.87448969 1.41504993
## [3,] 0.51231570 1.01275502 -0.65603746 . 1.14956419 -0.34109990
## [4,] 0.83904143 -1.25204679 0.23813822 . -0.15001145 -0.07863399
## [5,] 0.23991931 1.22496122 -0.19645694 . 0.66247521 0.80681765
## ... . . . . . .
## [96,] 0.090592743 0.872869656 -1.164201441 . 1.55317495 0.80527167
## [97,] -0.217269487 -0.725859085 -0.846436104 . -0.39540754 0.05456771
## [98,] -0.052490022 -0.008215714 -1.125708094 . -1.15782618 -0.02684772
## [99,] 0.692872398 -0.349783244 0.853744417 . -0.01093681 1.39418274
## [100,] 0.612089389 -0.408543912 0.776001402 . -0.74250973 0.41653190
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.23549229 0.13814272 -0.49716659 . -0.74158876 1.19135123
## [2,] -0.03768341 -0.08833808 0.14232411 . -0.87448969 1.41504993
## [3,] 0.51231570 1.01275502 -0.65603746 . 1.14956419 -0.34109990
## [4,] 0.83904143 -1.25204679 0.23813822 . -0.15001145 -0.07863399
## [5,] 0.23991931 1.22496122 -0.19645694 . 0.66247521 0.80681765
## ... . . . . . .
## [96,] 0.090592743 0.872869656 -1.164201441 . 1.55317495 0.80527167
## [97,] -0.217269487 -0.725859085 -0.846436104 . -0.39540754 0.05456771
## [98,] -0.052490022 -0.008215714 -1.125708094 . -1.15782618 -0.02684772
## [99,] 0.692872398 -0.349783244 0.853744417 . -0.01093681 1.39418274
## [100,] 0.612089389 -0.408543912 0.776001402 . -0.74250973 0.41653190
This process also works for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0 0 0 . 0 0
## [2,] 0 0 0 . 0 0
## [3,] 0 0 0 . 0 0
## [4,] 0 0 0 . 0 0
## [5,] 0 0 0 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
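Because the printed corners of a sparse matrix are almost always zero, the display above does not confirm that the non-zero values were stored. The following is a minimal sketch of one way to check, assuming that is_sparse() from the DelayedArray framework is available once TileDBArray is attached; the variable name sp is introduced here for illustration only.

```r
library(TileDBArray)

# Simulate a sparse matrix in which roughly 1% of the entries are non-zero.
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
sp <- writeTileDBArray(Y)

# Ask the object whether it reports itself as sparse.
is_sparse(sp)

# Realize the values in memory and compare them with the original matrix,
# ignoring attributes such as dimnames.
all.equal(as.matrix(sp), as.matrix(Y), check.attributes=FALSE)
```

Realizing a 1000 x 1000 matrix as dense is cheap here, but for larger arrays it would defeat the purpose of a sparse backend, so a check like this is best kept to small test cases.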
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE FALSE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -0.23549229 0.13814272 -0.49716659 . -0.74158876 1.19135123
## GENE_2 -0.03768341 -0.08833808 0.14232411 . -0.87448969 1.41504993
## GENE_3 0.51231570 1.01275502 -0.65603746 . 1.14956419 -0.34109990
## GENE_4 0.83904143 -1.25204679 0.23813822 . -0.15001145 -0.07863399
## GENE_5 0.23991931 1.22496122 -0.19645694 . 0.66247521 0.80681765
## ... . . . . . .
## GENE_96 0.090592743 0.872869656 -1.164201441 . 1.55317495 0.80527167
## GENE_97 -0.217269487 -0.725859085 -0.846436104 . -0.39540754 0.05456771
## GENE_98 -0.052490022 -0.008215714 -1.125708094 . -1.15782618 -0.02684772
## GENE_99 0.692872398 -0.349783244 0.853744417 . -0.01093681 1.39418274
## GENE_100 0.612089389 -0.408543912 0.776001402 . -0.74250973 0.41653190
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## -0.23549229 -0.03768341 0.51231570 0.83904143 0.23991931 0.51809242
We can also perform manipulations like subsetting and arithmetic.
Note that these operations do not affect the data in the TileDB backend;
rather, they are delayed until the values are explicitly required,
hence the creation of the DelayedMatrix object.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 -0.23549229 0.13814272 -0.49716659 -1.94720598 -0.82757351
## GENE_2 -0.03768341 -0.08833808 0.14232411 0.05230119 -1.48468021
## GENE_3 0.51231570 1.01275502 -0.65603746 -1.26414545 0.82632769
## GENE_4 0.83904143 -1.25204679 0.23813822 -0.87140153 0.94762613
## GENE_5 0.23991931 1.22496122 -0.19645694 -0.68717533 1.15242893
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -0.47098458 0.27628544 -0.99433318 . -1.4831775 2.3827025
## GENE_2 -0.07536682 -0.17667616 0.28464823 . -1.7489794 2.8300999
## GENE_3 1.02463141 2.02551004 -1.31207493 . 2.2991284 -0.6821998
## GENE_4 1.67808286 -2.50409358 0.47627643 . -0.3000229 -0.1572680
## GENE_5 0.47983863 2.44992245 -0.39291389 . 1.3249504 1.6136353
## ... . . . . . .
## GENE_96 0.18118549 1.74573931 -2.32840288 . 3.10634990 1.61054333
## GENE_97 -0.43453897 -1.45171817 -1.69287221 . -0.79081509 0.10913542
## GENE_98 -0.10498004 -0.01643143 -2.25141619 . -2.31565236 -0.05369544
## GENE_99 1.38574480 -0.69956649 1.70748883 . -0.02187361 2.78836548
## GENE_100 1.22417878 -0.81708782 1.55200280 . -1.48501946 0.83306381
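To make the delayed behaviour concrete, the sketch below stacks two operations, inspects what is pending, and then forces the computation. It assumes that showtree() and as.matrix() from DelayedArray behave for TileDB-backed objects as they do for other DelayedArray objects; the names delayed and realized are illustrative.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
out <- as(X, "TileDBArray")

# Stack two delayed operations; no values are computed at this point.
delayed <- (out * 2)[1:5, 1:5]
is(delayed, "DelayedMatrix")

# Show the pending operations and the seed they will be applied to.
showtree(delayed)

# as.matrix() triggers the computation and returns an ordinary matrix.
realized <- as.matrix(delayed)
all.equal(realized, (X * 2)[1:5, 1:5])

# The values stored in the backend are unchanged by the operations above.
all.equal(as.matrix(out), X)
```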
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6
## 20.0146000 -26.6300969 -12.8370064 -6.5005336 -5.8969472 5.3044866
## SAMP_7 SAMP_8 SAMP_9 SAMP_10
## 0.8032806 -3.7405497 -4.4930097 6.3714438
out %*% runif(ncol(out))
## [,1]
## GENE_1 -0.521556965
## GENE_2 -0.179037942
## GENE_3 0.295215278
## GENE_4 0.762741195
## GENE_5 2.700866817
## GENE_6 -2.847499164
## GENE_7 -1.929410913
## GENE_8 -1.141941634
## GENE_9 -2.692971574
## GENE_10 2.303883063
## GENE_11 -1.430535420
## GENE_12 1.886745210
## GENE_13 3.725935951
## GENE_14 1.516184244
## GENE_15 1.553177988
## GENE_16 0.988281871
## GENE_17 0.399789556
## GENE_18 1.188774885
## GENE_19 0.141100668
## GENE_20 3.817848577
## GENE_21 2.549820771
## GENE_22 -1.196245535
## GENE_23 1.733301771
## GENE_24 -1.842210738
## GENE_25 -2.561756034
## GENE_26 -0.396419103
## GENE_27 -2.417189316
## GENE_28 1.745547312
## GENE_29 0.024561972
## GENE_30 -3.402939507
## GENE_31 0.944974249
## GENE_32 -2.332498157
## GENE_33 2.720536460
## GENE_34 -1.930473175
## GENE_35 0.911572740
## GENE_36 -1.877641665
## GENE_37 -2.982466461
## GENE_38 0.009640123
## GENE_39 -2.137668670
## GENE_40 3.132021078
## GENE_41 0.328482529
## GENE_42 -1.755558595
## GENE_43 -0.890719211
## GENE_44 1.260622618
## GENE_45 1.566086401
## GENE_46 0.777009995
## GENE_47 -0.705470625
## GENE_48 -0.837855969
## GENE_49 2.825385715
## GENE_50 -1.307829767
## GENE_51 2.585766407
## GENE_52 1.041315568
## GENE_53 0.774519244
## GENE_54 -1.087416796
## GENE_55 -0.750985564
## GENE_56 -2.156282282
## GENE_57 -1.816397547
## GENE_58 -1.184696885
## GENE_59 -0.850450769
## GENE_60 2.249541995
## GENE_61 -3.128426062
## GENE_62 -1.371671961
## GENE_63 -2.154354183
## GENE_64 1.261136625
## GENE_65 -0.544858182
## GENE_66 -1.334815199
## GENE_67 -3.676119590
## GENE_68 -1.770366150
## GENE_69 -1.325352346
## GENE_70 -0.378747698
## GENE_71 -1.865018970
## GENE_72 -2.253107582
## GENE_73 -1.117988432
## GENE_74 0.880821494
## GENE_75 0.025145792
## GENE_76 3.595821294
## GENE_77 1.351632695
## GENE_78 -2.530576726
## GENE_79 2.315388244
## GENE_80 2.471494776
## GENE_81 3.258952575
## GENE_82 -0.848752678
## GENE_83 -0.418899464
## GENE_84 0.314018419
## GENE_85 -1.922718832
## GENE_86 -1.513661972
## GENE_87 -1.687958178
## GENE_88 -0.158616480
## GENE_89 -1.505402570
## GENE_90 3.584858622
## GENE_91 0.819929680
## GENE_92 -3.204191995
## GENE_93 1.353437096
## GENE_94 1.723447812
## GENE_95 -2.074665893
## GENE_96 2.163600836
## GENE_97 -0.901495716
## GENE_98 -1.880863689
## GENE_99 2.536379176
## GENE_100 0.272586690
We can adjust some parameters for creating the backend by passing the appropriate arguments to writeTileDBArray().
For example, the code below controls the path to the backend
as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 2.2645128 1.1040241 0.3603278 . 0.4658369 -0.7037810
## [2,] 0.4677117 0.6828181 1.1748596 . -1.3944726 0.5885398
## [3,] -1.4785790 -0.4667107 -0.1719616 . 2.1062553 -1.6481896
## [4,] 2.0824958 -0.4074978 0.1859697 . -1.0200215 0.5700174
## [5,] 0.1108453 0.1331158 -0.4353114 . -1.3071273 -1.8001625
## ... . . . . . .
## [96,] -0.3217038 -2.5298485 -0.6636672 . -0.31334678 0.41758905
## [97,] -0.7646297 0.3130591 2.4097011 . 1.42734624 -0.68110003
## [98,] 0.8529417 0.4144447 -0.5010519 . -0.83940890 -0.45561404
## [99,] 0.6769514 0.6852695 0.2898949 . 1.25277271 -0.05056707
## [100,] -0.9862603 0.6311622 0.5812869 . -0.07866183 -1.75894472
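One reason to choose the path explicitly is that the array can then be reopened later without rewriting the data. The sketch below assumes that the TileDBArray() constructor accepts a path and, when no attribute is named, falls back to the single attribute present in the array; the name reopened is illustrative.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
written <- writeTileDBArray(X, path=path, attr="WHEE")

# Reopen the array from its path alone. No attribute is specified, so this
# relies on the (assumed) default of using the only attribute in the array.
reopened <- TileDBArray(path)
dim(reopened)

# The reopened array should hold the same values that were written.
all.equal(as.matrix(reopened), X)
```

A path created with tempfile() disappears at the end of the R session, so a persistent location would be needed for any array that is meant to outlive the session.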
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] 2.2645128 1.1040241 0.3603278 . 0.4658369 -0.7037810
## [2,] 0.4677117 0.6828181 1.1748596 . -1.3944726 0.5885398
## [3,] -1.4785790 -0.4667107 -0.1719616 . 2.1062553 -1.6481896
## [4,] 2.0824958 -0.4074978 0.1859697 . -1.0200215 0.5700174
## [5,] 0.1108453 0.1331158 -0.4353114 . -1.3071273 -1.8001625
## ... . . . . . .
## [96,] -0.3217038 -2.5298485 -0.6636672 . -0.31334678 0.41758905
## [97,] -0.7646297 0.3130591 2.4097011 . 1.42734624 -0.68110003
## [98,] 0.8529417 0.4144447 -0.5010519 . -0.83940890 -0.45561404
## [99,] 0.6769514 0.6852695 0.2898949 . 1.25277271 -0.05056707
## [100,] -0.9862603 0.6311622 0.5812869 . -0.07866183 -1.75894472
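Since the path is a global setting, it stays in effect until it is changed. The sketch below assumes that getTileDBPath() reports the current setting and that setTileDBPath(NULL) unsets it; the names path3 and out3 are hypothetical.

```r
library(TileDBArray)

X <- matrix(rnorm(1000), ncol=10)
path3 <- tempfile()  # hypothetical location for the backend

# Set the global path, then coerce; the backend is written to path3.
setTileDBPath(path3)
getTileDBPath()
out3 <- as(X, "TileDBArray")
file.exists(path3)

# Unset the path (assumed to restore the default behaviour), so that
# later coercions do not try to write to the same location again.
setTileDBPath(NULL)
```

Unsetting the path after use is a cautious habit, because a later coercion that targets a location where an array already exists may fail.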
sessionInfo()
## R version 4.4.1 (2024-06-14 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows Server 2022 x64 (build 20348)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=C
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.18 TileDBArray_1.16.0 DelayedArray_0.32.0
## [4] SparseArray_1.6.0 S4Arrays_1.6.0 IRanges_2.40.0
## [7] abind_1.4-8 S4Vectors_0.44.0 MatrixGenerics_1.18.0
## [10] matrixStats_1.4.1 BiocGenerics_0.52.0 Matrix_1.7-1
## [13] BiocStyle_2.34.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.5.0 jsonlite_1.8.9 compiler_4.4.1
## [4] BiocManager_1.30.25 crayon_1.5.3 Rcpp_1.0.13
## [7] nanoarrow_0.6.0 jquerylib_0.1.4 yaml_2.3.10
## [10] fastmap_1.2.0 lattice_0.22-6 R6_2.5.1
## [13] RcppCCTZ_0.2.12 XVector_0.46.0 tiledb_0.30.2
## [16] knitr_1.48 bookdown_0.41 bslib_0.8.0
## [19] rlang_1.1.4 cachem_1.1.0 xfun_0.48
## [22] sass_0.4.9 bit64_4.5.2 cli_3.6.3
## [25] zlibbioc_1.52.0 spdl_0.0.5 digest_0.6.37
## [28] grid_4.4.1 lifecycle_1.0.4 data.table_1.16.2
## [31] evaluate_1.0.1 nanotime_0.3.10 zoo_1.8-12
## [34] rmarkdown_2.28 tools_4.4.1 htmltools_0.5.8.1