TileDBArray 1.17.0
TileDB implements a framework for local and remote storage of dense and sparse arrays. We can use this as a DelayedArray backend to provide an array-level abstraction, thus allowing the data to be used in many places where an ordinary array or matrix might be used. The TileDBArray package implements the necessary wrappers around TileDB-R to support read/write operations on TileDB arrays within the DelayedArray framework.
Creating a TileDBArray is as easy as:
X <- matrix(rnorm(1000), ncol=10)
library(TileDBArray)
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -1.0682068 -0.2050417 -0.3273686 . 0.08695838 -0.61385996
## [2,] -1.1785605 -0.8905791 -1.8976670 . -0.02933603 -1.77599386
## [3,] 0.3642517 1.0368971 0.3178826 . 2.11538850 1.13462500
## [4,] 0.6967595 -0.7301417 -0.5026386 . -0.59332053 -0.25478507
## [5,] 1.0570916 1.1728837 -0.8080336 . 0.83328562 0.99638078
## ... . . . . . .
## [96,] -0.86264501 -0.53907861 0.16055824 . 1.6351225 -1.1457870
## [97,] 0.34014961 -0.94460983 -0.11413229 . 0.4030109 0.5595684
## [98,] -1.13627000 0.26973317 -0.91426782 . 0.2109381 -0.7118867
## [99,] -0.08043576 -0.06710426 -1.49705551 . 1.3801346 0.3832182
## [100,] 0.38954806 -0.26955608 -0.10370119 . 1.2874666 1.8728848
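As a quick sanity check, here is a minimal sketch that writes a matrix and reads the values back into memory with `as.matrix()`. It assumes default backend settings, so each call writes to a fresh temporary path. The object names `out` and `back` are illustrative only.

```r
library(TileDBArray)

set.seed(1)
X <- matrix(rnorm(1000), ncol=10)

# Write X to a new TileDB array (a temporary path is used by default).
out <- writeTileDBArray(X)

# Realize the on-disk values as an ordinary in-memory matrix.
back <- as.matrix(out)

# Dimensions and values should match the original matrix.
print(dim(back))
print(all.equal(back, X))
```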
Alternatively, we can use coercion methods:
as(X, "TileDBArray")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -1.0682068 -0.2050417 -0.3273686 . 0.08695838 -0.61385996
## [2,] -1.1785605 -0.8905791 -1.8976670 . -0.02933603 -1.77599386
## [3,] 0.3642517 1.0368971 0.3178826 . 2.11538850 1.13462500
## [4,] 0.6967595 -0.7301417 -0.5026386 . -0.59332053 -0.25478507
## [5,] 1.0570916 1.1728837 -0.8080336 . 0.83328562 0.99638078
## ... . . . . . .
## [96,] -0.86264501 -0.53907861 0.16055824 . 1.6351225 -1.1457870
## [97,] 0.34014961 -0.94460983 -0.11413229 . 0.4030109 0.5595684
## [98,] -1.13627000 0.26973317 -0.91426782 . 0.2109381 -0.7118867
## [99,] -0.08043576 -0.06710426 -1.49705551 . 1.3801346 0.3832182
## [100,] 0.38954806 -0.26955608 -0.10370119 . 1.2874666 1.8728848
This process also works for sparse matrices:
Y <- Matrix::rsparsematrix(1000, 1000, density=0.01)
writeTileDBArray(Y)
## <1000 x 1000> sparse TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] 0.00 0.00 0.00 . 0 0
## [2,] 0.00 0.00 0.00 . 0 0
## [3,] 0.00 0.00 0.00 . 0 0
## [4,] 0.00 0.00 0.00 . 0 0
## [5,] 0.00 0.00 0.39 . 0 0
## ... . . . . . .
## [996,] 0 0 0 . 0 0
## [997,] 0 0 0 . 0 0
## [998,] 0 0 0 . 0 0
## [999,] 0 0 0 . 0 0
## [1000,] 0 0 0 . 0 0
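A small sketch to confirm that sparsity survives the round trip. It uses `is_sparse()`, which is assumed to be available from the attached S4Arrays/DelayedArray packages. The matrix size and density are arbitrary choices for illustration.

```r
library(TileDBArray)

set.seed(2)
Y <- Matrix::rsparsematrix(100, 50, density=0.05)

# A dgCMatrix input is written as a sparse TileDB array.
out <- writeTileDBArray(Y)

# The returned object should report itself as sparse.
print(is_sparse(out))

# Values read back from disk should match the original sparse matrix.
print(all.equal(as.matrix(out), as.matrix(Y)))
```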
Logical and integer matrices are supported:
writeTileDBArray(Y > 0)
## <1000 x 1000> sparse TileDBMatrix object of type "logical":
## [,1] [,2] [,3] ... [,999] [,1000]
## [1,] FALSE FALSE FALSE . FALSE FALSE
## [2,] FALSE FALSE FALSE . FALSE FALSE
## [3,] FALSE FALSE FALSE . FALSE FALSE
## [4,] FALSE FALSE FALSE . FALSE FALSE
## [5,] FALSE FALSE TRUE . FALSE FALSE
## ... . . . . . .
## [996,] FALSE FALSE FALSE . FALSE FALSE
## [997,] FALSE FALSE FALSE . FALSE FALSE
## [998,] FALSE FALSE FALSE . FALSE FALSE
## [999,] FALSE FALSE FALSE . FALSE FALSE
## [1000,] FALSE FALSE FALSE . FALSE FALSE
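The output above covers only the logical case. The sketch below also writes an integer matrix and checks the reported type with `type()`, a generic assumed here to be available from the attached Bioconductor packages. The small matrices `I` and `L` are illustrative.

```r
library(TileDBArray)

# A small integer matrix and a logical matrix derived from it.
I <- matrix(1:20, nrow=4)
L <- I > 10

int.out <- writeTileDBArray(I)
lgl.out <- writeTileDBArray(L)

# type() reports the data type of the values held by each object.
print(type(int.out))
print(type(lgl.out))
```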
As are matrices with dimension names:
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))
writeTileDBArray(X)
## <100 x 10> TileDBMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -1.0682068 -0.2050417 -0.3273686 . 0.08695838 -0.61385996
## GENE_2 -1.1785605 -0.8905791 -1.8976670 . -0.02933603 -1.77599386
## GENE_3 0.3642517 1.0368971 0.3178826 . 2.11538850 1.13462500
## GENE_4 0.6967595 -0.7301417 -0.5026386 . -0.59332053 -0.25478507
## GENE_5 1.0570916 1.1728837 -0.8080336 . 0.83328562 0.99638078
## ... . . . . . .
## GENE_96 -0.86264501 -0.53907861 0.16055824 . 1.6351225 -1.1457870
## GENE_97 0.34014961 -0.94460983 -0.11413229 . 0.4030109 0.5595684
## GENE_98 -1.13627000 0.26973317 -0.91426782 . 0.2109381 -0.7118867
## GENE_99 -0.08043576 -0.06710426 -1.49705551 . 1.3801346 0.3832182
## GENE_100 0.38954806 -0.26955608 -0.10370119 . 1.2874666 1.8728848
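A minimal sketch checking that the dimension names carry over to the returned object and can be used for subsetting, as with an ordinary matrix. The particular genes and samples selected are arbitrary.

```r
library(TileDBArray)

set.seed(3)
X <- matrix(rnorm(1000), ncol=10)
rownames(X) <- sprintf("GENE_%i", seq_len(nrow(X)))
colnames(X) <- sprintf("SAMP_%i", seq_len(ncol(X)))

out <- writeTileDBArray(X)

# The dimnames of the TileDB-backed matrix should match those of X.
print(identical(dimnames(out), dimnames(X)))

# Subset by name, then realize the small result as an ordinary matrix.
sub <- as.matrix(out[c("GENE_3", "GENE_4"), c("SAMP_1", "SAMP_2")])
print(sub)
```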
TileDBArrays are simply DelayedArray objects and can be manipulated as such.
The usual conventions for extracting data from matrix-like objects work as expected:
out <- as(X, "TileDBArray")
dim(out)
## [1] 100 10
head(rownames(out))
## [1] "GENE_1" "GENE_2" "GENE_3" "GENE_4" "GENE_5" "GENE_6"
head(out[,1])
## GENE_1 GENE_2 GENE_3 GENE_4 GENE_5 GENE_6
## -1.0682068 -1.1785605 0.3642517 0.6967595 1.0570916 0.3595143
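To make the relationship to DelayedArray explicit, here is a short sketch. It checks the class hierarchy with `is()` and confirms that extracting a single column drops to an ordinary numeric vector, matching the `out[,1]` output above.

```r
library(TileDBArray)

set.seed(4)
X <- matrix(rnorm(1000), ncol=10)
out <- as(X, "TileDBArray")

# A TileDBMatrix is also a DelayedMatrix, so DelayedArray methods apply.
print(is(out, "DelayedMatrix"))

# Extracting one column returns an ordinary in-memory numeric vector.
col1 <- out[, 1]
print(class(col1))
print(all.equal(col1, X[, 1]))
```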
We can also perform manipulations like subsetting and arithmetic. Note that these operations do not affect the data in the TileDB backend; rather, they are delayed until the values are explicitly required, hence the creation of the DelayedMatrix object.
out[1:5,1:5]
## <5 x 5> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5
## GENE_1 -1.06820675 -0.20504170 -0.32736861 0.56684845 0.59431976
## GENE_2 -1.17856052 -0.89057912 -1.89766700 0.50098458 -0.12156972
## GENE_3 0.36425169 1.03689713 0.31788261 -1.53199192 0.28851562
## GENE_4 0.69675955 -0.73014167 -0.50263862 0.97572443 -1.64701171
## GENE_5 1.05709162 1.17288366 -0.80803361 -0.02226829 0.23514654
out * 2
## <100 x 10> DelayedMatrix object of type "double":
## SAMP_1 SAMP_2 SAMP_3 ... SAMP_9 SAMP_10
## GENE_1 -2.1364135 -0.4100834 -0.6547372 . 0.17391677 -1.22771992
## GENE_2 -2.3571210 -1.7811582 -3.7953340 . -0.05867206 -3.55198772
## GENE_3 0.7285034 2.0737943 0.6357652 . 4.23077701 2.26925000
## GENE_4 1.3935191 -1.4602833 -1.0052772 . -1.18664105 -0.50957014
## GENE_5 2.1141832 2.3457673 -1.6160672 . 1.66657124 1.99276156
## ... . . . . . .
## GENE_96 -1.7252900 -1.0781572 0.3211165 . 3.2702450 -2.2915741
## GENE_97 0.6802992 -1.8892197 -0.2282646 . 0.8060218 1.1191367
## GENE_98 -2.2725400 0.5394663 -1.8285356 . 0.4218762 -1.4237733
## GENE_99 -0.1608715 -0.1342085 -2.9941110 . 2.7602692 0.7664364
## GENE_100 0.7790961 -0.5391122 -0.2074024 . 2.5749332 3.7457695
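The sketch below illustrates the delayed behaviour. Subsetting and arithmetic return DelayedMatrix objects, and values are only computed when requested, for example by `as.matrix()`. It also uses DelayedArray's `showtree()` to print the stack of pending operations. The names `sub` and `dbl` are illustrative.

```r
library(TileDBArray)

set.seed(5)
X <- matrix(rnorm(1000), ncol=10)
out <- as(X, "TileDBArray")

# Both results are DelayedMatrix objects; the TileDB array is untouched.
sub <- out[1:5, 1:5]
dbl <- out * 2
print(class(sub))
print(class(dbl))

# Realizing the delayed objects computes the values on demand.
print(all.equal(as.matrix(sub), X[1:5, 1:5]))
print(all.equal(as.matrix(dbl), X * 2))

# Show the delayed operations recorded on top of the TileDB seed.
showtree(dbl)
```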
We can also do more complex matrix operations that are supported by DelayedArray:
colSums(out)
## SAMP_1 SAMP_2 SAMP_3 SAMP_4 SAMP_5 SAMP_6
## 0.7775448 8.6349434 -5.6180974 -22.6138844 8.2715612 1.0200059
## SAMP_7 SAMP_8 SAMP_9 SAMP_10
## 10.1297930 10.5533277 34.4220984 -18.3355736
out %*% runif(ncol(out))
## [,1]
## GENE_1 0.251967691
## GENE_2 -2.588386272
## GENE_3 2.423907449
## GENE_4 -1.457590286
## GENE_5 0.202645673
## GENE_6 -0.522007357
## GENE_7 1.937268778
## GENE_8 0.195930337
## GENE_9 0.262320800
## GENE_10 -0.382436377
## GENE_11 0.109115680
## GENE_12 0.158129562
## GENE_13 -0.397197750
## GENE_14 2.127996036
## GENE_15 -1.264692852
## GENE_16 0.402560118
## GENE_17 2.583893992
## GENE_18 -1.212611928
## GENE_19 -2.474646104
## GENE_20 0.959716120
## GENE_21 -2.698488352
## GENE_22 -2.746825013
## GENE_23 -0.623915788
## GENE_24 2.757589856
## GENE_25 -0.398496290
## GENE_26 -1.603418040
## GENE_27 -2.511132958
## GENE_28 -0.411627149
## GENE_29 -1.412267164
## GENE_30 -2.242272890
## GENE_31 -3.346856773
## GENE_32 -1.362375026
## GENE_33 -2.808922473
## GENE_34 -2.097933324
## GENE_35 -0.454333576
## GENE_36 -0.828383876
## GENE_37 -2.299265148
## GENE_38 -0.516644455
## GENE_39 1.702969400
## GENE_40 1.207660995
## GENE_41 1.043554182
## GENE_42 -0.117867346
## GENE_43 -1.472926425
## GENE_44 -1.157480936
## GENE_45 -1.294053238
## GENE_46 0.609291747
## GENE_47 1.554016095
## GENE_48 -1.008805942
## GENE_49 1.154909397
## GENE_50 2.673041776
## GENE_51 -0.989728370
## GENE_52 0.205286449
## GENE_53 -0.927516410
## GENE_54 1.641602948
## GENE_55 1.286401012
## GENE_56 0.300920374
## GENE_57 -0.084314935
## GENE_58 -0.613415446
## GENE_59 -1.813686603
## GENE_60 0.050812633
## GENE_61 1.020089920
## GENE_62 3.239533072
## GENE_63 -0.718627091
## GENE_64 1.760121994
## GENE_65 0.003502501
## GENE_66 1.818345511
## GENE_67 3.787951435
## GENE_68 0.374266710
## GENE_69 -0.174519452
## GENE_70 0.208593705
## GENE_71 1.945304405
## GENE_72 2.417959157
## GENE_73 1.649496905
## GENE_74 -1.642990868
## GENE_75 -1.146935189
## GENE_76 1.734445220
## GENE_77 0.947843165
## GENE_78 -0.837474713
## GENE_79 -1.177188671
## GENE_80 1.541475508
## GENE_81 1.966165689
## GENE_82 0.710586886
## GENE_83 2.691609465
## GENE_84 -0.582373280
## GENE_85 0.025560627
## GENE_86 3.376929275
## GENE_87 -2.177582785
## GENE_88 0.891221799
## GENE_89 1.838329989
## GENE_90 -0.549336495
## GENE_91 0.087743015
## GENE_92 4.520275887
## GENE_93 -0.632948377
## GENE_94 2.385776470
## GENE_95 -1.212456017
## GENE_96 0.026447697
## GENE_97 0.040318256
## GENE_98 -0.156173495
## GENE_99 -0.943357106
## GENE_100 1.203714533
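Because `runif()` above is not seeded, the printed product will differ between runs. A sketch that fixes the right-hand vector `v` (an illustrative name) and compares both operations against the equivalent in-memory computation on `X`:

```r
library(TileDBArray)

set.seed(6)
X <- matrix(rnorm(1000), ncol=10)
out <- as(X, "TileDBArray")

# Column sums computed through DelayedArray should match base R.
print(all.equal(colSums(out), colSums(X)))

# Use the same vector for both products so the results are comparable.
v <- runif(ncol(out))
prod.tiledb <- out %*% v

# Compare as plain vectors to ignore any difference in dimnames handling.
print(all.equal(as.vector(prod.tiledb), as.vector(X %*% v)))
```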
We can adjust some parameters for creating the backend with appropriate arguments to writeTileDBArray(). For example, the code below controls the path to the backend as well as the name of the attribute containing the data.
X <- matrix(rnorm(1000), ncol=10)
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.50261478 1.04585892 -0.02699426 . 0.4227805 1.2036202
## [2,] -0.25660636 -0.68525173 -0.61128672 . -1.0798965 -0.2939703
## [3,] -1.37354491 -0.19129654 -0.83833554 . -1.8509017 -1.1704185
## [4,] -0.59715228 -0.77829348 -0.10581958 . -0.2750649 -0.8239823
## [5,] 0.02391330 -2.25325712 -0.87831212 . -1.5105871 -1.5876911
## ... . . . . . .
## [96,] 1.18697250 -2.44784615 1.61245372 . 1.67216566 0.78245132
## [97,] 0.09871422 1.83361536 0.03619117 . 0.84992522 -1.34589628
## [98,] -0.44868822 -1.93665962 -0.26129004 . 0.04973422 1.67507655
## [99,] -0.13637132 0.83750561 -0.15310829 . 0.56205604 -1.75502962
## [100,] -0.20595754 -0.09052913 -1.19745130 . -0.49681252 -0.78777581
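Writing to a known path means the array can be reopened later. This sketch assumes that the `TileDBArray()` constructor accepts a path and forwards an `attr=` argument to `TileDBArraySeed()`. Check the package's reference manual to confirm this before relying on it. The object name `reopened` is illustrative.

```r
library(TileDBArray)

set.seed(7)
X <- matrix(rnorm(1000), ncol=10)

# Write to an explicit location, storing values in the "WHEE" attribute.
path <- tempfile()
writeTileDBArray(X, path=path, attr="WHEE")

# Reopen the existing array from its path (assumed constructor usage);
# attr= names the attribute that holds the data.
reopened <- TileDBArray(path, attr="WHEE")
print(dim(reopened))
print(all.equal(as.matrix(reopened), X))
```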
As these arguments cannot be passed during coercion, we instead provide global variables that can be set or unset to affect the outcome.
path2 <- tempfile()
setTileDBPath(path2)
as(X, "TileDBArray") # uses path2 to store the backend.
## <100 x 10> TileDBMatrix object of type "double":
## [,1] [,2] [,3] ... [,9] [,10]
## [1,] -0.50261478 1.04585892 -0.02699426 . 0.4227805 1.2036202
## [2,] -0.25660636 -0.68525173 -0.61128672 . -1.0798965 -0.2939703
## [3,] -1.37354491 -0.19129654 -0.83833554 . -1.8509017 -1.1704185
## [4,] -0.59715228 -0.77829348 -0.10581958 . -0.2750649 -0.8239823
## [5,] 0.02391330 -2.25325712 -0.87831212 . -1.5105871 -1.5876911
## ... . . . . . .
## [96,] 1.18697250 -2.44784615 1.61245372 . 1.67216566 0.78245132
## [97,] 0.09871422 1.83361536 0.03619117 . 0.84992522 -1.34589628
## [98,] -0.44868822 -1.93665962 -0.26129004 . 0.04973422 1.67507655
## [99,] -0.13637132 0.83750561 -0.15310829 . 0.56205604 -1.75502962
## [100,] -0.20595754 -0.09052913 -1.19745130 . -0.49681252 -0.78777581
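The global settings persist for the rest of the session. It is therefore worth unsetting them once they are no longer needed, because otherwise every later coercion targets the same path, which already holds an array. This sketch assumes the getter/setter pairs `getTileDBPath()`/`setTileDBPath()` and `getTileDBAttr()`/`setTileDBAttr()`, and assumes that passing `NULL` restores the default behaviour. `path3` is an illustrative name.

```r
library(TileDBArray)

set.seed(8)
X <- matrix(rnorm(1000), ncol=10)

# Set a global path and attribute name for subsequent coercions.
path3 <- tempfile()
setTileDBPath(path3)
setTileDBAttr("WHEE")
print(identical(getTileDBPath(), path3))
print(getTileDBAttr())

# This coercion writes to path3, using the "WHEE" attribute.
out <- as(X, "TileDBArray")

# Unset both globals so later coercions go back to the defaults.
setTileDBPath(NULL)
setTileDBAttr(NULL)
```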
sessionInfo()
## R Under development (unstable) (2024-10-26 r87273 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows Server 2022 x64 (build 20348)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=C
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RcppSpdlog_0.0.18 TileDBArray_1.17.0 DelayedArray_0.33.1
## [4] SparseArray_1.7.0 S4Arrays_1.7.1 IRanges_2.41.0
## [7] abind_1.4-8 S4Vectors_0.45.0 MatrixGenerics_1.19.0
## [10] matrixStats_1.4.1 BiocGenerics_0.53.1 generics_0.1.3
## [13] Matrix_1.7-1 BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] bit_4.5.0 jsonlite_1.8.9 compiler_4.5.0
## [4] BiocManager_1.30.25 crayon_1.5.3 Rcpp_1.0.13-1
## [7] nanoarrow_0.6.0 jquerylib_0.1.4 yaml_2.3.10
## [10] fastmap_1.2.0 lattice_0.22-6 R6_2.5.1
## [13] RcppCCTZ_0.2.12 XVector_0.47.0 tiledb_0.30.2
## [16] knitr_1.48 bookdown_0.41 bslib_0.8.0
## [19] rlang_1.1.4 cachem_1.1.0 xfun_0.49
## [22] sass_0.4.9 bit64_4.5.2 cli_3.6.3
## [25] zlibbioc_1.53.0 spdl_0.0.5 digest_0.6.37
## [28] grid_4.5.0 lifecycle_1.0.4 data.table_1.16.2
## [31] evaluate_1.0.1 nanotime_0.3.10 zoo_1.8-12
## [34] rmarkdown_2.28 tools_4.5.0 htmltools_0.5.8.1