alabaster.matrix 1.2.0
The alabaster.matrix package implements methods to save matrix-like objects to file artifacts and load them back into R. Check out the alabaster.base package for more details on the motivation and the alabaster framework.
Given an array-like object, we can use stageObject() to save it inside a staging directory:
library(Matrix)
y <- rsparsematrix(1000, 100, density=0.05)
library(alabaster.matrix)
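tmp <- tempfile()
dir.create(tmp) # the staging directory
meta <- stageObject(y, tmp, "my_sparse_matrix") # save 'y' under tmp/my_sparse_matrix
library(alabaster.base)
.writeMetadata(meta, tmp) # write the accompanying JSON metadata file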
## $type
## [1] "local"
##
## $path
## [1] "my_sparse_matrix/matrix.h5"
list.files(tmp, recursive=TRUE)
## [1] "my_sparse_matrix/matrix.h5" "my_sparse_matrix/matrix.h5.json"
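The `.json` file next to `matrix.h5` holds the metadata written by `.writeMetadata()`. The sketch below reads it back with the jsonlite package, assuming jsonlite is installed (it already appears in the session information below). It restages a small matrix in a fresh directory so that it runs on its own. The exact fields depend on the alabaster schema version, so no output is shown here.

```r
library(Matrix)
library(alabaster.base)
library(alabaster.matrix)
library(jsonlite)

# Stage a small sparse matrix in a fresh directory
dir <- tempfile()
dir.create(dir)
meta <- stageObject(rsparsematrix(50, 10, density=0.1), dir, "small_sparse")
info <- .writeMetadata(meta, dir)

# The metadata file sits next to the data file, with ".json" appended to its name
json <- fromJSON(file.path(dir, paste0(info$path, ".json")), simplifyVector=FALSE)
names(json)
```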
We then load it back into our R session by retrieving its metadata with acquireMetadata() and passing the result to loadObject().
This creates an HDF5-backed DelayedArray that can be easily coerced into the desired format, e.g., a dgCMatrix.
meta <- acquireMetadata(tmp, "my_sparse_matrix/matrix.h5")
roundtrip <- loadObject(meta, tmp)
class(roundtrip)
## [1] "H5SparseMatrix"
## attr(,"package")
## [1] "HDF5Array"
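To check the round trip, we can coerce the HDF5-backed result into an in-memory dgCMatrix with `as()` and compare it against the original. This minimal sketch recreates its own inputs in a fresh directory. The `set.seed()` call is our addition and only makes the example reproducible.

```r
library(Matrix)
library(alabaster.base)
library(alabaster.matrix)

set.seed(42)
y <- rsparsematrix(1000, 100, density=0.05)

dir <- tempfile()
dir.create(dir)
meta <- stageObject(y, dir, "my_sparse_matrix")
info <- .writeMetadata(meta, dir)

# Load the HDF5-backed object, then realize it in memory as a dgCMatrix
roundtrip <- loadObject(acquireMetadata(dir, info$path), dir)
realized <- as(roundtrip, "dgCMatrix")

# The realized matrix should contain the same values as the original
all.equal(as.matrix(realized), as.matrix(y))
```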
This process is supported for all base R arrays, Matrix objects, and DelayedArray objects.
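The same steps work for an ordinary dense base R matrix, which is also an array. The sketch below uses the path returned by `.writeMetadata()` rather than hard-coding the name of the saved file, because that name may differ from the sparse case. The class of the loaded object is printed but not asserted here.

```r
library(alabaster.base)
library(alabaster.matrix)

# An ordinary dense base R matrix
x <- matrix(rnorm(200), nrow=20, ncol=10)

dir <- tempfile()
dir.create(dir)
meta <- stageObject(x, dir, "my_dense_matrix")
info <- .writeMetadata(meta, dir)

# Use the path reported by .writeMetadata() instead of guessing the file name
roundtrip <- loadObject(acquireMetadata(dir, info$path), dir)
class(roundtrip)

# Realize in memory and compare with the original values
all.equal(as.matrix(roundtrip), x)
```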
For DelayedArrays, we may instead choose to save the delayed operations themselves to file, using the chihaya package.
After calling preserveDelayedOperations(TRUE), stageObject() creates an HDF5 file in the chihaya format, which contains the delayed operations rather than the results of their evaluation.
library(DelayedArray)
y <- DelayedArray(rsparsematrix(1000, 100, 0.05))
y <- log1p(abs(y) / 1:100) # adding some delayed ops.
preserveDelayedOperations(TRUE) # save the delayed operations, not their evaluated result
meta <- stageObject(y, tmp, "delayed")
.writeMetadata(meta, tmp)
## $type
## [1] "local"
##
## $path
## [1] "delayed/delayed.h5"
meta <- acquireMetadata(tmp, "delayed/delayed.h5")
roundtrip <- loadObject(meta, tmp)
class(roundtrip)
## [1] "DelayedMatrix"
## attr(,"package")
## [1] "DelayedArray"
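We can confirm that the operations themselves were saved. The sketch below prints the reloaded tree of delayed operations with DelayedArray's `showtree()` and checks that evaluating both objects gives the same numbers. It is self-contained. The final `preserveDelayedOperations(FALSE)` call is our addition: we assume `FALSE` is the default, and resetting it means later `stageObject()` calls in the same session save evaluated values again.

```r
library(Matrix)
library(DelayedArray)
library(alabaster.base)
library(alabaster.matrix)

y <- DelayedArray(rsparsematrix(1000, 100, 0.05))
y <- log1p(abs(y) / 1:100) # same delayed operations as above

dir <- tempfile()
dir.create(dir)
preserveDelayedOperations(TRUE)
meta <- stageObject(y, dir, "delayed")
info <- .writeMetadata(meta, dir)
roundtrip <- loadObject(acquireMetadata(dir, info$path), dir)

# Print the delayed operations reconstructed from the chihaya file
showtree(roundtrip)

# Evaluating both objects should give the same values
all.equal(as.matrix(roundtrip), as.matrix(y))

# Switch preservation back off (assumed default) for subsequent saves
preserveDelayedOperations(FALSE)
```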
However, it is probably best to avoid preserving delayed operations for file-backed DelayedArrays if the artifacts need to be reusable on other filesystems.
For example, an HDF5Array is saved as a reference to an absolute file path, which will not be portable.
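For a file-backed DelayedArray, one portable option is to leave preservation off so that `stageObject()` writes the evaluated values into the staging directory. The sketch below makes a file-backed array with HDF5Array's `writeHDF5Array()`, which writes to an automatically chosen dump file when no path is given. It then adds a delayed `log1p()` and stages the result. Our assumption is that, with preservation disabled, no reference to the dump file's path ends up in the saved artifact.

```r
library(HDF5Array)
library(alabaster.base)
library(alabaster.matrix)

# A file-backed HDF5Array written to an automatic dump file
h5 <- writeHDF5Array(matrix(runif(200), nrow=20, ncol=10))
z <- log1p(h5) # a delayed operation on top of the HDF5 file

# With preservation off, the evaluated values are written to the staging directory
preserveDelayedOperations(FALSE)
dir <- tempfile()
dir.create(dir)
meta <- stageObject(z, dir, "portable")
info <- .writeMetadata(meta, dir)

roundtrip <- loadObject(acquireMetadata(dir, info$path), dir)
all.equal(as.matrix(roundtrip), as.matrix(z))
```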
sessionInfo()
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 22.04.3 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] DelayedArray_0.28.0 SparseArray_1.2.0 S4Arrays_1.2.0
## [4] abind_1.4-5 IRanges_2.36.0 S4Vectors_0.40.0
## [7] MatrixGenerics_1.14.0 matrixStats_1.0.0 BiocGenerics_0.48.0
## [10] alabaster.base_1.2.0 alabaster.matrix_1.2.0 Matrix_1.6-1.1
## [13] BiocStyle_2.30.0
##
## loaded via a namespace (and not attached):
## [1] jsonlite_1.8.7 compiler_4.3.1 BiocManager_1.30.22
## [4] crayon_1.5.2 Rcpp_1.0.11 rhdf5filters_1.14.0
## [7] jquerylib_0.1.4 yaml_2.3.7 fastmap_1.1.1
## [10] lattice_0.22-5 jsonvalidate_1.3.2 R6_2.5.1
## [13] XVector_0.42.0 curl_5.1.0 knitr_1.44
## [16] chihaya_1.2.0 bookdown_0.36 bslib_0.5.1
## [19] rlang_1.1.1 V8_4.4.0 cachem_1.0.8
## [22] HDF5Array_1.30.0 xfun_0.40 sass_0.4.7
## [25] cli_3.6.1 Rhdf5lib_1.24.0 zlibbioc_1.48.0
## [28] digest_0.6.33 grid_4.3.1 alabaster.schemas_1.2.0
## [31] rhdf5_2.46.0 evaluate_0.22 rmarkdown_2.25
## [34] tools_4.3.1 htmltools_0.5.6.1