spatialFDA is a package to calculate spatial statistics metrics and compare them with methods from functional data analysis. Here, we show how to perform a standard spatial analysis using spatialFDA.
This vignette gives an overview of how to use spatialFDA to perform functional data analysis on spatial statistics metrics. The main aim of the package is to detect differential spatial arrangements between cells in multi-sample, multi-condition experiments. It combines the estimation of spatial statistics via the spatstat package with inference via refund (Baddeley and Turner 2005; Baddeley, Rubak, and Turner 2015; Goldsmith et al. 2024).
This vignette is a condensed version of the “Detailed Functional Data Analysis of Spatial Metrics” vignette; the content and code partly overlap.
The use case is a dataset from Damond et al. (2019) which contains images from 12 human donors. The raw data is published under a CC-BY-4.0 License on Mendeley.
spatialFDA can be installed and loaded from Bioconductor as follows:
# install BiocManager first if it is not yet available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("spatialFDA")
library("spatialFDA")
library("dplyr")
library("ggplot2")
library("tidyr")
library("stringr")
library("patchwork")
library("SpatialExperiment")
set.seed(1234)
In this vignette we analyse a diabetes dataset acquired with imaging mass cytometry (IMC) by Damond et al. (2019). The dataset contains images from 12 human donors, 4 healthy and 8 with type 1 diabetes (T1D). With IMC, 35 markers were measured at single-cell resolution (Damond et al. 2019).
The Damond et al. (2019) dataset can be loaded from ExperimentHub via a small reader function, .loadExample(). Setting full = TRUE loads the entire dataset; setting it to FALSE loads a reduced dataset of three patients, which is computationally cheaper. Here we load the entire dataset and subset it to two samples per condition in order to have a multi-condition, multi-sample setting. The package supports multiple data types; we use the SpatialExperiment (SPE) object (Righelli et al. 2022).
# retrieve example data from Damond et al. (2019)
spe <- .loadExample(full = TRUE)
spe <- subset(spe, ,patient_id %in% c(6089,6180,6126,6134,6228,6414))
# set cell types as factors
colData(spe)$cell_type <- as.factor(colData(spe)$cell_type)
We can look at the fields of view (FOVs) of the diabetes dataset. To do so, we extract the spatial coordinates, store them in a data frame and add the colData of the SPE to it. We look only at the first four FOVs of the healthy sample. We first plot the cell categories of all cells and then the cell types of secretory cells (\(\alpha, \beta\) and \(\delta\) cells) and T-cells (CD8+ and CD4+ T-cells).
df <- data.frame(spatialCoords(spe), colData(spe))
dfSub <- df %>%
    subset(image_name %in% c("E02", "E03", "E04", "E05"))
p <- ggplot(dfSub, aes(x = cell_x, y = cell_y, color = cell_category)) +
    geom_point(size = 0.5) +
    facet_wrap(~image_name) +
    xlab("x") +
    ylab("y") +
    labs(color = "cell category") +
    coord_equal() +
    # apply the complete theme first so it does not overwrite the legend tweaks
    theme_light() +
    theme(legend.title = element_text(size = 20),
          legend.text = element_text(size = 20))
dfSub <- dfSub %>%
    subset(cell_type %in% c("alpha", "beta", "delta", "Th", "Tc"))
q <- ggplot(dfSub, aes(x = cell_x, y = cell_y, color = cell_type)) +
    geom_point(size = 0.5) +
    facet_wrap(~image_name) +
    xlab("x") +
    ylab("y") +
    labs(color = "cell type") +
    coord_equal() +
    # apply the complete theme first so it does not overwrite the legend tweaks
    theme_light() +
    theme(legend.title = element_text(size = 20),
          legend.text = element_text(size = 20))
wrap_plots(list(p,q), widths = c(1,1), heights = c(1,1), nrow = 2, ncol = 1)
spatialFDA consists of two main steps, the calculation of the spatial statistics function per individual image and the comparison of these functions via functional data analysis (FDA). spatialFDA contains the convenience function spatialInference which streamlines the estimation explained in the “Detailed Functional Data Analysis of Spatial Metrics” vignette.
In order to perform the spatial inference, we have to pass the SpatialExperiment object, the cell types we want to analyse (selection), how we want to subset the data (subsetby), the spatial statistic function to compute (fun), which columns are the marks/cell types (marks), the radius domain to compute on (rSeq) and the edge correction to perform (correction).
Furthermore, for the functional data analysis part we can provide the sample ID if there are replicates (sample_id; this leads to a mixed effects model), the transformation to apply to the output of the spatial statistics metric (transformation; here we pass "Fisher" to stabilise the variance), the image ID (image_id) and the condition (condition).
colData(spe)[["patient_stage"]] <- factor(colData(spe)[["patient_stage"]])
#relevel to have non-diabetic as the reference category
colData(spe)[["patient_stage"]] <- relevel(colData(spe)[["patient_stage"]],
"Non-diabetic")
#run the spatial statistics inference
res <- spatialInference(
    spe,
    selection = c("alpha", "Tc"),
    subsetby = "image_number",
    fun = "Gcross",
    marks = "cell_type",
    rSeq = seq(0, 50, length.out = 50),
    correction = "rs",
    sample_id = "patient_id",
    transformation = "Fisher",
    eps = 0,
    delta = "minNnDist",
    family = mgcv::scat(link = "log"),
    image_id = "image_number",
    condition = "patient_stage",
    ncores = 1
)
#> [1] "Calculating Gcross from alpha to Tc"
#> [1] "Creating design matrix with Non-diabetic as reference"
#> [1] "The adjusted R-squared of the model is 0.69173700235453"
names(res)
#> [1] "metricRes"                "designmat"
#> [3] "mdl"                      "residual_standard_errors"
The output is a list of four objects: the data frame with the spatial statistics curves calculated by spatstat, the design matrix for the statistical inference, the output of the pffr function from refund, and the residual standard error per condition (Baddeley, Rubak, and Turner 2015; Baddeley and Turner 2005; Goldsmith et al. 2024; Scheipl, Staicu, and Greven 2015).
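The message "Creating design matrix with Non-diabetic as reference" reflects the relevel call above. As a minimal base-R sketch of what a reference category means for a design matrix (the toy factor and the names stage and toyDesign are made up for illustration, and this is not claimed to be how spatialFDA builds its design matrix internally):

```r
# Toy condition labels, one per image (illustrative only)
stage <- factor(c("Long-duration", "Non-diabetic", "Onset",
                  "Non-diabetic", "Onset", "Long-duration"))

# Factor levels are alphabetical by default, so "Long-duration" would be
# the reference; relevel() moves "Non-diabetic" to the first position
stage <- relevel(stage, ref = "Non-diabetic")

# Treatment coding: an intercept for the reference level plus one
# indicator column per remaining condition
toyDesign <- model.matrix(~ stage)
colnames(toyDesign)
```

Each non-intercept column then corresponds to a coefficient that is interpreted as a difference to the non-diabetic reference.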
We can visualise the spatial statistics curves per image with the function plotMetricPerFov. Note that these curves are shown on the transformed scale, since we set the transformation parameter (transformation = "Fisher") above.
metricRes <- res$metricRes
# create a unique plotting ID
metricRes$ID <- paste0(
    metricRes$patient_stage, "|", metricRes$patient_id
)
# change levels for plotting
metricRes$ID <- factor(metricRes$ID, levels = c(
    "Non-diabetic|6126", "Non-diabetic|6134",
    "Onset|6228", "Onset|6414",
    "Long-duration|6089", "Long-duration|6180"
))
# plot metrics
plotMetricPerFov(metricRes, correction = "rs", x = "r",
    imageId = "image_number", ID = "ID", ncol = 2)
We note that there is pronounced variability between the three conditions, between the patients as well as between individual images.
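To make the quantity behind these curves concrete: Gcross is the cross-type nearest-neighbour distance distribution, here from alpha cells to Tc cells. The following is a minimal base-R sketch on simulated coordinates. The helper gcrossNaive and the toy data are hypothetical, and the sketch applies neither the edge correction (correction = "rs") nor the transformation used above, so it is not a substitute for the spatstat estimator.

```r
set.seed(1)
n <- 200
# Simulated cell positions and labels (not the Damond et al. data)
pts <- data.frame(
    x = runif(n, 0, 100),
    y = runif(n, 0, 100),
    type = sample(c("alpha", "Tc"), n, replace = TRUE)
)

# Uncorrected empirical estimate: the fraction of `from` cells whose
# nearest `to` cell lies within distance r
gcrossNaive <- function(pts, from, to, r) {
    a <- as.matrix(pts[pts$type == from, c("x", "y")])
    b <- as.matrix(pts[pts$type == to, c("x", "y")])
    # pairwise distances: rows are `from` cells, columns are `to` cells
    d <- sqrt(outer(a[, 1], b[, 1], "-")^2 + outer(a[, 2], b[, 2], "-")^2)
    nnDist <- apply(d, 1, min)
    vapply(r, function(ri) mean(nnDist <= ri), numeric(1))
}

rSeq <- seq(0, 50, length.out = 50)
gHat <- gcrossNaive(pts, "alpha", "Tc", rSeq)
plot(rSeq, gHat, type = "l", xlab = "r", ylab = "naive Gcross(r)")
```

A curve that rises quickly at small r indicates that many alpha cells have a Tc cell close by, which is the kind of difference the inference below tests for.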
In order to summarise the variability of the curves calculated above, we can use the functional boxplot. The functional boxplot is a generalisation of a standard boxplot, giving information about the median curve (black solid line), the 50% central region (area coloured in magenta), the minimum and maximum envelopes (blue solid lines) and outlier curves (red dashed lines). Here, the fbplot function from the fda package is used (Sun and Genton 2011; Ramsay 2024).
# create a unique ID per row in the dataframe
metricRes$ID <- paste0(
    metricRes$patient_stage, "x", metricRes$patient_id,
    "x", metricRes$image_number
)
collector <- plotFbPlot(metricRes, "r", "rs", "patient_stage")
The functional boxplot provides a convenient summary of the spatial statistics curves. We see that the median curve for onset samples plateaus at a higher level and that non-diabetic and long-duration functional boxplots are in general very similar. There are some outlier curves but since they are not far off the envelopes, we do not filter them out.
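A functional boxplot needs a way to order whole curves from most central to most outlying. As a minimal sketch, the code below assumes an ordering by modified band depth, which is one common choice; it is not claimed to reproduce exactly what plotFbPlot does. The toy curves and the helper modifiedBandDepth are made up for illustration.

```r
set.seed(1)
r <- seq(0, 50, length.out = 50)
# 20 toy curves (columns) on a common grid (rows) ...
curves <- sapply(1:20, function(i) 1 - exp(-r / runif(1, 8, 12)))
# ... plus one curve that rises much more slowly than the rest
curves <- cbind(curves, 1 - exp(-r / 30))

# Modified band depth with bands spanned by pairs of curves: for each
# curve, the average fraction of grid points at which it lies inside
# the band formed by a pair, averaged over all pairs
modifiedBandDepth <- function(y) {
    pairs <- combn(ncol(y), 2)
    depth <- numeric(ncol(y))
    for (k in seq_len(ncol(pairs))) {
        lo <- pmin(y[, pairs[1, k]], y[, pairs[2, k]])
        hi <- pmax(y[, pairs[1, k]], y[, pairs[2, k]])
        depth <- depth + colMeans(y >= lo & y <= hi)
    }
    depth / ncol(pairs)
}

depth <- modifiedBandDepth(curves)
# deepest curve: the functional median
medianCurve <- which.max(depth)
# the deepest half of the curves spans the 50% central region
central <- order(depth, decreasing = TRUE)[seq_len(ceiling(ncol(curves) / 2))]
```

The slowly rising curve in the last column receives one of the smallest depths, which is what would make it a candidate outlier in a functional boxplot.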
The functional boxplot gave us an overview of the variability of the spatial statistics curves and let us describe differences qualitatively. In order to test these differences, we use inference via functional data analysis. The functional generalised additive model (GAM) provides a way to perform statistical inference on the spatial statistics functions and is implemented in the function pffr from refund (Scheipl, Staicu, and Greven 2015; Scheipl, Gertheiss, and Greven 2016; Goldsmith et al. 2024).
mdl <- res$mdl
mm <- res$designmat
summary(mdl)
#>
#> Family: Scaled t(3,0.101)
#> Link function: log
#>
#> Formula:
#> Y ~ conditionLong_duration + conditionOnset + s(patient_id, bs = "re")
#> <environment: 0x17bd21348>
#>
#> Constant coefficients:
#>             Estimate Std. Error z value Pr(>|z|)
#> (Intercept)  -1.7413     0.1799  -9.677   <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Smooth terms & functional coefficients:
#>                              edf Ref.df  Chi.sq  p-value
#> Intercept(x)              17.311 19.000 784.083  < 2e-16 ***
#> conditionLong_duration(x)  1.001  1.001   0.085 0.771532
#> conditionOnset(x)          2.653  2.797  19.065 0.000936 ***
#> s(patient_id)             12.324 27.000 801.789  < 2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> R-sq.(adj) = 0.692 Deviance explained = 57%
#> -REML score = -6617.2 Scale est. = 1 n = 13904 (316 x 44)
The summary of the functional GAM provides information on the constant parameters and the functional coefficients. For each functional coefficient, a test over the entire domain is performed (reported as a \(\chi^2\) statistic in the Chi.sq column above), which answers the question of whether there is any difference between the reference curves (here “Non-diabetic”) and the curves of the respective condition over the entire domain.
According to the model summary, there is a small, non-significant difference in the \(G\) function between non-diabetic and long-duration T1D samples (p = 0.77), but a strong difference between non-diabetic and onset T1D samples (p < 0.001).
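Reading off the formula printed in the summary, the fitted model can be sketched as follows. The symbols \(y_i\), \(\beta_k\), \(b_{p(i)}\) and the indicator notation are introduced here for illustration and do not appear in the package output. Writing the patient effect as a function of \(r\) rests on the assumption that pffr treats s(patient_id, bs = "re") as a functional random intercept.

```latex
\[
\log \mathbb{E}\left[ y_i(r) \right]
  = \beta_0(r)
  + \beta_1(r)\, \mathbb{1}\{\text{image } i \text{ is Long-duration}\}
  + \beta_2(r)\, \mathbb{1}\{\text{image } i \text{ is Onset}\}
  + b_{p(i)}(r)
\]
```

Here \(y_i(r)\) is the transformed spatial statistics curve of image \(i\), \(\beta_0(r)\) is the functional intercept for the non-diabetic reference (the constant (Intercept) plus the smooth Intercept(x) term), \(\beta_1(r)\) and \(\beta_2(r)\) are the functional coefficients conditionLong_duration(x) and conditionOnset(x), and \(p(i)\) denotes the patient that image \(i\) belongs to.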
As the functional coefficients are themselves functions of the domain \(r\), we can plot them too:
plotLs <- lapply(colnames(mm), plotMdl, mdl = mdl,
    shift = mdl$coefficients[["(Intercept)"]])
#> using seWithMean for s(x.vec) .
#> using seWithMean for s(x.vec) .
#> using seWithMean for s(x.vec) .
wrap_plots(plotLs, nrow = 3, axes = 'collect')
From the functional coefficients we see that the difference between non-diabetic and onset curves is concentrated at short distances and becomes less pronounced at larger distances. This means that the differential co-localisation between cytotoxic T-cells and alpha cells in the islet happens at shorter distances.
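The pointwise confidence bands are a limitation of this method: they quantify the uncertainty at each distance \(r\) separately rather than for the curve as a whole. This could be improved with either bootstrapping or continuous confidence bands (Liebl and Reimherr 2023).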
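Why pointwise bands are a limitation can be illustrated with a small simulation. This is a sketch under stated assumptions: the estimation errors along \(r\) are taken to be standardised Gaussian AR(1) noise with correlation 0.9 between neighbouring grid points, which is an arbitrary toy choice and not derived from the model above.

```r
set.seed(1)
nGrid <- 50   # number of grid points along r
nSim <- 2000  # number of simulated coefficient estimates
phi <- 0.9    # assumed correlation between neighbouring grid points

# Each column is one simulated curve of standardised estimation errors
# with approximately unit variance at every grid point
z <- replicate(nSim, as.numeric(
    arima.sim(list(ar = phi), n = nGrid, sd = sqrt(1 - phi^2))
))

# A pointwise 95% band covers the truth at a given r if |z| <= 1.96
inside <- abs(z) <= qnorm(0.975)

# Coverage at individual grid points versus over the whole domain at once
pointwiseCoverage <- mean(inside)
simultaneousCoverage <- mean(apply(inside, 2, all))
c(pointwise = pointwiseCoverage, simultaneous = simultaneousCoverage)
```

In this toy setting the band holds its nominal level at each single distance, but the probability that it contains the entire curve is clearly lower, so an excursion of a functional coefficient outside the band at some \(r\) should be read with care.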
The functional GAM provides the coefficients that we plotted above. In order to judge how much confidence we can have in these coefficients, we assess the model fit. First, we look at the correlation and covariance of the model residuals and check their Q-Q plot.
resid(mdl) |> cor() |> filled.contour(levels = seq(-1, 1, length.out = 40))
resid(mdl) |> cov() |> filled.contour()
qqnorm(resid(mdl), pch = 16)
qqline(resid(mdl))
In these model diagnostics, we note that there is still some variability in the residuals that is not accounted for by the model. The Q-Q plot indicates a good but not perfect model fit. The residuals show considerable structure, in line with the structure in the auto-covariance and correlation plots.
sessionInfo()
#> R version 4.5.1 Patched (2025-06-14 r88325)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS Ventura 13.7.7
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
#>
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: America/New_York
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] refund_0.1-37 SpatialExperiment_1.19.1
#> [3] SingleCellExperiment_1.31.1 SummarizedExperiment_1.39.1
#> [5] Biobase_2.69.0 GenomicRanges_1.61.1
#> [7] Seqinfo_0.99.2 IRanges_2.43.0
#> [9] S4Vectors_0.47.0 BiocGenerics_0.55.1
#> [11] generics_0.1.4 MatrixGenerics_1.21.0
#> [13] matrixStats_1.5.0 patchwork_1.3.2
#> [15] stringr_1.5.1 tidyr_1.3.1
#> [17] ggplot2_3.5.2 dplyr_1.1.4
#> [19] spatialFDA_1.1.15 BiocStyle_2.37.1
#>
#> loaded via a namespace (and not attached):
#> [1] RColorBrewer_1.1-3 jsonlite_2.0.0 magrittr_2.0.3
#> [4] rainbow_3.8 spatstat.utils_3.1-5 magick_2.8.7
#> [7] farver_2.1.2 nloptr_2.2.1 rmarkdown_2.29
#> [10] vctrs_0.6.5 memoise_2.0.1 minqa_1.2.8
#> [13] spatstat.explore_3.5-2 RCurl_1.98-1.17 tinytex_0.57
#> [16] htmltools_0.5.8.1 S4Arrays_1.9.1 AnnotationHub_3.99.6
#> [19] curl_7.0.0 deSolve_1.40 SparseArray_1.9.1
#> [22] sass_0.4.10 hdrcde_3.4 pracma_2.4.4
#> [25] KernSmooth_2.23-26 bslib_0.9.0 RLRsim_3.1-8
#> [28] httr2_1.2.1 cachem_1.1.0 lifecycle_1.0.4
#> [31] pkgconfig_2.0.3 Matrix_1.7-4 R6_2.6.1
#> [34] fastmap_1.2.0 rbibutils_2.3 magic_1.6-1
#> [37] digest_0.6.37 colorspace_2.1-1 AnnotationDbi_1.71.1
#> [40] tensor_1.5.1 ExperimentHub_2.99.5 RSQLite_2.4.3
#> [43] filelock_1.0.3 labeling_0.4.3 spatstat.sparse_3.1-0
#> [46] httr_1.4.7 polyclip_1.10-7 abind_1.4-8
#> [49] mgcv_1.9-3 compiler_4.5.1 bit64_4.6.0-1
#> [52] withr_3.0.2 DBI_1.2.3 gamm4_0.2-7
#> [55] MASS_7.3-65 rappdirs_0.3.3 DelayedArray_0.35.2
#> [58] rjson_0.2.23 tools_4.5.1 goftest_1.2-3
#> [61] glue_1.8.0 nlme_3.1-168 grid_4.5.1
#> [64] cluster_2.1.8.1 pbs_1.1 gtable_0.3.6
#> [67] fda_6.3.0 spatstat.data_3.1-8 XVector_0.49.0
#> [70] spatstat.geom_3.5-0 BiocVersion_3.22.0 pillar_1.11.0
#> [73] splines_4.5.1 BiocFileCache_2.99.6 lattice_0.22-7
#> [76] bit_4.6.0 deldir_2.0-4 grpreg_3.5.0
#> [79] ks_1.15.1 tidyselect_1.2.1 fds_1.8
#> [82] Biostrings_2.77.2 knitr_1.50 reformulas_0.4.1
#> [85] bookdown_0.44 xfun_0.53 stringi_1.8.7
#> [88] yaml_2.3.10 boot_1.3-32 evaluate_1.0.5
#> [91] tibble_3.3.0 BiocManager_1.30.26 cli_3.6.5
#> [94] Rdpack_2.6.4 jquerylib_0.1.4 dichromat_2.0-0.1
#> [97] Rcpp_1.1.0 spatstat.random_3.4-1 dbplyr_2.5.0
#> [100] png_0.1-8 spatstat.univar_3.1-4 parallel_4.5.1
#> [103] blob_1.2.4 mclust_6.1.1 bitops_1.0-9
#> [106] lme4_1.1-37 mvtnorm_1.3-3 scales_1.4.0
#> [109] pcaPP_2.0-5 purrr_1.1.0 crayon_1.5.3
#> [112] rlang_1.1.6 KEGGREST_1.49.1