Contents

1 Introduction

The acquisition of multi-omics data with complex experimental designs is a widely adopted practice to identify entities and unravel the biological processes they are involved in.

Exploring each omics layer usually serves as an initial step to investigate and extract relevant biological variability. Statistical integration can then focus on pertinent omics layers and features.

Most existing solutions for the analysis of such data are command-line based, which limits accessibility for many users. Handling many packages, methods and their results can become a burden even for experienced R-users. Interface solutions exist, like ExploreModelMatrix, pcaExplorer MatrixQCvis or ideal, which each allows perform one particular step of the analysis, or MSstatsShiny, which is designed for one particular omics type.

Here we present RFLOMICS, an R package with a Shiny interface, designed to ensure a guided, comprehensive analysis and visualization of transcriptomics, proteomics, or metabolomics data. RFLOMICS covers the entire process from defining the statistical model to multi-omics integration, all within a single application.

RFLOMICS uses extended classes of the MultiAssayExperiment and SummarizedExperiment
classes to ensure efficient data management of heterogeneous data.

A flexible HTML report can be generated at any stage of the analysis, providing results, methods, and settings to facilitate the analysis reproducibility. Additionally, users can download an archive containing all produced results.

2 Install and run RFLOMICS

You can install the latest RFLOMICS development from github:

# BiocManager::install("RFLOMICS/RFLOMICS")

Then you can access the interface with the following commands:

library(RFLOMICS)

3 Guided tour of the interface

This part of the vignette will follow the different steps of a multi-omics analysis trough the RFLOMICS interface which are:

They will appear in the left tab menu of the interface along with the analysis progression. Whereas single omics analysis steps will be displayed in tab panel associated to each dataset .

The interface can be accessed using:

# runRFLOMICS()

3.1 Load Data

Data loading starts by setting a project name (it is mandatory), then load the experimental design and finally the different omics datasets.

3.1.1 Dataset example description

Data used for this vignette have been provided by Pr. Loic Rajjou and Gwendal Cueff. It is included in the ‘ExamplesFiles’ directory of the package. Briefly, A. thaliana’s transcriptome, metabolome and proteome have been obtained in the context of the study of seed germination and vigor. In particular, the authors were interested in deciphering key entities involved in response to environmental stresses (on the mother plant): influences of temperature (high, medium and low) and imbibition stage (Dry: DI, early imbibition: EI and late imbibition: LI). See(https://www.uibk.ac.at/botany/ecoseed/home/)

3.1.2 Load Experimental Design

The experimental design file must contain experimental information/conditions for each sample in a tab separated values format (see RFLOMICS-input-data.html). As soon as this file is loaded, for each design factor, we have to set up the factor type (Biological factor: Bio; batch factor: batch; metadata information: meta) and a reference level.

  • It is required to have a minimum of 1 biological factor (maximum 3) and 1 batch factor. Order of the columns does not matter.

  • It is possible to remove factor’s conditions from the design. Associated samples will be excluded from the analysis even if they are present in the omics dataset matrix.

  • It is also possible to re-order factor modalities

3.1.3 Load Omics data

Experimental data must be loaded as matrix files in a tab separated values format. (for more information see RFLOMICS-input-data.html). For each dataset it is mandatory to specify the type of omics (transcriptomics, proteomics, metabolomics ) as it will adapt the data processing and analysis methods. By default, datasets will be called set1, set2, set3, …, respectively. A more suitable name can be given by changing ‘Dataset name’ in the interface.

To load the 3 examples datasets of the Ecoseed project: RNAseq data (read counts), proteomics data (abundance of proteins), and metabolomics data (intensity of metabolites) (see RFLOMICS-input-data.html), please press the ‘Load Ecoseed Data’.

As soon as the data are loaded the RFLOMICS MultiAssayExperiment object is created. The data overview panel displays the samples overlap across datasets and the number of datasets per combination of biological factors.

3.2 Set the statistical framework

The interface provides models written from the simplest one to the most complete one. Only the second order interaction terms between the biological factors appear. Batch factors never appear in interaction terms. In our example, the complete model is chosen.

All ‘a priori’ hypotheses (contrasts) are then calculated from the chosen model and displayed. The user must select those that correspond to the biological questions. Contrasts are numbered. These labels will follow for all analysis.

In our example we are studying the effect of temperature on seed germination (transition from the state of dormancy to the state of germination vigor). For that we select 3 average contrasts (see image):

This statistical framework will be applied on all loaded datasets.

3.3 Single omics data analysis

Once the contrasts are chosen, 3 Item menus appear in the side bar menu that correspond to the 3 loaded omics types.

For each dataset, different steps of data analysis are proposed as a tab panel. These analysis steps must be performed sequentially. It is possible to switch between datasets. However, you cannot perform a new task until the previous one is completed. A progress bar is displayed indicating the progress status.

3.3.1 Data processing and quality check

In this tab panel we access a data quality control screen and we perform the appropriate data filtering and processing according to the type of omics.

By default, features with 0 count in all conditions are removed from the data.

Specifically for each data we have the possibility to exclude samples or to select a subset of samples to analyze, provided the completeness condition is met.

3.3.1.1 Completeness check

The design must be complete and as possible balanced. A Complete design (mandatory) mean that all possible combinations of biological factors’ levels (called groups) are presents. A Balanced design (recommended) has an equal number of observations (replicates) for each group.

These constraints are checked and represented as graph (each square represent 1 group and each color level indicate the number of observations).

The effect of data filtering or sample removing step can be explored thanks to many QC graphs. For each QC, a graph is built on each raw data and filtered data to allow the comparison.

3.3.1.2 Transcriptomics data

For RNAseq data, low expressed genes are filtered so that only genes whose cpm expression is greater than “CPM cutoff” in x samples are kept. The value of x is given by “Filtering strategy” (NbOfsample_over_cpm <= NbConditions). By default the data are filtered based on NbCondition strategy with CPM cutoff equal to 5. In this example, there are 9 conditions. The genes with a CPM less than 5 in at least 9 samples are removed.

To correct for differences in sequencing depths (library size), RFLOMICS proposes one method for normalization, “TMM method” from the edgeR package.

For RNAseq data the QC graphs are drawn before and after the filtering and normalization steps. The following graphs are displayed:

  • The log2(count) distribution per sample: Expect aligned medians. Samples with shifted median may be outliers.
  • The library size distribution which is the distribution of the total number of reads per sample.
  • The density distribution of log2(count) per sample: Expect a unimodal gaussian density distribution.
  • An interactive Principal Component Analysis allowing to color samples by any factor of the design to decode the main sources of variation in the data.

3.3.2 Differential expression analysis

In this tab panel, we can run the differential expression analysis. The analysis is run by contrast.
For RNAseq data, the glmFit() function from the edgeR package is used whereas for proteomics and metabolomics data, the lmFit() function from limma packages is used.

Results will appear in two components: a scrolling menu will give the results of the analysis by contrast and, at the bottom of the panel, the intersecting sets of the lists of DEG is given thanks to the UpSetR() function.

Each results has to be validated:

  • You have to to check the Pvalue distribution for each contrast and validate it (ok checkbox).

    • If the distribution is uniform: it is ok. A few genes are DE and the FDR will find them.
    • If the distribution as a peak near 0 and then is uniform: it is also ok. The higher the peak, the higher the number of DE genes. FDR correction can be applied.
    • If the distribution has 2 peaks one near 0 and one near 1 or just one peak near 1, it is not a good news! You have to understand what’s happened.
  • You can filter differentially expressed genes/proteins/metabolites either by fold change (|FC|) or by False Discovery Rate (BH) cut-off. Graphs will be automatically updated.

  • You have to validate to save the cut-off thresholds but also the contrasts result selection and pass to the co-expression analysis or to the annotation enrichment step.

3.3.3 Coexpression analysis

  • In this tab panel, choice is given to merge (union) or intersect
    (intersection) the lists of differentially expressed genes/proteins/metabolites associated to contrasts.

  • Co-expression analysis is done using to the coseq package with a set of optimal parameters.

    • For RNAseq, data transformation method is set to the arcsin function and normalization method to TMM. No scaling is done for these data. gaussian mixture model is used to decipher the different profiles of gene expression with a set of parameters to estimate fixed to (Gaussian_pK_Lk_Ck).
    • For proteomics/metabolomics data, scaling is done on proteins/metabolites. Another sets of parameters estimation is also proposed for the modeling of gene expression profile: (Gaussian_pK_Lk_Bk). It has to be used, if the other one doesn’t fit.

The number of technical replicates to perform (iter) for each K which is the number of expressions’ profile into which the DEG have to be cut, can be first set to 2 to precise the range of K and then put to a minimum of 20 replicates per K.

  • Results description (see CoSeq)

    • Results can be explored with several plots proposed by coseq. The ICL graph has been slightly modified to show all the range of the ICL values for a given K.

    • A table of jobs summary is also given. It groups error messages by K.

3.3.4 Annotation and Enrichment analysis using clusterProfiler

We select the lists of DE genes or clusters to annotate. All available lists are selected by default.

The enrichment analysis is performed using the clusterProfiler R-package. For more information on the methods and the package, please see the article, Bioconductor Package and the clusterProfiler Vignette.

You can chose between several domains, depending on several parameters:

  • custom: preferred option. It requires an annotation file (tabulated, at least two columns, one for the name of the gene, the other for the name or id of the annotation term). A second panel will allow the user to indicate which columns indicates what. You can have as many domains/ontology as you want, you just need to indicate the columns name that differentiate them in the right area;
  • GO (or GO:BP, GO:CC, GO:MF if only one GO domain is of interest). The user will have to chose a database library (org.*.db) installed on the libPath and the keytype of the entities (identifier to make the connection with the annotation).
  • KEGG: In the setting panel, enter the three letters to identify the organism and the keytype of the entities. It requires to have an online connection.

Once everything is set and the analysis is conducted, several panels are displayed to show the results.

Results of the analysis for multiple lists

3.3.5 Dataset analysis summary

This panel summarizes the effects of the filtering steps on the selected datasets and the results of the differential analyses. The first row shows how many samples and entities are left in each table. The integration will be performed on these datasets. The second row shows the number of DE entities for each selected contrast and for each table, with the distinction between up-regulated and down-regulated entities.

3.4 Integration

Once at least two tables have passed the pre-processing step, the Integration tab will appear in the side bar menu. Two sub menus are available. For each of them, you will have to run a preprocessing before accessing to the integration panel: select the datasets you want to integrate together and the filtering strategy for each data. A filtering is strongly advised when having large data or unbalanced number of features between them with the goals to homogenize the number of features between the datasets.

You can filter according to the DE entities found in the differential analysis, or based on the variation coefficient, taking a certain number of variables that have the largest coefficient. For this example, we chose to keep only 500 features from the RNAseq data, based on the variation coefficient.

RNAseq data will be transformed before the integration. This is done with the voom transformation from the limma R package.

Batch effects are removed before applying the function, using removebatcheffect from limma.

Features are then centered before the integration.

3.4.1 MOFA

This integration analysis is performed using the MOFA2 R-package. It is an unsupervised method that searches for latent factors to decompose the variance of the data, similarly to a PCA, but applied to a multi-dataset situation. The interpretation of the results is also very similar to those of a PCA: the most interesting factors are the one explaining more variance, whether it’s on several dataset or just one table. It also gives weights to the features of each table, for each factor. This is interesting in case a factor can be a posteriori linked to a covariate (in the case displayed on the figure, the temperature seems linked to the first factor, the features used to build this factor are to be considered very relevant in the analysis).

The user can chose to scale or not each tables, with the scale views argument, then set the number of factors to be computed and finally the maximum iterations for the MOFA run. We recommend that the user do not change the number of iterations. Once everything is set, you can run the analysis by clicking on the run button.

Several figure are provided by the interface once the analysis is done.

3.4.2 MixOmics

This integration analysis is performed using the MixOmics R-package. In RFLOMICS, we focused on the supervised analyses using the functions block.(s)plda (also called DIABLO) from the package. Only complete cases will be used for the analysis. The user can select all biological factors as response variables. The meta and batch factors are not available.

Using Sparse Analysis will set the “s” in the function names. It this case, the tuning ‘cases’ argument will be considered and passed to tune.splsda or tune.block.splsda: it determines the number of features selection to be tested.

In RFLOMICS, we chose to only ask for a number of cases. This number of cases is used to determine the keep.X argument, using a span of nfeatures/number of cases. By example, if the user chose 5 cases and has 2 datasets with respectively 500 and 250 features, the function will test 25 combinations of features per component. For each datasets, features will be picked-up in a cumulative manner ie for the DS1: 100,200,300,400,500 and DS2: 50, 100, 150, 200, 250. An example of combination will be c(DS1=100,DS2=50). The best combination of results by components will be used by mixOmics block.splsda.

Once everything is set, you can run the analysis by clicking on the run button. Several figures are provided by the interface once the analysis is done, for each selected response variable.

4 Exploring results from RFLOMICS

Weather the results are obtained as a .RData object downloaded from the interface or directly using the command line functions, the structure of the results is the same and visualization functions (from RFLOMICS, MultiAssayExperiment or SummarizedExperiment) can be used to further investigate the analyses results and produce suitable figures for an article. Most visualization functions from RFLOMICS return a ggplot object and can be modified using the ggplot2 functions.

Each of the analysis step has their own plot and getters methods.

5 Using RFLOMICS without the interface

All the pipeline, results and figures produced in the interface can be obtained by running methods directly in the R console.

5.1 Load example data

Ecoseed data can be accessed using the following command:

data("ecoseed.mae")

5.2 Create a RflomicsMAE object

The first step is to create the object containing all the data that can be used by RFLOMICS. This requires a project name, the datasets to be analyzed, a design matrix and information for each factor: the type (biological, batch or meta-factor) and a reference level, all in the form of data.frames. The reference level must be set with a particular care, as it will be used to define all contrasts in the differential analysis.

factorInfo <- data.frame(
  "factorName"   = c("Repeat", "temperature", "imbibition"),
  "factorType"   = c("batch", "Bio", "Bio")
)
testObject <- createRflomicsMAE(
  projectName = "VignetteProject",
  omicsData   = ecoseed.mae,
  omicsTypes  = c("RNAseq","proteomics","metabolomics"),
  factorInfo  = factorInfo
)
class(testObject)
#> [1] "RflomicsMAE"
#> attr(,"package")
#> [1] "RFLOMICS"

RFLOMICS uses a structure derived from the MultiAssayExperiment package. Each dataset is stored as a RflomicsSE (derived from SummarizedExperiment) in a RflomicsMAE object. All methods from the MultiAssayExperiment package are also available for RFLOMICS objects.

is(testObject, "MultiAssayExperiment")
#> [1] TRUE
upsetSamples(testObject)

6 Session Info

sessionInfo()
#> R Under development (unstable) (2025-03-13 r87965)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>  [1] RFLOMICS_0.99.7             coseq_1.31.0               
#>  [3] knitr_1.50                  htmltools_0.5.8.1          
#>  [5] ggplot2_3.5.1               dplyr_1.1.4                
#>  [7] shinyBS_0.61.1              MultiAssayExperiment_1.33.9
#>  [9] SummarizedExperiment_1.37.0 Biobase_2.67.0             
#> [11] GenomicRanges_1.59.1        GenomeInfoDb_1.43.4        
#> [13] IRanges_2.41.3              S4Vectors_0.45.4           
#> [15] BiocGenerics_0.53.6         generics_0.1.3             
#> [17] MatrixGenerics_1.19.1       matrixStats_1.5.0          
#> [19] BiocStyle_2.35.0           
#> 
#> loaded via a namespace (and not attached):
#>   [1] fs_1.6.5                enrichplot_1.27.4       httr_1.4.7             
#>   [4] RColorBrewer_1.1-3      doParallel_1.0.17       tools_4.6.0            
#>   [7] backports_1.5.0         R6_2.6.1                DT_0.33                
#>  [10] HDF5Array_1.35.16       lazyeval_0.2.2          uwot_0.2.3             
#>  [13] rhdf5filters_1.19.2     GetoptLong_1.0.5        withr_3.0.2            
#>  [16] gridExtra_2.3           bayesm_3.1-6            cli_3.6.4              
#>  [19] Cairo_1.6-2             flashClust_1.01-2       sandwich_3.1-1         
#>  [22] labeling_0.4.3          sass_0.4.9              mvtnorm_1.3-3          
#>  [25] robustbase_0.99-4-1     proxy_0.4-27            yulab.utils_0.2.0      
#>  [28] gson_0.1.0              DOSE_4.1.0              R.utils_2.13.0         
#>  [31] plotrix_3.8-4           limma_3.63.11           org.At.tair.db_3.21.0  
#>  [34] RSQLite_2.3.9           gridGraphics_0.5-1      shape_1.4.6.1          
#>  [37] crosstalk_1.2.1         vroom_1.6.5             car_3.1-3              
#>  [40] HTSCluster_2.0.11       GO.db_3.21.0            leaps_3.2              
#>  [43] Matrix_1.7-3            abind_1.4-8             R.methodsS3_1.8.2      
#>  [46] lifecycle_1.0.4         scatterplot3d_0.3-44    multcomp_1.4-28        
#>  [49] yaml_2.3.10             edgeR_4.5.10            carData_3.0-5          
#>  [52] rhdf5_2.51.2            qvalue_2.39.0           SparseArray_1.7.7      
#>  [55] Rtsne_0.17              grid_4.6.0              blob_1.2.4             
#>  [58] promises_1.3.2          shinydashboard_0.7.2    crayon_1.5.3           
#>  [61] dir.expiry_1.15.0       ggtangle_0.0.6          lattice_0.22-6         
#>  [64] cowplot_1.1.3           KEGGREST_1.47.0         HTSFilter_1.47.0       
#>  [67] magick_2.8.6            capushe_1.1.2           pillar_1.10.1          
#>  [70] ComplexHeatmap_2.23.1   fgsea_1.33.4            rjson_0.2.23           
#>  [73] estimability_1.5.1      corpcor_1.6.10          mixOmics_6.31.4        
#>  [76] codetools_0.2-20        fastmatch_1.1-6         glue_1.8.0             
#>  [79] ggfun_0.1.8             data.table_1.17.0       vctrs_0.6.5            
#>  [82] png_0.1-8               treeio_1.31.0           EnhancedVolcano_1.25.0 
#>  [85] gtable_0.3.6            cachem_1.1.0            xfun_0.51              
#>  [88] S4Arrays_1.7.3          mime_0.13               coda_0.19-4.1          
#>  [91] survival_3.8-3          pheatmap_1.0.12         iterators_1.0.14       
#>  [94] tinytex_0.56            statmod_1.5.0           TH.data_1.1-3          
#>  [97] nlme_3.1-168            ggtree_3.15.0           bit64_4.6.0-1          
#> [100] filelock_1.0.3          UpSetR_1.4.0            tensorA_0.36.2.1       
#> [103] bslib_0.9.0             colorspace_2.1-1        DBI_1.2.3              
#> [106] DESeq2_1.47.5           tidyselect_1.2.1        emmeans_1.11.0         
#> [109] bit_4.6.0               compiler_4.6.0          Rmixmod_2.1.10         
#> [112] compositions_2.0-8      h5mread_0.99.4          basilisk.utils_1.19.1  
#> [115] DelayedArray_0.33.6     bookdown_0.42           scales_1.3.0           
#> [118] DEoptimR_1.1-3-1        multcompView_0.1-10     stringr_1.5.1          
#> [121] digest_0.6.37           MOFA2_1.17.0            rmarkdown_2.29         
#> [124] basilisk_1.19.3         XVector_0.47.2          pkgconfig_2.0.3        
#> [127] FactoMineR_2.11         fastmap_1.2.0           rlang_1.1.5            
#> [130] GlobalOptions_0.1.2     htmlwidgets_1.6.4       UCSC.utils_1.3.1       
#> [133] shiny_1.10.0            farver_2.1.2            jquerylib_0.1.4        
#> [136] zoo_1.8-13              jsonlite_2.0.0          BiocParallel_1.41.5    
#> [139] GOSemSim_2.33.0         R.oo_1.27.0             magrittr_2.0.3         
#> [142] Formula_1.2-5           GenomeInfoDbData_1.2.14 ggplotify_0.1.2        
#> [145] patchwork_1.3.0         Rhdf5lib_1.29.2         munsell_0.5.1          
#> [148] Rcpp_1.0.14             ape_5.8-1               reticulate_1.42.0      
#> [151] stringi_1.8.7           MASS_7.3-65             plyr_1.8.9             
#> [154] parallel_4.6.0          ggrepel_0.9.6           forcats_1.0.0          
#> [157] Biostrings_2.75.4       splines_4.6.0           circlize_0.4.16        
#> [160] locfit_1.5-9.12         igraph_2.1.4            ggpubr_0.6.0           
#> [163] ggsignif_0.6.4          reshape2_1.4.4          evaluate_1.0.3         
#> [166] BiocManager_1.30.25     tzdb_0.5.0              foreach_1.5.2          
#> [169] httpuv_1.6.15           tidyr_1.3.1             purrr_1.0.4            
#> [172] clue_0.3-66             BiocBaseUtils_1.9.0     broom_1.0.8            
#> [175] xtable_1.8-4            e1071_1.7-16            RSpectra_0.16-2        
#> [178] tidytree_0.4.6          rstatix_0.7.2           later_1.4.1            
#> [181] class_7.3-23            rARPACK_0.11-0          tibble_3.2.1           
#> [184] clusterProfiler_4.15.1  aplot_0.2.5             memoise_2.0.1          
#> [187] AnnotationDbi_1.69.0    ellipse_0.5.0           cluster_2.1.8.1        
#> [190] corrplot_0.95           shinyWidgets_0.9.0