1 The `scpdata` package

scpdata disseminates mass spectrometry (MS)-based single-cell proteomics (SCP) data sets formatted using the scp data structure. The data structure is described in the scp vignette.

In this vignette, we describe how to access the SCP data sets. To start, we load the scpdata package.

library("scpdata")

2 Load data from `ExperimentHub`

The data is stored using the ExperimentHub infrastructure. We first create a connection with ExperimentHub.

eh <- ExperimentHub()

You can list all data sets available in scpdata using the query function.

query(eh, "scpdata")
#> ExperimentHub with 21 records
#> # snapshotDate(): 2023-10-24
#> # $dataprovider: MassIVE, PRIDE, SlavovLab website
#> # $species: Homo sapiens, Mus musculus, Rattus norvegicus, Gallus gallus
#> # $rdataclass: QFeatures
#> # additional mcols(): taxonomyid, genome, description,
#> #   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#> #   rdatapath, sourceurl, sourcetype 
#> # retrieve records with, e.g., 'object[["EH3899"]]' 
#> 
#>            title             
#>   EH3899 | specht2019v2      
#>   EH3900 | specht2019v3      
#>   EH3901 | dou2019_lysates   
#>   EH3902 | dou2019_mouse     
#>   EH3903 | dou2019_boosting  
#>   ...      ...               
#>   EH7713 | brunner2022       
#>   EH8301 | leduc2022_pSCoPE  
#>   EH8302 | leduc2022_plexDIA 
#>   EH8303 | woo2022_macrophage
#>   EH8304 | woo2022_lung

Another way to get information about the available data sets is to call scpdata(). This retrieves all the available metadata. For example, we can display the data set titles along with their descriptions to make an informed choice of data set.

info <- scpdata()
knitr::kable(info[, c("title", "description")])

	title	description
EH3899	specht2019v2	SCP expression data for monocytes (U-937) and macrophages at PSM, peptide and protein level
EH3900	specht2019v3	SCP expression data for more monocytes (U-937) and macrophages at PSM, peptide and protein level
EH3901	dou2019_lysates	SCP expression data for Hela digests (0.2 or 10 ng) at PSM and protein level
EH3902	dou2019_mouse	SCP expression data for C10, SVEC or Raw cells at PSM and protein level
EH3903	dou2019_boosting	SCP expression data for C10, SVEC or Raw cells and 3 boosters (0, 5 or 50 ng) at PSM and protein level
EH3904	zhu2018MCP	Near SCP expression data for micro-dissection rat brain samples (50, 100, or 200 µm width) at PSM level
EH3905	zhu2018NC_hela	Near SCP expression data for HeLa samples (approximately 12, 40, or 140 cells) at PSM level
EH3906	zhu2018NC_lysates	Near SCP expression data for HeLa lysates (10, 40 and 140 cell equivalent) at PSM level
EH3907	zhu2018NC_islets	Near SCP expression data for micro-dissected human pancreas samples (control patients or type 1 diabetes) at PSM level
EH3908	cong2020AC	SCP expression data for Hela cells at PSM, peptide and protein level
EH3909	zhu2019EL	SCP expression data for chicken utricle samples (1, 3, 5 or 20 cells) at PSM, peptide and protein level
EH6011	liang2020_hela	Expression data for HeLa cells (0, 1, 10, 150, 500 cells) at PSM, peptide and protein level
EH7085	schoof2021	Single-cell proteomics data from OCI-AML8227 cell culture to reconstruct the cellular hierarchy.
EH7295	williams2020_lfq	Single-cell label free proteomics data from a MCF10A cell line culture.
EH7296	williams2020_tmt	Single-cell proteomics data from three acute myeloid leukemia cell line culture (MOLM-14, K562, CMK).
EH7712	derks2022	Single-cell and bulk (100-cell) proteomics data of PDAC, melanoma cells and monocytes.
EH7713	brunner2022	Single-cell proteomics data of cell cycle stages in HeLa.
EH8301	leduc2022_pSCoPE	Single-cell proteomics data of 878 melanoma cells and 877 monocytes (pSCoPE).
EH8302	leduc2022_plexDIA	Single-cell proteomics data of 126 melanoma cells (plexDIA).
EH8303	woo2022_macrophage	Single-cell proteomics data from LPS-treated macrophages.
EH8304	woo2022_lung	Single-cell proteomics data from primary human lung cells.
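To look for a particular sample type, the metadata returned by scpdata() can be filtered by keyword. This is a minimal sketch that assumes, as the table above suggests, that scpdata() returns one row per data set, with the ExperimentHub IDs as row names and `title` and `description` columns. `find_datasets()` is a hypothetical helper written for illustration. It is not exported by scpdata.

```r
## Hypothetical helper (not part of scpdata): return the titles of the data
## sets whose description matches a regular expression (case-insensitive),
## named by their ExperimentHub ID (assumed to be the row names of the table).
find_datasets <- function(info, pattern) {
    keep <- grepl(pattern, info$description, ignore.case = TRUE)
    setNames(info$title[keep], rownames(info)[keep])
}

## Usage (requires network access to ExperimentHub):
## hits <- find_datasets(scpdata(), "hela")
## scp <- eh[[names(hits)[1]]]
```

Because the names of the returned vector are ExperimentHub IDs, they can be passed directly to `eh[[...]]` to load the corresponding data set.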

To get one of the data sets (e.g. dou2019_lysates), you can either retrieve it from ExperimentHub by subsetting eh with the data set's ExperimentHub ID (here EH3901)

scp <- eh[["EH3901"]]
#> see ?scpdata and browseVignettes('scpdata') for documentation
#> loading from cache
scp
#> An instance of class QFeatures containing 4 assays:
#>  [1] Hela_run_1: SingleCellExperiment with 24562 rows and 10 columns 
#>  [2] Hela_run_2: SingleCellExperiment with 24310 rows and 10 columns 
#>  [3] peptides: SingleCellExperiment with 13934 rows and 20 columns 
#>  [4] proteins: SingleCellExperiment with 1641 rows and 20 columns

or you can use the built-in scpdata function that is named after the data set

scp <- dou2019_lysates()
#> see ?scpdata and browseVignettes('scpdata') for documentation
#> loading from cache
scp
#> An instance of class QFeatures containing 4 assays:
#>  [1] Hela_run_1: SingleCellExperiment with 24562 rows and 10 columns 
#>  [2] Hela_run_2: SingleCellExperiment with 24310 rows and 10 columns 
#>  [3] peptides: SingleCellExperiment with 13934 rows and 20 columns 
#>  [4] proteins: SingleCellExperiment with 1641 rows and 20 columns

3 Data sets information

Each data set is extensively documented in its own man page (e.g. ?dou2019_lysates). The man page describes the data content, the acquisition protocol and the data collection procedure, as well as the data sources and references.

4 Data manipulation

For more information about manipulating the data sets, see the scp package. The scp vignette guides you through a typical SCP data processing workflow. Once your data set is loaded from scpdata, you can skip section 2 ("Read in SCP data") of the scp vignette.
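As a quick first look before turning to the scp workflow, the dimensions of every assay in a loaded data set can be tabulated. This is a minimal sketch that assumes scp is the QFeatures object loaded in section 2. `assay_dims()` is a hypothetical helper written for illustration. It relies only on `names()`, `[[` and `dim()`, so it also runs on a plain named list of matrices.

```r
## Hypothetical helper (not part of scp or QFeatures): one row per assay,
## giving its number of rows (features) and columns (samples).
assay_dims <- function(qf) {
    ## vapply() returns a 2 x n integer matrix with one column per assay.
    dims <- vapply(names(qf), function(nm) as.integer(dim(qf[[nm]])),
                   integer(2))
    out <- t(dims)
    colnames(out) <- c("rows", "columns")
    out
}

## Usage, with scp loaded as in section 2:
## assay_dims(scp)
```

For dou2019_lysates, the result should match the row and column counts printed when displaying scp.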

Session information

R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB              LC_COLLATE=C              
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/New_York
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] SingleCellExperiment_1.24.0 scpdata_1.10.0             
 [3] ExperimentHub_2.10.0        AnnotationHub_3.10.0       
 [5] BiocFileCache_2.10.0        dbplyr_2.3.4               
 [7] QFeatures_1.12.0            MultiAssayExperiment_1.28.0
 [9] SummarizedExperiment_1.32.0 Biobase_2.62.0             
[11] GenomicRanges_1.54.0        GenomeInfoDb_1.38.0        
[13] IRanges_2.36.0              S4Vectors_0.40.0           
[15] BiocGenerics_0.48.0         MatrixGenerics_1.14.0      
[17] matrixStats_1.0.0           BiocStyle_2.30.0           

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0              dplyr_1.1.3                  
 [3] blob_1.2.4                    Biostrings_2.70.1            
 [5] filelock_1.0.2                bitops_1.0-7                 
 [7] fastmap_1.1.1                 lazyeval_0.2.2               
 [9] RCurl_1.98-1.12               promises_1.2.1               
[11] digest_0.6.33                 mime_0.12                    
[13] lifecycle_1.0.3               cluster_2.1.4                
[15] ellipsis_0.3.2                ProtGenerics_1.34.0          
[17] KEGGREST_1.42.0               interactiveDisplayBase_1.40.0
[19] RSQLite_2.3.1                 magrittr_2.0.3               
[21] compiler_4.3.1                rlang_1.1.1                  
[23] sass_0.4.7                    tools_4.3.1                  
[25] igraph_1.5.1                  utf8_1.2.4                   
[27] yaml_2.3.7                    knitr_1.44                   
[29] S4Arrays_1.2.0                bit_4.0.5                    
[31] curl_5.1.0                    DelayedArray_0.28.0          
[33] abind_1.4-5                   withr_2.5.1                  
[35] purrr_1.0.2                   grid_4.3.1                   
[37] fansi_1.0.5                   xtable_1.8-4                 
[39] MASS_7.3-60                   cli_3.6.1                    
[41] rmarkdown_2.25                crayon_1.5.2                 
[43] generics_0.1.3                httr_1.4.7                   
[45] BiocBaseUtils_1.4.0           DBI_1.1.3                    
[47] cachem_1.0.8                  zlibbioc_1.48.0              
[49] AnnotationDbi_1.64.0          AnnotationFilter_1.26.0      
[51] BiocManager_1.30.22           XVector_0.42.0               
[53] vctrs_0.6.4                   Matrix_1.6-1.1               
[55] jsonlite_1.8.7                bookdown_0.36                
[57] bit64_4.0.5                   clue_0.3-65                  
[59] jquerylib_0.1.4               glue_1.6.2                   
[61] BiocVersion_3.18.0            later_1.3.1                  
[63] tibble_3.2.1                  pillar_1.9.0                 
[65] rappdirs_0.3.3                htmltools_0.5.6.1            
[67] GenomeInfoDbData_1.2.11       R6_2.5.1                     
[69] evaluate_0.22                 shiny_1.7.5.1                
[71] lattice_0.22-5                png_0.1-8                    
[73] memoise_2.0.1                 httpuv_1.6.12                
[75] bslib_0.5.1                   Rcpp_1.0.11                  
[77] SparseArray_1.2.0             xfun_0.40                    
[79] MsCoreUtils_1.14.0            pkgconfig_2.0.3

5 License

This vignette is distributed under a CC BY-SA license.

Single Cell Proteomics data sets.

26 October 2023

