## Description of datasets

## ------------------------ input VCF files ---------------------------

# 1. example VCF file 
# Raw BAM files come from: https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/seqc/Somatic_Mutation_WG/data/WES/
# Ref: Fang, L.T., Zhu, B., Zhao, Y. et al. Establishing community reference samples, data and call sets for benchmarking cancer mutation detection using whole-genome sequencing. Nat Biotechnol 39, 1151–1160 (2021). https://doi.org/10.1038/s41587-021-00993-6
# Generation process: BAM file of sample WES_EA_1 -> variant calling with MuTect2 -> annotation with VEP -> random selection of a subset of mutations
Name: WES_EA_T_1_mutect2.vep.vcf
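
# Illustration only: a minimal Python sketch of the final "random selection" step of the generation process above. The tool actually used for that step is not stated here, so the function name `subsample_vcf_lines`, the seed, and the demo records are all hypothetical.

```python
import random


def subsample_vcf_lines(lines, n_keep, seed=0):
    """Keep every header line plus a random subset of n_keep variant records.

    Assumption: `lines` holds the text lines of an uncompressed VCF, where
    header lines start with '#' and every other non-empty line is a record.
    """
    header = [ln for ln in lines if ln.startswith("#")]
    records = [ln for ln in lines if ln and not ln.startswith("#")]
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    n_keep = min(n_keep, len(records))
    # Sample record indices, then sort them so kept records stay in file order.
    keep = sorted(rng.sample(range(len(records)), n_keep))
    return header + [records[i] for i in keep]


# Hypothetical demo records, not taken from the real example file.
demo = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tT\t.\tPASS\t.",
    "chr1\t250\t.\tG\tC\t.\tPASS\t.",
    "chr2\t300\t.\tT\tA\t.\tPASS\t.",
    "chr3\t410\t.\tC\tG\t.\tPASS\t.",
    "chr5\t999\t.\tA\tG\t.\tPASS\t.",
]
subset = subsample_vcf_lines(demo, n_keep=2, seed=42)
```

# Sorting the sampled indices keeps the retained records in their original order, so a coordinate-sorted input stays sorted after subsampling.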

# 2. multi-caller VCF file
# Raw BAM files come from: https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/seqc/Somatic_Mutation_WG/data/WES/
# Ref: Fang, L.T., Zhu, B., Zhao, Y. et al. Establishing community reference samples, data and call sets for benchmarking cancer mutation detection using whole-genome sequencing. Nat Biotechnol 39, 1151–1160 (2021). https://doi.org/10.1038/s41587-021-00993-6
# Generation process: BAM file of sample WES_EA_1 -> variant calling with MuSE and VarScan -> internal filter -> filtered SNPs -> annotation with VEP -> random selection of a subset of mutations
Name: Multi-caller/WES_EA_T_1.MuSE.vep.vcf, Multi-caller/WES_EA_T_1_varscan_filter_snp.vep.vcf


## -------------------- Reference panel gene lists ---------------------

# 1. FoundationOne gene panel
# Ref url: https://info.foundationmedicine.com/hubfs/FMI%20Labels/FoundationOne_CDx_Label_Technical_Info.pdf
Name: Panel_gene/FoundationOne_genelist.txt


# 2. Pan-cancer gene panel
# Ref: Xu, Z., Dai, J., Wang, D., Lu, H., Dai, H., Ye, H., Gu, J., Chen, S., & Huang, B. (2019). Assessment of tumor mutation burden calculation from gene panel sequencing data. OncoTargets and therapy, 12, 3401–3409. https://doi.org/10.2147/OTT.S196638
Name: Panel_gene/Pan-cancer_genelist.txt


# 3. MSKCC gene panel
# Ref: Rizvi, H., Sanchez-Vega, F., La, K., Chatila, W., Jonsson, P., Halpenny, D., Plodkowski, A., Long, N., Sauter, J. L., Rekhtman, N., Hollmann, T., Schalper, K. A., Gainor, J. F., Shen, R., Ni, A., Arbour, K. C., Merghoub, T., Wolchok, J., Snyder, A., Chaft, J. E., … Hellmann, M. D. (2018). Molecular Determinants of Response to Anti-Programmed Cell Death (PD)-1 and Anti-Programmed Death-Ligand 1 (PD-L1) Blockade in Patients With Non-Small-Cell Lung Cancer Profiled With Targeted Next-Generation Sequencing. Journal of clinical oncology : official journal of the American Society of Clinical Oncology, 36(7), 633–641. https://doi.org/10.1200/JCO.2017.75.3384
Name: Panel_gene/MSK-IMPACT_gene_v1_341.txt, Panel_gene/MSK-IMPACT_gene_v2_410.txt, Panel_gene/MSK-IMPACT_gene_v3_468.txt


## -------------------- Reference panel bed regions ---------------------

# All BED files under *panel_hg19* and *panel_hg38* were generated with the UCSC Genome Browser from the gene lists of the corresponding [panels](https://github.com/likelet/CaMutQC/tree/master/inst/extdata/Panel_gene), then read into R and exported as .rds files.
Name: bed/panel_hg19/FlCDx-hg19.rds, bed/panel_hg19/Pan-cancer-hg19.rds, bed/panel_hg19/MSK_v1-hg19.rds, bed/panel_hg19/MSK_v2-hg19.rds, bed/panel_hg19/MSK_v3-hg19.rds, bed/panel_hg38/FlCDx-hg38.rds, bed/panel_hg38/Pan-cancer-hg38.rds, bed/panel_hg38/MSK_v1-hg38.rds, bed/panel_hg38/MSK_v2-hg38.rds, bed/panel_hg38/MSK_v3-hg38.rds


## -------------------- Noncoding mutation types -----------------------
# Ref: https://www.ensembl.org/info/genome/variation/prediction/predicted_data.html
Name: noncoding.txt


## --------------------- Tumor suppressor gene list --------------------
# Ref: https://bioinfo.uth.edu/TSGene/download.cgi?csrt=1766158371923707334
Name: TSGenelist.txt

## ----------------- clinical data for toMesKit function -----------------
# Randomly generated; intended only as example input for the toMesKit function.
Name: clin.txt

## --------------------- Panel-of-normal dataset -------------------------
# Randomly selected subset of rows from the public PON dataset generated by GATK.
# Ref: https://gatk.broadinstitute.org/hc/en-us/articles/360035890631-Panel-of-Normals-PON-
Name: PON_test.txt
