Efficient Pre-Processing of Single-Cell Assay for Transposase-Accessible Chromatin with High-Throughput Sequencing Data

The primary tool currently used to pre-process 10x Chromium single-cell ATAC-seq data is Cell Ranger, which can take a long time to run on standard datasets. To facilitate rapid pre-processing that enables reproducible workflows, we present scATAK, a suite of tools for pre-processing single-cell ATAC-seq data that is 15 to 18 times faster than Cell Ranger on mouse and human samples. scATAK can also calculate chromatin interaction potential matrices and generate open chromatin signal and interaction traces for cell groups. We use scATAK to explore the chromatin regulatory landscape of a healthy adult human brain and unveil cell-type-specific features, and show that it provides a convenient and computationally efficient approach for pre-processing single-cell ATAC-seq data.




