| Title: | Infer Ploidy of Single Cells |
|---|---|
| Description: | Compute ploidy of single cells (or nuclei) based on single-cell (or single-nucleus) ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) data <https://github.com/fumi-github/scPloidy>. |
| Authors: | Fumihiko Takeuchi [aut, cre] (ORCID: <https://orcid.org/0000-0003-3185-5661>) |
| Maintainer: | Fumihiko Takeuchi <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.6.2 |
| Built: | 2026-06-08 07:41:50 UTC |
| Source: | https://github.com/fumi-github/scploidy |

Infer Copy Number Variations (CNVs) in Cancer Cells from ATAC-seq Fragment Overlap

    cnv(
      fragmentoverlap,
      windowcovariates,
      levels = c(2, 4),
      nfragspercellmin = 5000,
      nfragspercellmax = 10^5.5,
      deltaBICthreshold = 0
    )


| fragmentoverlap | Frequency of fragment overlap in each cell-window computed by the function |
|---|---|
| windowcovariates | Chromosomal windows for which copy number gain/loss are initially inferred. Required columns are chr, start, end, window (for example, |
| levels | Possible values of ploidy. For example, |
| nfragspercellmin | Minimum number of fragments for a cell-window to be eligible. |
| nfragspercellmax | Maximum number of fragments for a cell-window to be eligible. |
| deltaBICthreshold | Only the CNVs with deltaBIC smaller than this threshold are adopted. |

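As a rough illustration of the nfragspercellmin and nfragspercellmax thresholds, the base-R sketch below filters a toy table by its nfrag column using the default bounds. The toy barcodes, the counts, and the inclusive comparison are assumptions made for illustration; cnv may apply the bounds differently internally.

```r
# Toy table; barcodes and fragment counts are made up for illustration.
fo <- data.frame(
  barcode = c("bc1", "bc2", "bc3"),
  nfrag = c(1200, 8000, 400000),
  stringsAsFactors = FALSE
)

# Default bounds taken from the cnv usage line.
nfragspercellmin <- 5000
nfragspercellmax <- 10^5.5  # roughly 316228

# Assumption: both bounds are treated as inclusive.
eligible <- fo[fo$nfrag >= nfragspercellmin & fo$nfrag <= nfragspercellmax, ]
print(eligible)
```

Only the row whose fragment count lies between the two bounds is kept in this toy table.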
A list with two elements.
CNV is a data frame of the CNVs identified in the dataset.
cellwindowCN is a data frame indicating the ploidy for each cell
and the inferred standardized copy number for each cell-window.
Count Overlap of ATAC-seq Fragments

    fragmentoverlapcount(
      file,
      targetregions,
      excluderegions = NULL,
      targetbarcodes = NULL,
      Tn5offset = c(1, 0),
      barcodesuffix = NULL,
      dobptonext = FALSE
    )


| file | Filename of the file for ATAC-seq fragments. The file must be block gzipped (using the |
|---|---|
| targetregions | GRanges object for the regions where overlaps are counted. Usually all of the autosomes. If memory is a problem, split a chromosome into smaller chunks, for example by 10 Mb. The function loads each element of |
| excluderegions | GRanges object for the regions to be excluded. Simple repeats in the genome should be listed here, because repeats can cause false overlaps. A fragment is discarded if its 5' or 3' end is located in |
| targetbarcodes | Character vector for the barcodes of cells to be analyzed, such as those passing quality control. If |
| Tn5offset | Numeric vector of length two. The enzyme for ATAC-seq is a homodimer of Tn5. The transposition sites of the two Tn5 proteins are 9 bp apart, and the (representative) site of accessibility is in between. If the start and end positions in your input file are taken from a BAM file, set the parameter to |
| barcodesuffix | Add suffix to barcodes per targetregions. |
| dobptonext | (experimental feature) Whether to compute smoothed distance to the next fragment (irrelevant to BC) as bptonext, which is the inverse of chromatin accessibility, and append as 9th to 14th columns. |

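The advice to split chromosomes into smaller chunks for targetregions can be sketched in base R as follows. The helper name chunk_regions and the chromosome lengths are hypothetical placeholders. Converting the table with GenomicRanges::makeGRangesFromDataFrame is one possible way to obtain a GRanges object, and is attempted only when that package is installed.

```r
# Hypothetical helper: cut one chromosome into windows of at most `width` bp.
chunk_regions <- function(chr, len, width = 1e7) {
  starts <- seq(1, len, by = width)
  ends <- pmin(starts + width - 1, len)
  data.frame(chr = chr, start = starts, end = ends, stringsAsFactors = FALSE)
}

# Placeholder chromosome lengths, not taken from a real genome build.
chrlen <- c(chr1 = 25e6, chr2 = 12e6)

chunks <- do.call(rbind, Map(chunk_regions, names(chrlen), chrlen))
rownames(chunks) <- NULL
print(chunks)

# Optional conversion to GRanges, skipped if GenomicRanges is unavailable.
if (requireNamespace("GenomicRanges", quietly = TRUE)) {
  targetregions <- GenomicRanges::makeGRangesFromDataFrame(chunks)
}
```

Each row of chunks then corresponds to one element of targetregions, so smaller windows should mean less data held in memory at a time.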
A tibble with each row corresponding to a cell.
For each cell, its barcode, the total count of the fragments nfrag,
and the count distinguished by overlap depth are given.
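To make "overlap depth" concrete, here is a toy base-R sketch that, for the fragments of a single cell, counts how many fragments cover each fragment's start position and tabulates the result. This definition of depth and the helper depth_at_start are assumptions for illustration only; the counting rule actually used by fragmentoverlapcount, including its handling of Tn5offset and excluderegions, may differ.

```r
# Hypothetical helper: for each fragment, count the fragments (itself included)
# whose interval covers that fragment's start position.
depth_at_start <- function(start, end) {
  vapply(
    seq_along(start),
    function(i) sum(start <= start[i] & end >= start[i]),
    integer(1)
  )
}

# Made-up fragments from one cell on one chromosome.
frags <- data.frame(
  start = c(100, 150, 160, 400),
  end   = c(200, 250, 260, 500)
)

depth <- depth_at_start(frags$start, frags$end)
nfrag <- nrow(frags)

print(nfrag)
print(table(depth))
```

In this toy cell, two fragments have depth 1, one has depth 2, and one has depth 3.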
The dataset includes 788 nuclei obtained from
basal cell carcinoma sample SU008_Tumor_Pre.
Overlapping of single-nucleus ATAC-seq fragments was computed with the
fragmentoverlapcount function.

    data(GSE129785_SU008_Tumor_Pre)
    SU008_Tumor_Pre_windowcovariates
    rescnv

SU008_Tumor_Pre_fragmentoverlap is a data frame of fragmentoverlap.
SU008_Tumor_Pre_windowcovariates is a data frame of windows and peaks.
rescnv is a list containing the output of the cnv function.
Satpathy et al. (2019) Nature Biotechnology 37:925 doi:10.1038/s41587-019-0206-z

    ## Not run:
    data(GSE129785_SU008_Tumor_Pre)
    levels = c(2, 4)
    result = cnv(SU008_Tumor_Pre_fragmentoverlap,
                 SU008_Tumor_Pre_windowcovariates,
                 levels = levels,
                 deltaBICthreshold = -600)
    ## End(Not run)

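Because the cnv example is marked as not run, a cheaper way to see the shape of its output may be to inspect the precomputed rescnv object shipped with the dataset. The sketch below assumes that loading GSE129785_SU008_Tumor_Pre makes rescnv available and that it has the two elements CNV and cellwindowCN described for cnv; has_cnv_elements is a hypothetical helper, and the package calls are skipped when scPloidy is not installed.

```r
# Hypothetical helper: check that an object looks like the documented cnv output.
has_cnv_elements <- function(x) {
  is.list(x) && all(c("CNV", "cellwindowCN") %in% names(x))
}

if (requireNamespace("scPloidy", quietly = TRUE)) {
  # Assumption: this call loads rescnv into the current environment.
  data("GSE129785_SU008_Tumor_Pre", package = "scPloidy")
  if (exists("rescnv") && has_cnv_elements(rescnv)) {
    str(rescnv$CNV)                   # CNVs identified in the dataset
    print(head(rescnv$cellwindowCN))  # per cell-window copy number
  }
}
```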
Infer Ploidy from ATAC-seq Fragment Overlap

    ploidy(
      fragmentoverlap,
      levels,
      s = 100,
      epsilon = 1e-08,
      subsamplesize = NULL,
      dobayes = FALSE,
      prop = 0.9
    )


| fragmentoverlap | Frequency of fragment overlap in each cell computed by the function |
|---|---|
| levels | Possible values of ploidy. For example, |
| s | Seed for random numbers used in the EM algorithm. |
| epsilon | Convergence criterion for the EM algorithm. |
| subsamplesize | The EM algorithm has difficulty converging when the number of cells is very large. By setting this parameter (e.g. to 1e4), we can run the EM algorithm iteratively, first for |
| dobayes | (experimental feature) Whether to perform Bayesian inference, which takes a long computation time. |
| prop | Proportion of peaks that can be fitted with a binomial distribution in ploidy.bayes. The rest of the peaks are allowed to have depth larger than the ploidy. |

A data.frame with each row corresponding to a cell.
For each cell, its barcode, ploidy inferred by 1) moment method,
2) the same with additional K-means clustering,
3) EM algorithm of mixture, and, optionally,
4) Bayesian inference are given.
I recommend using ploidy.moment or ploidy.em.
When fragmentoverlapcount was computed with dobptonext=TRUE,
we only use the chromosomal sites with chromatin accessibility in the top 10%.
This requires longer computation time.
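Since ploidy.moment and ploidy.em are the two recommended estimates, a quick cross-tabulation can show how often they agree. The sketch below assumes both appear as columns of the returned data.frame under those names; agreement is a hypothetical helper, and the package calls are skipped when scPloidy is not installed.

```r
# Hypothetical helper: fraction of cells where two ploidy calls coincide.
agreement <- function(a, b) mean(a == b, na.rm = TRUE)

if (requireNamespace("scPloidy", quietly = TRUE)) {
  data("SHR_m154211", package = "scPloidy")
  p <- scPloidy::ploidy(SHR_m154211$fragmentoverlap, c(2, 4, 8))

  # Assumption: the result has columns named ploidy.moment and ploidy.em.
  print(table(moment = p$ploidy.moment, em = p$ploidy.em))
  print(agreement(p$ploidy.moment, p$ploidy.em))
}
```

For very large datasets, the same call could be tried with subsamplesize set (for example subsamplesize = 1e4), as suggested in the argument description.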
The dataset includes 3572 nuclei obtained from the liver of
a 16-week-old male rat fed a normal diet.
Overlapping of single-nucleus ATAC-seq fragments was computed with the
fragmentoverlapcount function and saved as fragmentoverlap.
The cell types of the nuclei are saved in the data.frame cells.
The data for rat SHR_m154211 was taken from the publication cited below.

    data(SHR_m154211)

An object of class list of length 2.
Takeuchi et al. (2022) bioRxiv doi:10.1101/2022.07.12.499681

    data(SHR_m154211)
    fragmentoverlap = SHR_m154211$fragmentoverlap
    p = ploidy(fragmentoverlap, c(2, 4, 8))
    head(p)
    cells = SHR_m154211$cells
    table(cells$celltype,
          p$ploidy.moment[match(cells$barcode, p$barcode)])