Title: | Infer Ploidy of Single Cells |
---|---|
Description: | Compute ploidy of single cells (or nuclei) based on single-cell (or single-nucleus) ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) data <https://github.com/fumi-github/scPloidy>. |
Authors: | Fumihiko Takeuchi [aut, cre] |
Maintainer: | Fumihiko Takeuchi |
License: | MIT + file LICENSE |
Version: | 0.6.2 |
Built: | 2024-10-31 05:09:15 UTC |
Source: | https://github.com/fumi-github/scploidy |
Infer Copy Number Variations (CNVs) in Cancer Cells from ATAC-seq Fragment Overlap
cnv( fragmentoverlap, windowcovariates, levels = c(2, 4), nfragspercellmin = 5000, nfragspercellmax = 10^5.5, deltaBICthreshold = 0 )
fragmentoverlap |
Frequency of fragment overlap in each cell-window
computed by the function |
windowcovariates |
Chromosomal windows for which copy number
gain/loss are initially inferred. Required columns are chr, start, end,
window (for example, |
levels |
Possible values of ploidy. For example,
|
nfragspercellmin |
Minimum number of fragments for a cell-window to be eligible. |
nfragspercellmax |
Maximum number of fragments for a cell-window to be eligible. |
deltaBICthreshold |
Only the CNVs with deltaBIC smaller than this threshold are adopted. |
A list with two elements.
CNV is a data frame of the CNVs identified in the dataset.
cellwindowCN is a data frame indicating the ploidy for each cell
and the inferred standardized copy number for each cell-window.
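The thresholds among the arguments above can be pictured with a minimal base R sketch. It does not call the package; the toy data frames, the column names, and the assumption that the fragment-count bounds are inclusive are all hypothetical and only illustrate how such filters behave.

```r
# Toy per-cell fragment totals; column names are hypothetical.
cells <- data.frame(
  barcode = c("cellA", "cellB", "cellC", "cellD"),
  nfrag   = c(1200, 5000, 48000, 400000),
  stringsAsFactors = FALSE
)

nfragspercellmin <- 5000
nfragspercellmax <- 10^5.5  # roughly 316228

# Assumption: both bounds are inclusive; the package may treat them differently.
eligible <- cells$nfrag >= nfragspercellmin & cells$nfrag <= nfragspercellmax
cells[eligible, ]

# Toy CNV candidates with a hypothetical deltaBIC column.
candidates <- data.frame(
  window   = c("w1", "w2", "w3"),
  deltaBIC = c(-850, -120, 30),
  stringsAsFactors = FALSE
)

# Only candidates with deltaBIC smaller than the threshold are kept.
deltaBICthreshold <- -600
adopted <- candidates[candidates$deltaBIC < deltaBICthreshold, ]
adopted
```

A more negative deltaBICthreshold keeps fewer candidates, which is why the packaged example uses a stricter value than the default of 0.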
Count Overlap of ATAC-seq Fragments
fragmentoverlapcount( file, targetregions, excluderegions = NULL, targetbarcodes = NULL, Tn5offset = c(1, 0), barcodesuffix = NULL, dobptonext = FALSE )
file |
Filename of the file for ATAC-seq fragments.
The file must be block gzipped (using the |
targetregions |
GRanges object for the regions where overlaps are counted.
Usually all of the autosomes.
If there is memory problem, split a chromosome into smaller chunks,
for example by 10 Mb.
The function loads each element of |
excluderegions |
GRanges object for the regions to be excluded.
Simple repeats in the genome should be listed here,
because repeats can cause false overlaps.
A fragment is discarded if its 5' or 3' end is located in |
targetbarcodes |
Character vector for the barcodes of cells to be analyzed,
such as those passing quality control.
If |
Tn5offset |
Numeric vector of length two.
The enzyme for ATAC-seq is a homodimer of Tn5.
The transposition sites of two Tn5 proteins are 9 bp apart,
and the (representative) site of accessibility is in between.
If the start and end positions in your input file are taken from a BAM file,
set the parameter to |
barcodesuffix |
Add suffix to barcodes per targetregions. |
dobptonext |
(experimental feature) Whether to compute smoothed distance to the next fragment (irrelevant to BC) as bptonext, which is the inverse of chromatin accessibility, and append as 9th to 14th columns. |
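The advice under targetregions to split a chromosome into smaller chunks can be sketched in base R as follows. The helper name chunk_chromosome and the chromosome length are hypothetical, and the sketch only computes chunk coordinates; it does not build a GRanges object itself.

```r
# Split one chromosome into consecutive chunks of at most `chunksize` bp
# (1-based, inclusive coordinates). `chunk_chromosome` is a hypothetical helper.
chunk_chromosome <- function(chr, chrlength, chunksize = 1e7) {
  starts <- seq(1, chrlength, by = chunksize)
  ends   <- pmin(starts + chunksize - 1, chrlength)
  data.frame(chr = chr, start = starts, end = ends, stringsAsFactors = FALSE)
}

# Hypothetical chromosome length, for illustration only.
chunks <- chunk_chromosome("chr1", 25500000)
chunks
```

If the GenomicRanges package is available, a data frame of this shape can be converted with GenomicRanges::makeGRangesFromDataFrame(chunks) before being passed as targetregions.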
A tibble with each row corresponding to a cell.
For each cell, its barcode, the total count of the fragments nfrag,
and the count distinguished by overlap depth are given.
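To give a feel for what "overlap depth" means, here is a toy base R sketch that counts, for each fragment of one cell, how many fragments cover its start position. This is one simple definition chosen for illustration; the function name overlap_depth and the coordinates are hypothetical, and the package may define and count depth differently.

```r
# For each fragment, count the fragments of the same cell that cover its
# start position (the fragment itself included). Hypothetical helper.
overlap_depth <- function(start, end) {
  vapply(seq_along(start), function(i) {
    sum(start <= start[i] & end >= start[i])
  }, integer(1))
}

# Toy fragments from a single cell on one chromosome.
frag_start <- c(100, 150, 180, 400)
frag_end   <- c(200, 250, 300, 500)

depth <- overlap_depth(frag_start, frag_end)
depth

# Tabulate how many fragments fall at each depth.
table(depth)
```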
The dataset includes 788 nuclei obtained from
basal cell carcinoma sample SU008_Tumor_Pre.
Overlap of single-nucleus ATAC-seq fragments was computed with the
fragmentoverlapcount function.
data(GSE129785_SU008_Tumor_Pre)
SU008_Tumor_Pre_windowcovariates
rescnv
SU008_Tumor_Pre_fragmentoverlap is a data frame of fragment overlap.
SU008_Tumor_Pre_windowcovariates is a data frame of windows and peaks.
rescnv is a list containing the output of the cnv function.
Satpathy et al. (2019) Nature Biotechnology 37:925 doi:10.1038/s41587-019-0206-z
## Not run:
data(GSE129785_SU008_Tumor_Pre)
levels = c(2, 4)
result = cnv(SU008_Tumor_Pre_fragmentoverlap,
             SU008_Tumor_Pre_windowcovariates,
             levels = levels,
             deltaBICthreshold = -600)
## End(Not run)
Infer Ploidy from ATAC-seq Fragment Overlap
ploidy( fragmentoverlap, levels, s = 100, epsilon = 1e-08, subsamplesize = NULL, dobayes = FALSE, prop = 0.9 )
fragmentoverlap |
Frequency of fragment overlap in each cell
computed by the function |
levels |
Possible values of ploidy. For example,
|
s |
Seed for random numbers used in EM algorithm. |
epsilon |
Convergence criterion for the EM algorithm. |
subsamplesize |
The EM algorithm has difficulty converging
when the number of cells is very large.
By setting this parameter (e.g. to 1e4),
we can run the EM algorithm iteratively,
first for |
dobayes |
(experimental feature) Whether to perform Bayesian inference, which takes long computation time. |
prop |
Proportion of peaks that can be fitted with a binomial distribution in ploidy.bayes. The rest of the peaks are allowed to have depth larger than the ploidy. |
A data.frame with each row corresponding to a cell.
For each cell, its barcode, ploidy inferred by 1) moment method,
2) the same with additional K-means clustering,
3) EM algorithm of mixture, and, optionally,
4) Bayesian inference are given.
I recommend using ploidy.moment or ploidy.em.
When fragmentoverlapcount was computed with dobptonext=TRUE,
we only use the chromosomal sites with chromatin accessibility in top 10
This requires longer computation time.
The dataset includes 3572 nuclei obtained from the liver of
a 16-week-old male rat fed a normal diet.
Overlap of single-nucleus ATAC-seq fragments was computed with the
fragmentoverlapcount function and saved as fragmentoverlap.
The cell types of the nuclei are saved in the data.frame cells.
The data for rat SHR_m154211 was taken from the publication cited below.
data(SHR_m154211)
An object of class list of length 2.
Takeuchi et al. (2022) bioRxiv doi:10.1101/2022.07.12.499681
data(SHR_m154211)
fragmentoverlap = SHR_m154211$fragmentoverlap
p = ploidy(fragmentoverlap, c(2, 4, 8))
head(p)
cells = SHR_m154211$cells
table(cells$celltype, p$ploidy.moment[match(cells$barcode, p$barcode)])
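The final table() call in the example relies on match() to align the ploidy estimates with the cell annotation by barcode, since the two data frames need not be in the same order. The base R sketch below reproduces that alignment on toy data; the barcodes, cell types, and ploidy values are made up for illustration, and only the column names barcode, celltype, and ploidy.moment come from the example.

```r
# Toy cell annotation and toy ploidy result, deliberately in different order.
cells <- data.frame(
  barcode  = c("b3", "b1", "b2"),
  celltype = c("hepatocyte", "endothelial", "hepatocyte"),
  stringsAsFactors = FALSE
)
p <- data.frame(
  barcode       = c("b1", "b2", "b3"),
  ploidy.moment = c(2, 4, 8),
  stringsAsFactors = FALSE
)

# For each row of `cells`, find the row of `p` with the same barcode.
idx <- match(cells$barcode, p$barcode)

# Ploidy values reordered to follow the rows of `cells`.
aligned <- p$ploidy.moment[idx]

# Cross-tabulate cell type against inferred ploidy.
tab <- table(cells$celltype, aligned)
tab
```

Barcodes present in cells but absent from p would give NA in idx, and table() drops those rows by default, so it can be worth checking anyNA(idx) first.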