pycallingcards.preprocessing.call_peaks
- pycallingcards.preprocessing.call_peaks(expdata, background=None, method='CCcaller', reference='hg38', pvalue_cutoff=0.0001, pvalue_cutoffbg=0.0001, pvalue_cutoffTTAA=1e-05, pvalue_adj_cutoff=None, min_insertions=5, minlen=0, extend=200, maxbetween=2000, minnum=0, test_method='poisson', window_size=1500, lam_win_size=100000, step_size=500, pseudocounts=0.2, min_length=None, max_length=None, record=True, save=None)
Call peaks from qbed data.
- Parameters:
  - expdata (DataFrame) – pd.DataFrame with the first three columns as chromosome, start and end.
  - background (Optional[DataFrame], default: None) – pd.DataFrame with the first three columns as chromosome, start and end. Leave as None for a background-free analysis.
  - method (Optional[Literal['CCcaller', 'MACCs', 'Blockify']], default: 'CCcaller') – 'CCcaller' groups insertions using the maximum distance between neighboring insertions in the data. 'MACCs' uses an idea adapted from [Zhang et al., 2008]. 'Blockify' uses the method from [Moudgil et al., 2020].
  - reference (Optional[Literal['hg38', 'mm10', 'sacCer3']], default: 'hg38') – The reference genome. Currently 'hg38' (human), 'mm10' (mouse) and 'sacCer3' (yeast) are available.
  - pvalue_cutoff (float, default: 0.0001) – The p-value cutoff for a background-free analysis.
  - pvalue_cutoffbg (float, default: 0.0001) – The p-value cutoff against the background data when a background is provided.
  - pvalue_cutoffTTAA (float, default: 1e-05) – The p-value cutoff against the reference data when a background is provided. pvalue_cutoffTTAA is recommended to be lower than pvalue_cutoffbg.
  - pvalue_adj_cutoff (Optional[float], default: None) – The cutoff for the adjusted p-value. If None, it is set to the same value as pvalue_cutoff (background-free) or pvalue_cutoffTTAA (with background).
  - min_insertions (int, default: 5) – The minimum number of insertions in each peak.
  - minlen (int, default: 0) – Valid only for method = 'CCcaller'. The minimum length of a peak before extension.
  - extend (int, default: 200) – Valid for method = 'CCcaller' and 'MACCs'. The length (bp) by which each peak is extended on both sides.
  - maxbetween (int, default: 2000) – Valid only for method = 'CCcaller'. The maximum distance (bp) between neighboring positions within one peak.
  - minnum (int, default: 0) – Valid only for method = 'CCcaller'. The minimum number of insertions required at a nearby position.
  - test_method (Optional[Literal['poisson', 'binomial']], default: 'poisson') – The statistical test used for the hypothesis test.
  - window_size (int, default: 1500) – Valid only for method = 'MACCs'. The length (bp) of the window used to search for peaks.
  - lam_win_size (Optional[int], default: 100000) – Valid for method = 'CCcaller' and 'MACCs'. The length (bp) of the region considered around each peak.
  - step_size (int, default: 500) – Valid only for method = 'MACCs'. The length (bp) of each step.
  - pseudocounts (float, default: 0.2) – The pseudocount added in the hypothesis test.
  - min_length (Optional[int], default: None) – Valid for method = 'Blockify'. The minimum length of a peak.
  - max_length (Optional[int], default: None) – Valid for method = 'Blockify'. The maximum length of a peak.
  - record (bool, default: True) – Whether to record additional information for each peak. If False, the output only has three columns: Chromosome, Start, End.
  - save (Optional[str], default: None) – The file name to save the output to.
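The CCcaller parameters can be pictured with a short pure-Python sketch. It is a hypothetical illustration, not the pycallingcards implementation: it assumes insertions from one chromosome given as a plain list of coordinates, uses a made-up helper `group_insertions`, and only shows how maxbetween, minnum, min_insertions, minlen and extend might interact. The statistical test and lam_win_size are left out.

```python
from collections import Counter


def group_insertions(positions, maxbetween=2000, minnum=0,
                     min_insertions=5, minlen=0, extend=200):
    """Hypothetical CCcaller-style grouping on a single chromosome.

    positions: insertion coordinates (repeats allowed, any order).
    Returns a list of (start, end, n_insertions) tuples.
    """
    counts = Counter(positions)
    # Drop positions with fewer than `minnum` insertions, then sort the rest.
    kept = sorted(p for p, c in counts.items() if c >= minnum)

    # Start a new cluster whenever the gap to the previous position
    # exceeds `maxbetween` bp.
    clusters, current = [], []
    for pos in kept:
        if current and pos - current[-1] > maxbetween:
            clusters.append(current)
            current = []
        current.append(pos)
    if current:
        clusters.append(current)

    peaks = []
    for cluster in clusters:
        n = sum(counts[p] for p in cluster)
        start, end = cluster[0], cluster[-1]
        # Filter on insertion count and on length before extension.
        if n >= min_insertions and end - start >= minlen:
            # Extend both sides; clamping at 0 is an assumption of this sketch.
            peaks.append((max(0, start - extend), end + extend, n))
    return peaks


insertions = [100, 150, 150, 400, 900, 5000, 5100, 20000]
print(group_insertions(insertions, maxbetween=1000, min_insertions=3))
# [(0, 1100, 5)]
```

In this toy run the first four positions (five insertions) merge into one peak, while the clusters at 5000 to 5100 and at 20000 fall below min_insertions = 3.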
- Returns:
  A pd.DataFrame of peaks. Which columns appear depends on method, on whether background is given, and on record:
  - Chr - The chromosome of the peak.
  - Start - The start point of the peak.
  - End - The end point of the peak.
  - Experiment Insertions - The total number of insertions within the peak in the experiment data.
  - Reference Insertions - The total number of insertions within the peak in the reference data.
  - Background insertions - The total number of insertions within the peak in the background data.
  - Expected Insertions - The total number of insertions expected within the peak under the null hypothesis, based on the reference data (background-free).
  - Expected Insertions background - The total number of insertions expected within the peak under the null hypothesis, based on the background data (with background).
  - Expected Insertions Reference - The total number of insertions expected within the peak under the null hypothesis, based on the reference data (with background).
  - pvalue - The p-value calculated under the null hypothesis (background-free, or method = 'Blockify').
  - pvalue Reference - The p-value calculated against the reference data (with background).
  - pvalue Background - The p-value calculated against the background data (with background).
  - Fraction Experiment - The fraction of experiment insertions that fall within the peak.
  - TPH Experiment - Transpositions per hundred million insertions in the experiment data, for both mammalian genomes and sacCer3.
  - Fraction Background - The fraction of background insertions that fall within the peak.
  - TPH Background - Transpositions per hundred million insertions in the background data, for both mammalian genomes and sacCer3.
  - TPH Background subtracted - The difference between TPH Experiment and TPH Background.
- Return type:
  DataFrame
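The Expected Insertions, pvalue, Fraction Experiment and TPH Experiment columns can be illustrated with a standard-library sketch of the background-free case with test_method = 'poisson'. The formulas here are assumptions, not taken from pycallingcards: the expected count scales the peak's reference insertions by the ratio of total experiment to total reference insertions and adds pseudocounts, the p-value is the Poisson upper tail P(X ≥ observed), the fraction is peak insertions over total insertions, and TPH rescales that fraction to a hundred million insertions. The helper names `poisson_sf`, `expected_insertions` and `tph` are hypothetical, the numbers are made up, and the binomial test is not shown.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), with lam > 0."""
    if k <= 0:
        return 1.0

    def pmf(i):
        # Evaluate in log space so large k or lam does not overflow.
        return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))

    if k <= lam:
        # Upper tail is large here, so 1 minus the lower tail is accurate.
        return max(0.0, 1.0 - sum(pmf(i) for i in range(k)))
    # Upper tail is small: sum it directly. Terms shrink because i > lam.
    total, i, term = 0.0, k, pmf(k)
    while term > 0 and term > total * 1e-16:
        total += term
        i += 1
        term *= lam / i
    return min(1.0, total)


def expected_insertions(ref_in_peak, total_ref, total_exp, pseudocounts=0.2):
    """Hypothetical null expectation for one peak (background-free case).

    Assumes experiment insertions land on reference sites uniformly, so the
    peak's reference count is scaled by total_exp / total_ref; the pseudocount
    keeps the expectation above zero.
    """
    return ref_in_peak * total_exp / total_ref + pseudocounts


def tph(insertions_in_peak, total_insertions, scale=100_000_000):
    """Insertions in a peak per hundred million total insertions."""
    return insertions_in_peak * scale / total_insertions


# Toy numbers: 12 experiment insertions in a peak holding 40 of 1,000,000
# reference insertions, with 50,000 experiment insertions overall.
lam = expected_insertions(ref_in_peak=40, total_ref=1_000_000, total_exp=50_000)
pvalue = poisson_sf(12, lam)
print(lam, pvalue < 1e-4)  # 2.2 True
print(12 / 50_000)         # assumed Fraction Experiment for this peak
print(tph(12, 50_000))     # assumed TPH Experiment for this peak
```

Summing the upper tail directly when the observed count exceeds the expectation avoids the cancellation that `1 - cdf` suffers at cutoffs as small as the default 1e-05.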
- Examples:
>>> import pycallingcards as cc
>>> qbed_data = cc.datasets.mousecortex_data(data="qbed")
>>> peak_data = cc.pp.call_peaks(qbed_data, method="CCcaller", reference="mm10", maxbetween=2000, pvalue_cutoff=0.01, pseudocounts=1, record=True)