pycallingcards.preprocessing.call_peaks
- pycallingcards.preprocessing.call_peaks(expdata, background=None, method='CCcaller', reference='hg38', pvalue_cutoff=0.0001, pvalue_cutoffbg=0.0001, pvalue_cutoffTTAA=1e-05, pvalue_adj_cutoff=None, min_insertions=5, minlen=0, extend=200, maxbetween=2000, minnum=0, test_method='poisson', window_size=1500, lam_win_size=100000, step_size=500, pseudocounts=0.2, min_length=None, max_length=None, record=True, save=None)
Call peaks from qbed data.
- Parameters:
expdata (DataFrame) – pd.DataFrame with the first three columns as chromosome, start and end.
background (Optional[DataFrame] (default: None)) – pd.DataFrame with the first three columns as chromosome, start and end. The default, None, is for the background-free situation.
method (Optional[Literal['CCcaller','MACCs','Blockify']] (default: 'CCcaller')) – ‘CCcaller’ considers the maximum distance between insertions in the data. ‘MACCs’ uses an idea adapted from [Zhang et al., 2008]. ‘Blockify’ uses the method from [Moudgil et al., 2020].
reference (Optional[Literal['hg38','mm10','sacCer3']] (default: 'hg38')) – Currently supported: ‘hg38’ for human data, ‘mm10’ for mouse data and ‘sacCer3’ for yeast data.
pvalue_cutoff (float (default: 0.0001)) – The p-value cutoff for the background-free situation.
pvalue_cutoffbg (float (default: 0.0001)) – The p-value cutoff against the background data when background exists.
pvalue_cutoffTTAA (float (default: 1e-05)) – The p-value cutoff against the reference data when background exists. It is recommended that pvalue_cutoffTTAA be lower than pvalue_cutoffbg.
pvalue_adj_cutoff (Optional[float] (default: None)) – The cutoff for the adjusted p-value. If None, it is the same as pvalue_cutoff (background-free) or pvalue_cutoffTTAA (with background).
min_insertions (int (default: 5)) – The minimum number of insertions for each peak.
minlen (int (default: 0)) – Valid only for method = ‘CCcaller’. The minimum length of a peak before extension.
extend (int (default: 200)) – Valid for method = ‘CCcaller’ and ‘MACCs’. The length (bp) by which peaks are extended on both sides.
maxbetween (int (default: 2000)) – Valid only for method = ‘CCcaller’. The maximum distance between nearby positions within one peak.
minnum (int (default: 0)) – Valid only for method = ‘CCcaller’. The minimum number of insertions for the nearby position.
test_method (Optional[Literal['poisson','binomial']] (default: 'poisson')) – The statistical test used for the hypothesis test.
window_size (int (default: 1500)) – Valid only for method = ‘MACCs’. The length of the window used to search for peaks.
lam_win_size (Optional[int] (default: 100000)) – Valid for method = ‘CCcaller’ and ‘MACCs’. The length of the peak area considered when performing CCcaller.
step_size (int (default: 500)) – Valid only for method = ‘MACCs’. The length of each step.
pseudocounts (float (default: 0.2)) – The number of pseudocounts added for the hypothesis test.
min_length (Optional[int] (default: None)) – Minimum length of a peak. Valid for method = ‘Blockify’.
max_length (Optional[int] (default: None)) – Maximum length of a peak. Valid for method = ‘Blockify’.
record (bool (default: True)) – Controls whether information is recorded. If False, the output has only three columns: Chromosome, Start, End.
save (Optional[str] (default: None)) – The name of the file in which the result is saved.
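To make the roles of maxbetween, extend and min_insertions more concrete, here is a minimal sketch of distance-based grouping on a single chromosome. It is not the pycallingcards implementation: the function name group_insertions, the use of `<=` for the distance comparison, and the clipping of the start coordinate at 0 are all assumptions made for illustration, and no significance test is applied.

```python
from typing import List, Tuple


def group_insertions(
    positions: List[int],
    maxbetween: int = 2000,
    extend: int = 200,
    min_insertions: int = 5,
) -> List[Tuple[int, int, int]]:
    """Hypothetical helper: group insertion coordinates into candidate peaks.

    Returns a list of (start, end, n_insertions) tuples for one chromosome.
    """
    if not positions:
        return []

    ordered = sorted(positions)
    clusters = [[ordered[0]]]
    for pos in ordered[1:]:
        # Assumption: an insertion joins the current cluster when it lies
        # within maxbetween bp of the previous insertion.
        if pos - clusters[-1][-1] <= maxbetween:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])

    peaks = []
    for cluster in clusters:
        # Drop clusters with too few insertions, then pad both sides by extend.
        if len(cluster) >= min_insertions:
            start = max(cluster[0] - extend, 0)
            end = cluster[-1] + extend
            peaks.append((start, end, len(cluster)))
    return peaks


candidates = group_insertions([100, 300, 900, 1500, 1800, 10000, 10100])
```

In this toy input the first five insertions are chained together because each gap is at most 2000 bp, while the last two form a separate cluster that is discarded for having fewer than five insertions.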
- Returns:
Chr - The chromosome of the peak.
Start - The start point of the peak.
End - The end point of the peak.
Experiment Insertions - The total number of insertions within a peak in the experiment data.
Reference Insertions - The total number of insertions within a peak in the reference data.
Background insertions - The total number of insertions within a peak in the background data.
Expected Insertions - The total number of expected insertions under the null hypothesis from the reference data (in a background-free situation).
Expected Insertions background - The total number of expected insertions under the null hypothesis from the background data (in a background situation).
Expected Insertions Reference - The total number of expected insertions under the null hypothesis from the reference data (in a background situation).
pvalue - The p-value calculated under the null hypothesis (in a background-free situation or with method = ‘Blockify’).
pvalue Reference - The p-value calculated under the null hypothesis from the reference data (in a background situation).
pvalue Background - The p-value calculated under the null hypothesis from the background data (in a background situation).
Fraction Experiment - The fraction of insertions in the experiment data.
TPH Experiment - Transpositions per hundred million insertions in the experiment data (for mammalian data and for sacCer3).
Fraction Background - The fraction of insertions in the background data.
TPH Background - Transpositions per hundred million insertions in the background data (for mammalian data and for sacCer3).
TPH Background subtracted - The difference between TPH Experiment and TPH Background.
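As a rough illustration of how an expected count and a p-value such as those in the Expected Insertions and pvalue columns could relate under test_method = ‘poisson’, the sketch below computes an upper-tail Poisson probability using only the standard library. The scaling of the expected count (reference insertions in the peak multiplied by the ratio of total experiment to total reference insertions, plus pseudocounts) is an assumption for illustration, as are the names poisson_sf and peak_pvalue; the library's exact formula may differ.

```python
import math
from typing import Tuple


def poisson_sf(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam), as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def peak_pvalue(
    observed: int,
    reference_in_peak: int,
    total_experiment: int,
    total_reference: int,
    pseudocounts: float = 0.2,
) -> Tuple[float, float]:
    """Hypothetical helper: expected insertions and Poisson p-value for one peak."""
    # Assumption: the null expectation scales the reference count in the peak
    # by the overall experiment/reference ratio, then adds pseudocounts.
    expected = reference_in_peak * total_experiment / total_reference + pseudocounts
    return expected, poisson_sf(observed, expected)


expected, pvalue = peak_pvalue(
    observed=10, reference_in_peak=4, total_experiment=1000, total_reference=2000
)
```

A peak would then be kept when its p-value falls below the relevant cutoff, for example pvalue_cutoff in the background-free situation.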
- Return type:
- Examples:
>>> import pycallingcards as cc
>>> qbed_data = cc.datasets.mousecortex_data(data="qbed")
>>> peak_data = cc.pp.call_peaks(qbed_data, method="CCcaller", reference="mm10", maxbetween=2000, pvalue_cutoff=0.01, pseudocounts=1, record=True)