pycallingcards.preprocessing.separate_peaks
- pycallingcards.preprocessing.separate_peaks(peak_data, index, middle_start, middle_end, expdata=None, background=None, method='CCcaller', reference='hg38', test_method='poisson', lam_win_size=100000, pvalue_cutoff=None, pvalue_cutoffbg=None, pvalue_cutoffTTAA=None, pseudocounts=0.2, return_whole=False)
Separate one peak into two.
This function splits a single called peak into two peaks at a user-specified cutoff region.
- Parameters:
peak_data (DataFrame) – Peak data as a pd.DataFrame. Pass the original output of the call_peaks function.
index – The index of the peak to separate.
middle_start – The start of the cutoff region, which becomes the end point of the first peak after separation.
middle_end – The end of the cutoff region, which becomes the start point of the second peak after separation.
expdata (Optional[DataFrame], default: None) – pd.DataFrame with the first three columns as chromosome, start and end.
background (Optional[DataFrame], default: None) – pd.DataFrame with the first three columns as chromosome, start and end.
method (Optional[Literal['CCcaller', 'MACCs', 'Blockify']], default: 'CCcaller') – ‘CCcaller’ considers the maximum distance between insertions in the data. ‘MACCs’ uses an idea adapted from [Zhang et al., 2008]. ‘Blockify’ uses the method from [Moudgil et al., 2020].
reference (Optional[Literal['hg38', 'mm10', 'sacCer3']], default: 'hg38') – Currently supported: ‘hg38’ for human data, ‘mm10’ for mouse data and ‘sacCer3’ for yeast data.
pvalue_cutoff (Optional[float], default: None) – The p-value cutoff used when there is no background data. If None, no filtering is applied.
pvalue_cutoffbg (Optional[float], default: None) – The p-value cutoff for the background data when background data exists. If None, no filtering is applied.
pvalue_cutoffTTAA (Optional[float], default: None) – The p-value cutoff for the reference data when background data exists. pvalue_cutoffTTAA is recommended to be lower than pvalue_cutoffbg. If None, no filtering is applied.
pseudocounts (float, default: 0.2) – Number of pseudocounts added for the hypothesis test.
return_whole (bool, default: False) – If False, return only the separated peaks. If True, return the whole peak dataframe.
- Examples:
>>> import pycallingcards as cc
>>> qbed_data = cc.datasets.mousecortex_data(data="qbed")
>>> peak_data = cc.pp.call_peaks(qbed_data, method="CCcaller", reference="mm10", maxbetween=2000, pvalue_cutoff=0.01, pseudocounts=1, record=True)
>>> cc.pp.separate_peaks(peak_data, 1, 4807673, 4808049, expdata=qbed_data, reference="mm10", method="CCcaller", test_method="poisson", pvalue_cutoff=0.01, pseudocounts=0.1, return_whole=False)
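The coordinate logic described for index, middle_start, middle_end and return_whole can be sketched in plain Python. This is only an illustration of the documented behaviour, not the library's implementation: the helper split_peak, the tuple representation of a peak and the toy coordinates are all assumptions made for this sketch, and the per-peak statistics that the real function reports (for example p-values) are omitted.

```python
from typing import List, Tuple

# Hypothetical stand-in for one row of the peak table: (chromosome, start, end).
Peak = Tuple[str, int, int]


def split_peak(
    peaks: List[Peak],
    index: int,
    middle_start: int,
    middle_end: int,
    return_whole: bool = False,
) -> List[Peak]:
    """Split peaks[index] into two peaks around [middle_start, middle_end]."""
    chrom, start, end = peaks[index]
    if not (start <= middle_start <= middle_end <= end):
        raise ValueError("the cutoff region must lie inside the selected peak")

    # middle_start becomes the end of the first peak,
    # middle_end becomes the start of the second peak.
    first = (chrom, start, middle_start)
    second = (chrom, middle_end, end)

    if return_whole:
        # Return every peak, with the selected one replaced by its two halves.
        return peaks[:index] + [first, second] + peaks[index + 1:]
    # Otherwise return only the two new peaks.
    return [first, second]


# Toy coordinates, invented for illustration only.
peaks = [("chr1", 100, 300), ("chr1", 1000, 2000), ("chr2", 50, 80)]

separated = split_peak(peaks, 1, 1400, 1500)
whole = split_peak(peaks, 1, 1400, 1500, return_whole=True)

print(separated)
print(whole)
```

With return_whole=False the sketch returns just the two new intervals. With return_whole=True it returns the full list, which is one entry longer than the input because the selected peak has been replaced by its two parts.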