pycallingcards.preprocessing.separate_peaks

pycallingcards.preprocessing.separate_peaks(peak_data, index, middle_start, middle_end, expdata=None, background=None, method='CCcaller', reference='hg38', test_method='poisson', lam_win_size=100000, pvalue_cutoff=None, pvalue_cutoffbg=None, pvalue_cutoffTTAA=None, pseudocounts=0.2, return_whole=False)

Separate one peak into two.

This function splits a single peak into two separate peaks.
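Conceptually, separating a peak replaces one interval with two. The first keeps the original start and ends at middle_start, and the second starts at middle_end and keeps the original end. The sketch below shows only that coordinate bookkeeping in plain pandas. The helper split_interval and the column names Chr, Start and End are assumptions for illustration. This is not the pycallingcards implementation, which also takes the insertion data and p-value settings into account.

```python
import pandas as pd


def split_interval(peak: pd.Series, middle_start: int, middle_end: int) -> pd.DataFrame:
    """Split one peak row into two rows around the cut region.

    Hypothetical helper for illustration only; not part of pycallingcards.
    Assumes the peak has 'Chr', 'Start' and 'End' fields.
    """
    # The cut region must sit strictly inside the original peak.
    if not (peak["Start"] < middle_start <= middle_end < peak["End"]):
        raise ValueError("cut region must lie strictly inside the peak")
    # First peak: original start up to middle_start.
    first = {"Chr": peak["Chr"], "Start": peak["Start"], "End": middle_start}
    # Second peak: middle_end up to the original end.
    second = {"Chr": peak["Chr"], "Start": middle_end, "End": peak["End"]}
    return pd.DataFrame([first, second])


# Toy peak with made-up coordinates.
peak = pd.Series({"Chr": "chr1", "Start": 1000, "End": 2000})
print(split_interval(peak, 1400, 1500))
```

Everything between middle_start and middle_end belongs to neither resulting peak in this sketch.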

Parameters:
  • peak_data (DataFrame) – Peak data as a pd.DataFrame. Pass the original output of the call_peaks function.

  • index – The index of the peak in peak_data to separate.

  • middle_start – The start of the cut region, which becomes the end point of the first peak after separation.

  • middle_end – The end of the cut region, which becomes the start point of the second peak after separation.

  • expdata (Optional[DataFrame] (default: None)) – Experimental insertion data as a pd.DataFrame whose first three columns are chromosome, start and end.

  • background (Optional[DataFrame] (default: None)) – Background insertion data as a pd.DataFrame whose first three columns are chromosome, start and end.

  • method (Optional[Literal['CCcaller', 'MACCs', 'Blockify']] (default: 'CCcaller')) – ‘CCcaller’ considers the maximum distance between insertions in the data. ‘MACCs’ uses an approach adapted from [Zhang et al., 2008]. ‘Blockify’ uses the method from [Moudgil et al., 2020].

  • reference (Optional[Literal['hg38', 'mm10', 'sacCer3']] (default: 'hg38')) – Reference genome: ‘hg38’ for human data, ‘mm10’ for mouse data or ‘sacCer3’ for yeast data.

  • pvalue_cutoff (Optional[float] (default: None)) – The P-value cutoff used when no background data is provided. If None, no filtering is applied.

  • pvalue_cutoffbg (Optional[float] (default: None)) – The P-value cutoff against the background data, used when background data is provided. If None, no filtering is applied.

  • pvalue_cutoffTTAA (Optional[float] (default: None)) – The P-value cutoff against the reference data, used when background data is provided. pvalue_cutoffTTAA is recommended to be lower than pvalue_cutoffbg. If None, no filtering is applied.

  • pseudocounts (float (default: 0.2)) – Pseudocount added when performing the hypothesis test.

  • return_whole (bool (default: False)) – If False, return only the peaks produced by the separation. If True, return the whole peak DataFrame.
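As a rough illustration of how pvalue_cutoff and pseudocounts interact with test_method='poisson', the sketch below computes a one-sided Poisson p-value for the number of insertions in a peak, adding the pseudocount to the expected count. The helper names and the way the expected count is formed are assumptions. pycallingcards may compute its expected rate (for example, using lam_win_size) differently.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term             # accumulate P(X = i)
        term *= lam / (i + 1)   # P(X = i + 1) from P(X = i), avoids huge factorials
    return max(0.0, 1.0 - cdf)


def peak_pvalue(observed: int, expected: float, pseudocounts: float = 0.2) -> float:
    """One-sided Poisson p-value for observing >= `observed` insertions.

    Assumption for illustration: the pseudocount is added to the expected
    count so that an expected value of zero still gives a valid rate.
    """
    return poisson_sf(observed, expected + pseudocounts)


p = peak_pvalue(observed=12, expected=3.0)
print(f"p = {p:.3g}, passes cutoff 0.01: {p <= 0.01}")
```

In this sketch a peak passes the filter when its p-value is at or below the cutoff; a larger pseudocount raises the expected rate and therefore makes the test more conservative.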

Examples:

>>> import pycallingcards as cc
>>> qbed_data = cc.datasets.mousecortex_data(data="qbed")
>>> peak_data = cc.pp.call_peaks(qbed_data, method="CCcaller", reference="mm10",
...                              maxbetween=2000, pvalue_cutoff=0.01,
...                              pseudocounts=1, record=True)
>>> cc.pp.separate_peaks(peak_data, 1, 4807673, 4808049, expdata=qbed_data,
...                      reference="mm10", method="CCcaller", test_method="poisson",
...                      pvalue_cutoff=0.01, pseudocounts=0.1, return_whole=False)
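The example above cuts the peak at index 1 between 4807673 and 4808049. One way to pick such arguments yourself, assuming the peak contains two clusters of insertions, is to cut at the widest gap between consecutive insertion positions. widest_gap is a hypothetical helper, not part of pycallingcards, and the positions below are made up.

```python
from typing import Sequence, Tuple


def widest_gap(insertions: Sequence[int]) -> Tuple[int, int]:
    """Return (middle_start, middle_end) spanning the widest gap between
    consecutive insertion positions inside one peak.

    Hypothetical helper: it only illustrates one heuristic for choosing
    the middle_start and middle_end arguments.
    """
    positions = sorted(insertions)
    if len(positions) < 2:
        raise ValueError("need at least two insertions to find a gap")
    # Pair each insertion with the next one and keep the pair farthest apart.
    left, right = max(zip(positions, positions[1:]), key=lambda pair: pair[1] - pair[0])
    return left, right


middle_start, middle_end = widest_gap([100, 150, 180, 900, 950, 1000])
print(middle_start, middle_end)
```

With real data, the positions would be the insertion coordinates from expdata that fall inside the peak being separated.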