pycallingcards.preprocessing.combine_peaks

pycallingcards.preprocessing.combine_peaks(peak_data, index, expdata=None, background=None, method='CCcaller', reference='hg38', test_method='poisson', lam_win_size=100000, pvalue_cutoff=None, pvalue_cutoffbg=None, pvalue_cutoffTTAA=None, pseudocounts=0.2, return_whole=False)

Combine two peaks.

This function combines the peak at the given index with the peak that follows it (index+1).

Parameters:
  • peak_data (DataFrame) – pd.DataFrame of peak data. Pass the original output of the call_peaks function.

  • index (int) – The index of the first peak to combine. The peaks at index and index+1 are combined.

  • expdata (Optional[DataFrame] (default: None)) – pd.DataFrame with the first three columns as chromosome, start and end.

  • background (Optional[DataFrame] (default: None)) – pd.DataFrame with the first three columns as chromosome, start and end.

  • method (Optional[Literal['CCcaller', 'MACCs', 'Blockify']] (default: 'CCcaller')) – ‘CCcaller’ is a method that considers the maximum distance between insertions in the data. ‘MACCs’ uses the idea adapted from [Zhang et al., 2008] and here. ‘Blockify’ uses the method from [Moudgil et al., 2020] and here.

  • reference (Optional[Literal['hg38', 'mm10', 'sacCer3']] (default: 'hg38')) – We currently have ‘hg38’ for human data, ‘mm10’ for mouse data and ‘sacCer3’ for yeast data.

  • pvalue_cutoff (Optional[float] (default: None)) – The P-value cutoff for a background-free situation. If None, no filtering is applied.

  • pvalue_cutoffbg (Optional[float] (default: None)) – The P-value cutoff for background data when background exists. If None, no filtering is applied.

  • pvalue_cutoffTTAA (Optional[float] (default: None)) – The P-value cutoff for reference data when background exists. It is recommended that pvalue_cutoffTTAA be lower than pvalue_cutoffbg. If None, no filtering is applied.

  • pseudocounts (float (default: 0.2)) – Number of pseudocounts added for the hypothesis test.

  • return_whole (bool (default: False)) – If False, return only the combined peak. If True, return the whole peak dataframe.

Examples:

>>> import pycallingcards as cc
>>> qbed_data = cc.datasets.mousecortex_data(data="qbed")
>>> peak_data = cc.pp.call_peaks(qbed_data, method="CCcaller", reference="mm10", maxbetween=2000, pvalue_cutoff=0.01, pseudocounts=1, record=True)
>>> peak_data = cc.pp.combine_peaks(peak_data, 1, qbed_data, method="CCcaller", reference="mm10", pvalue_cutoff=0.01, pseudocounts=1, return_whole=True)
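To make the meaning of index and return_whole concrete, the following is a minimal sketch in plain Python, not the pycallingcards implementation. The Peak tuple and the merge_adjacent helper are hypothetical names introduced only for illustration. The sketch assumes that the two peaks lie on the same chromosome and that combining them means spanning from the earliest start to the latest end. It does not reproduce any statistics the library computes for the combined peak.

```python
from typing import List, NamedTuple, Union


class Peak(NamedTuple):
    """Hypothetical stand-in for one row of a peak table."""
    chrom: str
    start: int
    end: int


def merge_adjacent(
    peaks: List[Peak], index: int, return_whole: bool = False
) -> Union[Peak, List[Peak]]:
    """Merge peaks[index] and peaks[index + 1] (illustration only)."""
    first, second = peaks[index], peaks[index + 1]
    # Assumption: only peaks on the same chromosome can be combined.
    if first.chrom != second.chrom:
        raise ValueError("peaks to combine must be on the same chromosome")
    # Assumption: the combined peak spans both original intervals.
    merged = Peak(
        first.chrom,
        min(first.start, second.start),
        max(first.end, second.end),
    )
    if return_whole:
        # Replace the two original rows with the single merged row.
        return peaks[:index] + [merged] + peaks[index + 2:]
    return merged


peaks = [
    Peak("chr1", 100, 200),
    Peak("chr1", 500, 650),
    Peak("chr1", 700, 900),
    Peak("chr2", 50, 120),
]

combined = merge_adjacent(peaks, 1)                  # only the merged peak
whole = merge_adjacent(peaks, 1, return_whole=True)  # full list, one row shorter
```

With return_whole left at False, only the merged interval comes back. With return_whole set to True, the result is the full list with the two original entries replaced by the merged one, which mirrors the behaviour described for the return_whole parameter.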