pycallingcards.tools.call_motif

pycallingcards.tools.call_motif(peaks_path=None, peaks_frame=None, save_name=None, reference='hg38', save_homer=None, size=1000, homer_path=None, motif_length=None, num_cores=3, denovo=False)

Call motifs with HOMER [Heinz et al., 2010]. Please make sure HOMER is installed along with the genome data.
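Since the function depends on an external HOMER installation, it can help to check for it before calling. The following is a minimal sketch, not part of pycallingcards: `homer_available` is a hypothetical helper, and it assumes that the HOMER script of interest is `findMotifsGenome.pl` (HOMER's genome-based motif finder), which may or may not be the exact script this function invokes.

```python
import shutil


def homer_available(homer_path=None):
    """Return True if HOMER's findMotifsGenome.pl can be located.

    homer_path is a hypothetical argument: a directory to search
    (e.g. the HOMER "bin" folder). If None, the system PATH is searched.
    """
    # shutil.which returns the full path of the executable, or None if not found.
    return shutil.which("findMotifsGenome.pl", path=homer_path) is not None


if __name__ == "__main__":
    if homer_available():
        print("HOMER found on PATH.")
    else:
        print("HOMER not found; install it or pass homer_path explicitly.")
```

This only confirms that the script can be found; it does not verify that the genome data (for example hg38) has been installed in HOMER.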

Parameters:

peaks_path (Optional[str] (default: None)) – Path to the peak data file. If this is provided, peaks_frame and save_name are ignored.
peaks_frame (Optional[DataFrame] (default: None)) – pd.DataFrame with the first three columns as chromosome, start and end.
save_name (Optional[str] (default: None)) – The name of a saved peak file. Only used when peaks_frame is provided and peaks_path is not provided.
reference (Optional[Literal['hg38', 'mm10', 'sacCer3']] (default: 'hg38')) – Reference genome for the annotation data. Currently, only ‘hg38’, ‘mm10’ and ‘sacCer3’ are provided. Make sure the corresponding genome is installed in HOMER, e.g. for mm10: perl [path]/homer/.//configureHomer.pl -install mm10
save_homer (Optional[str] (default: None)) – Path and name where the results will be saved. If None, they are saved to “Homerresult/peaks_name”.
size (int (default: 1000)) – The size of the region used for motif finding. This is one of the most important parameters and a common source of confusion. To find motifs using the exact sizes of your peaks, use the HOMER option “-size given”. However, for transcription factor peaks, most motifs are found within +/- 50-75 bp of the peak center, so a fixed size usually works better than relying on the peak size.
homer_path (Optional[str] (default: None)) – Path to HOMER. If None, the default path for HOMER is used.
motif_length (Optional[int] (default: None)) – Length of the motifs to be found. If None, HOMER's default motif length is used.
num_cores (int (default: 3)) – Number of CPUs to use.
denovo (bool (default: False)) – Whether to call de novo motifs or not.
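To make the effect of a fixed size concrete, here is a small sketch of how a peak can be replaced by a window of a given width centered on the peak midpoint. This is an illustration only, assuming that a numeric size means a region centered on the peak; `fixed_size_window` is a hypothetical helper, and the coordinates are made up. It is not how pycallingcards or HOMER is implemented.

```python
def fixed_size_window(start, end, size):
    """Return (new_start, new_end): a window of width `size`
    centered on the midpoint of the peak [start, end)."""
    center = (start + end) // 2
    # Clamp at 0 so the window never starts before the chromosome begins.
    new_start = max(0, center - size // 2)
    return new_start, new_start + size


# Illustrative peaks as (chromosome, start, end); the values are invented.
peaks = [("chr1", 1000, 3000), ("chr2", 10, 50)]

# Every peak becomes a 200 bp window, regardless of its original width.
windows = [(chrom, *fixed_size_window(start, end, 200)) for chrom, start, end in peaks]

for chrom, start, end in windows:
    print(chrom, start, end)
```

A 2,000 bp peak and a 40 bp peak both end up as 200 bp regions, which is why a fixed size keeps the motif search focused near the peak center instead of depending on how wide each called peak happens to be.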

Examples:

>>> import pycallingcards as cc
>>> HCT116_SP1 = cc.datasets.SP1_K562HCT116_data(data="HCT116_SP1_qbed")
>>> HCT116_brd4 = cc.datasets.SP1_K562HCT116_data(data="HCT116_brd4_qbed")
>>> peak_data_HCT116 = cc.pp.callpeaks(HCT116_SP1, HCT116_brd4, method="cc_tools", reference="hg38", window_size=2000, step_size=500,
...         pvalue_cutoffTTAA=0.001, pvalue_cutoffbg=0.1, lam_win_size=None, pseudocounts=0.1, record=True, save="peak_HCT116_test.bed")
>>> cc.tl.call_motif("peak_HCT116_test.bed", reference="hg38", save_homer="Homer/peak_HCT116_test", homer_path="/ref/rmlab/software/homer/bin")