pycallingcards.tools.call_motif
- pycallingcards.tools.call_motif(peaks_path=None, peaks_frame=None, save_name=None, reference='hg38', save_homer=None, size=1000, homer_path=None, motif_length=None, num_cores=3, denovo=False)
Call motifs with HOMER [Heinz et al., 2010]. Make sure HOMER is installed along with the genome data for the chosen reference.
- Parameters:
peaks_path (Optional[str] (default: None)) – Path to the peak data file. If this is provided, peaks_frame and save_name are ignored.
peaks_frame (Optional[DataFrame] (default: None)) – pd.DataFrame with the first three columns as chromosome, start and end.
save_name (Optional[str] (default: None)) – The name of the saved peak file. Only used when peaks_frame is provided and peaks_path is not.
reference (Optional[Literal['hg38', 'mm10', 'sacCer3']] (default: 'hg38')) – Reference genome of the annotation data. Currently, only 'hg38', 'mm10' and 'sacCer3' are supported. Make sure the genome is installed in HOMER, e.g. for mm10: perl [path]/homer/.//configureHomer.pl -install mm10
save_homer (Optional[str] (default: None)) – Path and name under which the annotation results will be saved. If None, they are saved to "Homerresult/peaks_name".
size (int (default: 1000)) – The size of the region for motif finding. This is one of the most important parameters and also a source of confusion for many. If you wish to find motifs using your peaks at their exact sizes, use HOMER's "-size given" option. However, for transcription factor peaks, most of the motifs are found +/- 50-75 bp from the peak center, making it better to use a fixed size rather than depending on your peak size.
homer_path (Optional[str] (default: None)) – Path to HOMER. If None, the default HOMER path is used.
motif_length (Optional[int] (default: None)) – The length of motifs to be found. If None, HOMER's default motif length is used.
num_cores (int (default: 3)) – Number of CPUs to use.
denovo (bool (default: False)) – Whether to call de novo motifs or not.
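Several of the parameters above mirror command-line options of HOMER's findMotifsGenome.pl script (-size, -len, -p, -nomotif). The following is a minimal sketch of how such a command line could be assembled, to make the correspondence concrete. The helper build_homer_command is hypothetical and not part of pycallingcards, and the mapping shown is an assumption; the command that call_motif actually runs may differ.

```python
import os


def build_homer_command(peaks_path, output_dir, reference="hg38", size=1000,
                        homer_path=None, motif_length=None, num_cores=3,
                        denovo=False):
    """Hypothetical helper: assemble a findMotifsGenome.pl argument list.

    Assumption: this mapping from call_motif-style parameters to HOMER
    options is illustrative only and is not taken from pycallingcards.
    """
    script = "findMotifsGenome.pl"
    if homer_path is not None:
        # Use the script from an explicit HOMER bin directory.
        script = os.path.join(homer_path, script)

    # HOMER usage: findMotifsGenome.pl <peaks> <genome> <output dir> [options]
    cmd = [script, peaks_path, reference, output_dir,
           "-size", str(size),      # region size around each peak center
           "-p", str(num_cores)]    # number of CPUs

    if motif_length is not None:
        cmd += ["-len", str(motif_length)]  # motif length to search for
    if not denovo:
        cmd.append("-nomotif")  # skip the de novo motif search
    return cmd


if __name__ == "__main__":
    print(" ".join(build_homer_command("peak_HCT116_test.bed",
                                       "Homer/peak_HCT116_test")))
```

Returning the command as a list rather than a single string keeps it safe to pass to subprocess.run without shell quoting.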
- Examples:
>>> import pycallingcards as cc
>>> HCT116_SP1 = cc.datasets.SP1_K562HCT116_data(data="HCT116_SP1_qbed")
>>> HCT116_brd4 = cc.datasets.SP1_K562HCT116_data(data="HCT116_brd4_qbed")
>>> peak_data_HCT116 = cc.pp.callpeaks(HCT116_SP1, HCT116_brd4, method="cc_tools", reference="hg38",
...     window_size=2000, step_size=500, pvalue_cutoffTTAA=0.001, pvalue_cutoffbg=0.1,
...     lam_win_size=None, pseudocounts=0.1, record=True, save="peak_HCT116_test.bed")
>>> cc.tl.call_motif("peak_HCT116_test.bed", reference="hg38", save_homer="Homer/peak_HCT116_test", homer_path="/ref/rmlab/software/homer/bin")
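Because call_motif depends on an external HOMER installation, it can help to check that HOMER's findMotifsGenome.pl script is reachable before starting a long run. The sketch below uses only the Python standard library. The helper find_homer_script is hypothetical and not part of pycallingcards, and it assumes homer_path points at the directory containing HOMER's scripts, as in the example above. It does not check that the genome data is installed.

```python
import shutil


def find_homer_script(homer_path=None):
    """Hypothetical helper: locate HOMER's findMotifsGenome.pl.

    If homer_path is given, only that directory is searched; otherwise
    the directories on the PATH environment variable are searched.
    Returns the full path to the script, or None if it is not found.
    """
    return shutil.which("findMotifsGenome.pl", path=homer_path)


if __name__ == "__main__":
    script = find_homer_script()
    if script is None:
        print("HOMER not found; install it or pass homer_path explicitly.")
    else:
        print("Found HOMER script at " + script)
```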