Tutorial: Brd4 bindings for Male/Female mouse bulk calling cards data.¶

We will analyze bulk calling cards datasets from Kfoury et al., PNAS. (2021), which can be downloaded from GEO. These experiments were designed to study differences in Brd4 binding between male and female cells in a genetically engineered cellular model of glioblastoma (GBM). These nearly-isogenic cells were derived from murine neocortical postnatal day 1 (p1) astrocytes (male and female GBM astrocytes) and were engineered to contain a combined loss of neurofibromin (NF1) and p53 function. These cells were previously shown to display sex differences in important GBM phenotypes in vivo, including proliferation, clonogenic stem-like frequency, in vivo tumorigenesis, cell cycle regulation, gene expression, and chemotherapy response, that mimic those observed in GBM patients. For more details see [Kfoury et al., PNAS. (2021)].

In this tutorial, we will call peaks, annotate these peaks, perform differential peak analysis, and pair differentially bound genomic regions with nearby genes that are also differentially expressed between the sexes. There are 668525 insertions in female qbed data and 613728 insertions in male qbed data. For bulk RNA-seq data, there are 6 samples in total: 3 female samples and 3 male samples. It creates a group by peak anndata for calling cards data. If you want to create a replicate by peak anndata only, please check Github

[1]:

import pycallingcards as cc
import numpy as np
import pandas as pd
import scanpy as sc
from matplotlib import pyplot as plt
plt.rcParams['figure.dpi'] = 150

We start by reading qbed data. In these data, each row provides information about one Brd4-directed insertion organized into columns to indicate the chromosome, start point, end point, number of reads supporting the insertion, and the orientation and barcode for each insertion. For example, in the file displayed below, the first row tells us that the first insertion occured into a TTAA on Chromosome 1, whose coordinates start location 3478112 and end at 3478115. There was 1 read supporting this insertion and the transposon is orientated so that the plus strand of the transposon is on the plus strand of the genome. The transposon barcode is TTTGTCCAA. We add another column indecating its respected group.

Use cc.rd.read_qbed(filename) to load your own qbed data.

[2]:

Female_Brd4 = cc.datasets.mouse_brd4_data(data = "Female_Brd4")
Female_Brd4['group'] = "Female_Brd4"
Female_Brd4
#load female CC data

[2]:

	Chr	Start	End	Reads	Direction	Barcodes	group
0	chr1	3478111	3478115	1	+	TTTGTCCAA	Female_Brd4
1	chr1	3481743	3481747	1	+	TATGTACAA	Female_Brd4
2	chr1	3481957	3481961	1	-	TATGTACAA	Female_Brd4
3	chr1	3493161	3493165	16	+	TATGTACAA	Female_Brd4
4	chr1	3493702	3493706	1	+	CGTTACACA	Female_Brd4
...	...	...	...	...	...	...	...
1290214	chrY	90802841	90802845	1	-	CGTTACACA	Female_Brd4
1290215	chrY	90803579	90803583	1	-	TATGTACAA	Female_Brd4
1290216	chrY	90803579	90803583	1	-	TGTCGTGCA	Female_Brd4
1290217	chrY	90804429	90804433	1	-	TATACTCTA	Female_Brd4
1290218	chrY	90805130	90805134	1	-	GACGGCTCC	Female_Brd4

1290219 rows × 7 columns

[3]:

Male_Brd4 = cc.datasets.mouse_brd4_data(data = "Male_Brd4")
Male_Brd4['group'] = "Male_Brd4"
Male_Brd4
#load male CC data

[3]:

	Chr	Start	End	Reads	Direction	Barcodes	group
0	chr1	3493161	3493165	1	-	GAGGTACAG	Male_Brd4
1	chr1	3493275	3493279	1	+	TACTTTCCG	Male_Brd4
2	chr1	3493341	3493345	2	+	AGCACAGTG	Male_Brd4
3	chr1	3493341	3493345	10	+	TGTCCATTG	Male_Brd4
4	chr1	3493341	3493345	8	+	ATGAAAGCA	Male_Brd4
...	...	...	...	...	...	...	...
1185697	chrY	90803733	90803737	13	-	TACTTTCCG	Male_Brd4
1185698	chrY	90803733	90803737	9	-	CACGGCGGA	Male_Brd4
1185699	chrY	90803733	90803737	17	-	GATCTTATG	Male_Brd4
1185700	chrY	90803774	90803778	4	-	ATGAAAGCA	Male_Brd4
1185701	chrY	90806562	90806566	1	-	AGCACAGTG	Male_Brd4

1185702 rows × 7 columns

Since we are interested in differential peak calling, we now combine the two datasets to call peaks on the joint dataset:

[4]:

Brd4 = cc.rd.combine_qbed([Female_Brd4, Male_Brd4])
Brd4

[4]:

	Chr	Start	End	Reads	Direction	Barcodes	group
0	chr1	3478111	3478115	1	+	TTTGTCCAA	Female_Brd4
1	chr1	3481743	3481747	1	+	TATGTACAA	Female_Brd4
2	chr1	3481957	3481961	1	-	TATGTACAA	Female_Brd4
3	chr1	3493161	3493165	16	+	TATGTACAA	Female_Brd4
4	chr1	3493161	3493165	1	-	GAGGTACAG	Male_Brd4
...	...	...	...	...	...	...	...
2475916	chrY	90803733	90803737	17	-	GATCTTATG	Male_Brd4
2475917	chrY	90803774	90803778	4	-	ATGAAAGCA	Male_Brd4
2475918	chrY	90804429	90804433	1	-	TATACTCTA	Female_Brd4
2475919	chrY	90805130	90805134	1	-	GACGGCTCC	Female_Brd4
2475920	chrY	90806562	90806566	1	-	AGCACAGTG	Male_Brd4

2475921 rows × 7 columns

Optional: filter the insertions with reads greater than 2. Because we use strict quality control on read insertions, even insertions supported by 1 read are likely to be bona fide.

[5]:

Brd4 = Brd4[Brd4['Reads'] > 2]
Brd4

[5]:

	Chr	Start	End	Reads	Direction	Barcodes	group
3	chr1	3493161	3493165	16	+	TATGTACAA	Female_Brd4
7	chr1	3493341	3493345	10	+	TGTCCATTG	Male_Brd4
8	chr1	3493341	3493345	8	+	ATGAAAGCA	Male_Brd4
10	chr1	3493643	3493647	3	+	TCTACTGCC	Male_Brd4
17	chr1	3514553	3514557	7	+	TACTTTCCG	Male_Brd4
...	...	...	...	...	...	...	...
2475905	chrY	42512546	42512550	3	-	ACGCGCTGT	Male_Brd4
2475914	chrY	90803733	90803737	13	-	TACTTTCCG	Male_Brd4
2475915	chrY	90803733	90803737	9	-	CACGGCGGA	Male_Brd4
2475916	chrY	90803733	90803737	17	-	GATCTTATG	Male_Brd4
2475917	chrY	90803774	90803778	4	-	ATGAAAGCA	Male_Brd4

1229664 rows × 7 columns

Because insertions are discrete, we now need to call peaks to deduce potential binding sites. Three different methods (CCcaller, MACCs, Blockify) are available along with three different species (hg38, mm10, sacCer3).

In this setting, we use CCcaller in mouse(‘mm10’) data. maxbetween is the most important parameter for CCcaller. It controls the maximum distance between two nearby insertions, or in another words, the minimum distance between two peaks. We recommend using a value between 800-1200 to set maxbetween. pvalue_cutoff is also an important parameter and a number 0.0001-0.01 is strongly advised. The setting of pseudocounts is largely influenced by library size. For the first time of trial, it can be adjusted to \(10^{6}-10^{-5} \times\) the number of insertions.

[6]:

peak_data = cc.pp.call_peaks(Brd4, method = "CCcaller", reference = "mm10", pvalue_cutoff = 0.01,
                             maxbetween = 1100, lam_win_size = 1000000, pseudocounts = 0.1,
                             record = True, save = "peak.bed")
peak_data

For the CCcaller method without background, [expdata, reference, pvalue_cutoff, lam_win_size, pseudocounts, minlen, extend, maxbetween, test_method, min_insertions, record] would be utilized.

100%|██████████| 21/21 [00:21<00:00,  1.00s/it]

[6]:

	Chr	Start	End	Experiment Insertions	Reference Insertions	Expected Insertions	pvalue	pvalue_adj
0	chr1	4196775	4198714	16	12	0.142389	0.000000e+00	0.000000e+00
1	chr1	4785068	4786551	17	15	0.572576	0.000000e+00	0.000000e+00
2	chr1	4806673	4807730	21	12	0.477937	0.000000e+00	0.000000e+00
3	chr1	4856928	4862019	104	44	1.490080	0.000000e+00	0.000000e+00
4	chr1	4912814	4916593	7	22	0.791756	1.900897e-06	1.024407e-04
...	...	...	...	...	...	...	...	...
24546	chrX	169936677	169937745	7	8	1.189526	3.479200e-05	1.673495e-03
24547	chrY	897430	899700	11	24	0.528165	6.046275e-13	5.162843e-11
24548	chrY	1009018	1011799	27	25	0.489073	0.000000e+00	0.000000e+00
24549	chrY	1243715	1246316	24	29	0.504932	0.000000e+00	0.000000e+00
24550	chrY	1282482	1287504	19	60	0.945958	0.000000e+00	0.000000e+00

24551 rows × 8 columns

In order to tune parameters for peak calling, we advise looking at the data and evaluating the validity of the called peaks. The default settings are what we recommend, but for some TFs adjacent peaks may be merged that should not be, or, alternatively, peaks that should be joined may be called separately.

Below, we plot the combined calling card data in males and females for a region in chromosome 1. The top track displays insertion locations and their read counts. Each dot is an insertion and the height is log(reads+1). The middle track plots the insertion density. The thrid track represents the reference genes and peaks. Finally, the last track represents peak calls. Below you can see that regions with high densities of insertions are accurately called as Brd4 binding sites.

[7]:

cc.pl.draw_area("chr1", 4856929, 4863861, 100000, peak_data, Brd4, "mm10", font_size = 2,
                plotsize = [1,1,5],
                figsize = (30,10), peak_line = 2, save = False, example_length = 5000)

../../_images/tutorials_notebooks_Brd4_bulk_13_0.png

[8]:

cc.pl.draw_area("chr2", 6136575, 6144043, 75000, peak_data, Brd4, "mm10", font_size = 2, plotsize = [1,1,5],
                figsize = (30,8), peak_line = 1, save = False, example_length = 5000)

../../_images/tutorials_notebooks_Brd4_bulk_14_0.png

We can also visualize our data directly in the WashU Epigenome Browser. This can be useful for overlaying your data with other published datasets. Notice that this link only valid for 24hrs, so please rerun it if you want to use it.

[9]:

qbed = {"Brd4": Brd4}
bed = {"peak": peak_data}
cc.pl.WashU_browser_url(qbed = qbed, bed = bed, genome = 'mm10')

All qbed addressed
All bed addressed
Uploading files
Please click the following link to see the data on WashU Epigenome Browser directly.
https://epigenomegateway.wustl.edu/browser/?genome=mm10&hub=https://companion.epigenomegateway.org//task/544b112245ab68df385f75da9d4bfd8b/output//datahub.json

Pycallingcards can be used to visual peak locations acorss the genome to see if the distribution of peaks is unbiased and all chromosomes are represented.

[10]:

cc.pl.whole_peaks(peak_data, reference = "mm10", linewidth = 1)

../../_images/tutorials_notebooks_Brd4_bulk_18_0.png

For differential peak calling, we recommend first combining the data and calling peaks together (and then splitting the data and looking for enrichments under the jointly called peaks). However, it is often of interest to call peaks in each sample to analyze them separately or to then merge these peak calls using pybedtools. Below is the code to do this:

import pybedtools
peak_data1 = cc.pp.call_peaks(Female_Brd4, method = "CCcaller",
                              reference = "mm10", pvalue_cutoffbg = 0.1,
                              maxbetween = 2000, pvalue_cutoffTTAA = 0.001,
                              lam_win_size = 1000000,  pseudocounts = 0.1, record = True)
peak_data2 = cc.pp.call_peaks(Male_Brd4, method = "CCcaller",
                              reference = "mm10",  pvalue_cutoffbg = 0.1,
                              maxbetween = 2000, pvalue_cutoffTTAA = 0.001,
                              lam_win_size = 1000000, pseudocounts = 0.1, record = True)
peak = cc.rd.combine_qbed([peak_data1, peak_data2])
peak = pybedtools.BedTool.from_dataframe(peak).merge().to_dataframe()
peak_data = peak.rename(columns={"chrom":"Chr", "start":"Start", "end":"End"})

In the next step, we annote each peak to annotate using bedtools and pybedtools. Make sure they are all previously installed before using.

[11]:

peak_annotation = cc.pp.annotation(peak_data, reference = "mm10")
peak_annotation = cc.pp.combine_annotation(peak_data, peak_annotation)
peak_annotation

In the bedtools method, we would use bedtools in the default path. Set bedtools path by 'bedtools_path' if needed.

[11]:

	Chr	Start	End	Experiment Insertions	Reference Insertions	Expected Insertions	pvalue	pvalue_adj	Nearest Refseq1	Gene Name1	Direction1	Distance1	Nearest Refseq2	Gene Name2	Direction2	Distance2
0	chr1	4196775	4198714	16	12	0.142389	0.000000e+00	0.000000e+00	NM_001195662	Rp1	-	92132	NM_011283	Rp1	-	144793
1	chr1	4785068	4786551	17	15	0.572576	0.000000e+00	0.000000e+00	NR_033530	Mrpl15	-	0	NM_008866	Lypla1	+	21342
2	chr1	4806673	4807730	21	12	0.477937	0.000000e+00	0.000000e+00	NM_008866	Lypla1	+	163	NR_033530	Mrpl15	-	-20948
3	chr1	4856928	4862019	104	44	1.490080	0.000000e+00	0.000000e+00	NM_011541	Tcea1	+	0	NM_008866	Lypla1	+	-10194
4	chr1	4912814	4916593	7	22	0.791756	1.900897e-06	1.024407e-04	NM_001290372	Rgs20	-	0	NM_011541	Tcea1	+	-14906
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
24546	chrX	169936677	169937745	7	8	1.189526	3.479200e-05	1.673495e-03	NM_010797	Mid1	+	0	NR_029457	G530011O06Rik	-	37298
24547	chrY	897430	899700	11	24	0.528165	6.046275e-13	5.162843e-11	NM_011419	Kdm5d	+	0	NM_011667	Uba1y	+	-53207
24548	chrY	1009018	1011799	27	25	0.489073	0.000000e+00	0.000000e+00	NM_012011	Eif2s3y	+	0	NR_027507	Tspy-ps	-	43965
24549	chrY	1243715	1246316	24	29	0.504932	0.000000e+00	0.000000e+00	NM_009484	Uty	-	0	NM_012008	Ddx3y	-	14399
24550	chrY	1282482	1287504	19	60	0.945958	0.000000e+00	0.000000e+00	NM_012008	Ddx3y	-	0	NM_148943	Usp9y	-	11457

24551 rows × 16 columns

Use qbed data, peak data, and barcode data to make a group by peak Anndata object.

[12]:

adata_cc = cc.pp.make_Anndata(Brd4, peak_annotation, ["Female_Brd4", "Male_Brd4"], key = 'group')
adata_cc

100%|██████████| 21/21 [00:07<00:00,  2.64it/s]

[12]:

AnnData object with n_obs × n_vars = 2 × 24551
    var: 'Chr', 'Start', 'End', 'Experiment Insertions', 'Reference Insertions', 'Expected Insertions', 'pvalue', 'pvalue_adj', 'Nearest Refseq1', 'Gene Name1', 'Direction1', 'Distance1', 'Nearest Refseq2', 'Gene Name2', 'Direction2', 'Distance2'

[13]:

adata_cc.obs

[13]:


Index
Female_Brd4
Male_Brd4

[14]:

adata_cc

[14]:

AnnData object with n_obs × n_vars = 2 × 24551
    var: 'Chr', 'Start', 'End', 'Experiment Insertions', 'Reference Insertions', 'Expected Insertions', 'pvalue', 'pvalue_adj', 'Nearest Refseq1', 'Gene Name1', 'Direction1', 'Distance1', 'Nearest Refseq2', 'Gene Name2', 'Direction2', 'Distance2'

[15]:

adata_cc = cc.tl.liftover(adata_cc)
adata_cc

100%|██████████| 24551/24551 [04:44<00:00, 86.27it/s]

[15]:

AnnData object with n_obs × n_vars = 2 × 24551
    var: 'Chr', 'Start', 'End', 'Experiment Insertions', 'Reference Insertions', 'Expected Insertions', 'pvalue', 'pvalue_adj', 'Nearest Refseq1', 'Gene Name1', 'Direction1', 'Distance1', 'Nearest Refseq2', 'Gene Name2', 'Direction2', 'Distance2', 'Chr_liftover', 'Start_liftover', 'End_liftover'

Differential peak analysis will find out the significant binding for each group. In this example, we use the Fisher’s exact test to find out.

[16]:

cc.tl.rank_peak_groups(adata_cc, 'Index', method = 'fisher_exact', key_added = 'fisher_exact')

100%|██████████| 2/2 [01:43<00:00, 51.87s/it]

Plot the results for differential peak analysis.

cc.tl.rank_peak_groups(adata_cc, 'Index', method = 'fisher_exact', key_added = 'fisher_exact',
                       rankby = 'logfoldchanges')
cc.pl.rank_peak_groups(adata_cc, key = 'fisher_exact',rankby = 'logfoldchanges')

[17]:

cc.pl.rank_peak_groups(adata_cc, key = 'fisher_exact', rankby = 'pvalues')

../../_images/tutorials_notebooks_Brd4_bulk_30_0.png

We can use Pycallingcards to look at the insertion densities at peaks that are differentially bound. In the plot below, we first plot female peaks as blue points and male as grey points, and then we plot male peaks as red and female peaks as grey. From the plots below we can see there is much more Brd4 binding at this locus in females than in males.

[18]:

cc.pl.draw_area("chr9", 91366534, 91425373, 50000, peak_data, Brd4, "mm10", adata = adata_cc, font_size=2,
                plotsize = [1,1,4], name = "Female_Brd4",key = "Index",insertionkey = "group",
                name_insertion1 = 'Female Insertions', name_density1 = 'Female Insertion Density',
                name_insertion2 = 'Total Insertions', name_density2 = 'Total Insertion Density',
                bins = 400, figsize = (30,12), peak_line = 6, title = "Female")
cc.pl.draw_area("chr9", 91366534, 91425373, 50000, peak_data, Brd4, "mm10",  adata = adata_cc, font_size=2,
                plotsize = [1,1,4], name = "Male_Brd4",key = "Index", insertionkey = "group",
                name_insertion1 = 'Male Insertions', name_density1 = 'Male Insertion Density',
                name_insertion2 = 'Total Insertions', name_density2 = 'Total Insertion Density',
                bins = 400, figsize = (30,12), peak_line = 6, color = "blue", title = "Male")

../../_images/tutorials_notebooks_Brd4_bulk_32_0.png

../../_images/tutorials_notebooks_Brd4_bulk_32_1.png

[19]:

cc.pl.draw_area("chr12", 56516453, 56538107, 50000, peak_data, Brd4, "mm10", adata = adata_cc, font_size=2,
                plotsize = [1,1,3], name = "Female_Brd4", key = "Index",insertionkey = "group", bins = 200,
                name_insertion1 = 'Female Insertions', name_density1 = 'Female Insertion Density',
                name_insertion2 = 'Total Insertions', name_density2 = 'Total Insertion Density',
                figsize = (30,8), peak_line = 2, title = "Female")
cc.pl.draw_area("chr12", 56516453, 56538107, 50000, peak_data, Brd4, "mm10", adata = adata_cc, font_size=2,
                plotsize = [1,1,3], name = "Male_Brd4", key = "Index", insertionkey = "group", bins = 200,
                name_insertion1 = 'Male Insertions', name_density1 = 'Male Insertion Density',
                name_insertion2 = 'Total Insertions', name_density2 = 'Total Insertion Density',
                figsize = (30,8), peak_line = 2, color = "blue", title = "Male")

../../_images/tutorials_notebooks_Brd4_bulk_33_0.png

../../_images/tutorials_notebooks_Brd4_bulk_33_1.png

We can now plot a volcano plot to visualize the distribution of differentially bound peaks. labelright and labellest are the location points of the labels.

[20]:

cc.pl.volcano_plot(adata_cc, pvalue_name = 'pvalues_adj', pvalue_cutoff = 0.01, lfc_cutoff = 4,
                   figsize = (5,4), labelright = (3,250), labelleft = (-9,250))

/home/juanru/Desktop/pycallingcards/pycallingcards/pycallingcards/plotting/_volcano.py:70: RuntimeWarning: divide by zero encountered in log10
  pva = -np.log10(np.array(adata_cc.uns[name][pvalue_name].tolist())[:, 1])

../../_images/tutorials_notebooks_Brd4_bulk_35_1.png

We can now use HOMER to call motifs for both female and male bound peaks separately to find out potential motif binding difference.

[31]:

cc.tl.call_motif(peaks_frame=cc.tl.rank_peak_groups_df(adata_cc, key = 'fisher_exact', pval_cutoff = 0.01, logfc_min = 3, group = ['Female_Brd4'])["names"].str.split("_", expand=True),
                 reference ="mm10",save_homer = "Homer/GBM_female",
                 homer_path = "/home/juanru/miniconda3/bin/", num_cores=12)
cc.tl.call_motif(peaks_frame=cc.tl.rank_peak_groups_df(adata_cc, key = 'fisher_exact', pval_cutoff = 0.01, logfc_min = 3, group = ['Male_Brd4'])["names"].str.split("_", expand=True),
                 reference ="mm10",save_homer = "Homer/GBM_male",
                 homer_path = "/home/juanru/miniconda3/bin/", num_cores=12)

There is no save_name, it will save to temp_Homer_trial.bed and then delete.


        Position file = temp_Homer_trial.bed
        Genome = mm10
        Output Directory = Homer/GBM_female
        Fragment size set to 1000
        Using 12 CPUs
        Will not run homer for de novo motifs
        Found mset for "mouse", will check against vertebrates motifs
        Peak/BED file conversion summary:
                BED/Header formatted lines: 1009
                peakfile formatted lines: 0

        Peak File Statistics:
                Total Peaks: 1009
                Redundant Peak IDs: 0
                Peaks lacking information: 0 (need at least 5 columns per peak)
                Peaks with misformatted coordinates: 0 (should be integer)
                Peaks with misformatted strand: 0 (should be either +/- or 0/1)

        Peak file looks good!

        Background files for 1000 bp fragments found.
        Custom genome sequence directory: /home/juanru/miniconda3/share/homer/.//data/genomes/mm10//

        Extracting sequences from directory: /home/juanru/miniconda3/share/homer/.//data/genomes/mm10//
        Extracting 96 sequences from chr1
        Extracting 93 sequences from chr2
        Extracting 62 sequences from chr3
        Extracting 39 sequences from chr4
        Extracting 67 sequences from chr5
        Extracting 66 sequences from chr6
        Extracting 62 sequences from chr7
        Extracting 45 sequences from chr8
        Extracting 59 sequences from chr9
        Extracting 50 sequences from chr10
        Extracting 41 sequences from chr11
        Extracting 41 sequences from chr12
        Extracting 46 sequences from chr13
        Extracting 25 sequences from chr14
        Extracting 26 sequences from chr15
        Extracting 40 sequences from chr16
        Extracting 40 sequences from chr17
        Extracting 32 sequences from chr18
        Extracting 32 sequences from chr19
        Extracting 47 sequences from chrX

        Not removing redundant sequences


        Sequences processed:
                Auto detected maximum sequence length of 1001 bp
                1009 total

        Frequency Bins: 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.6 0.7 0.8
        Freq    Bin     Count
        0.3     2       14
        0.35    3       101
        0.4     4       317
        0.45    5       309
        0.5     6       201
        0.6     7       66
        0.7     8       1

        Total sequences set to 50000

        Choosing background that matches in CpG/GC content...
        Bin     # Targets       # Background    Background Weight
        2       14      680     1.000
        3       101     4904    1.000
        4       317     15392   1.000
        5       309     15003   1.000
        6       201     9759    1.000
        7       66      3205    1.000
        8       1       49      0.991
        Assembling sequence file...
        Normalizing lower order oligos using homer2

        Reading input files...
        50001 total sequences read
        Autonormalization: 1-mers (4 total)
                A       29.26%  29.21%  1.001
                C       20.74%  20.79%  0.998
                G       20.74%  20.79%  0.998
                T       29.26%  29.21%  1.001
        Autonormalization: 2-mers (16 total)
                AA      9.53%   9.22%   1.034
                CA      7.34%   7.44%   0.987
                GA      6.15%   6.15%   1.001
                TA      6.23%   6.41%   0.972
                AC      5.27%   5.38%   0.980
                CC      5.08%   5.17%   0.983
                GC      4.24%   4.08%   1.037
                TC      6.15%   6.15%   1.001
                AG      7.36%   7.34%   1.003
                CG      0.96%   0.84%   1.148
                GG      5.08%   5.17%   0.983
                TG      7.34%   7.44%   0.987
                AT      7.09%   7.28%   0.974
                CT      7.36%   7.34%   1.003
                GT      5.27%   5.38%   0.980
                TT      9.53%   9.22%   1.034
        Autonormalization: 3-mers (64 total)
        Normalization weights can be found in file: Homer/GBM_female/seq.autonorm.tsv
        Converging on autonormalization solution:
        ...............................................................................
        Final normalization:    Autonormalization: 1-mers (4 total)
                A       29.26%  29.24%  1.001
                C       20.74%  20.76%  0.999
                G       20.74%  20.76%  0.999
                T       29.26%  29.24%  1.001
        Autonormalization: 2-mers (16 total)
                AA      9.53%   9.41%   1.013
                CA      7.34%   7.35%   0.998
                GA      6.15%   6.14%   1.001
                TA      6.23%   6.33%   0.985
                AC      5.27%   5.32%   0.991
                CC      5.08%   5.14%   0.989
                GC      4.24%   4.16%   1.018
                TC      6.15%   6.14%   1.001
                AG      7.36%   7.34%   1.002
                CG      0.96%   0.93%   1.033
                GG      5.08%   5.14%   0.989
                TG      7.34%   7.35%   0.998
                AT      7.09%   7.16%   0.990
                CT      7.36%   7.34%   1.002
                GT      5.27%   5.32%   0.991
                TT      9.53%   9.41%   1.013
        Autonormalization: 3-mers (64 total)
        Finished preparing sequence/group files

        ----------------------------------------------------------
        Known motif enrichment

        Reading input files...
        50001 total sequences read
        440 motifs loaded
        Cache length = 11180
        Using binomial scoring
        Checking enrichment of 440 motif(s)
        |0%                                    50%                                  100%|
        =================================================================================
        Preparing HTML output with sequence logos...
                1 of 440 (1e-18) Chop(bZIP)/MEF-Chop-ChIP-Seq(GSE35681)/Homer
                2 of 440 (1e-17) Atf4(bZIP)/MEF-Atf4-ChIP-Seq(GSE35681)/Homer
                3 of 440 (1e-17) ISRE(IRF)/ThioMac-LPS-Expression(GSE23622)/Homer
                4 of 440 (1e-15) Atf3(bZIP)/GBM-ATF3-ChIP-Seq(GSE33912)/Homer
                5 of 440 (1e-15) Fos(bZIP)/TSC-Fos-ChIP-Seq(GSE110950)/Homer
                6 of 440 (1e-15) AP-1(bZIP)/ThioMac-PU.1-ChIP-Seq(GSE21512)/Homer
                7 of 440 (1e-15) JunB(bZIP)/DendriticCells-Junb-ChIP-Seq(GSE36099)/Homer
                8 of 440 (1e-14) Fra1(bZIP)/BT549-Fra1-ChIP-Seq(GSE46166)/Homer
                9 of 440 (1e-13) BATF(bZIP)/Th17-BATF-ChIP-Seq(GSE39756)/Homer
                10 of 440 (1e-13) IRF2(IRF)/Erythroblas-IRF2-ChIP-Seq(GSE36985)/Homer
                11 of 440 (1e-12) IRF3(IRF)/BMDM-Irf3-ChIP-Seq(GSE67343)/Homer
                12 of 440 (1e-12) IRF8(IRF)/BMDM-IRF8-ChIP-Seq(GSE77884)/Homer
                13 of 440 (1e-11) Fra2(bZIP)/Striatum-Fra2-ChIP-Seq(GSE43429)/Homer
                14 of 440 (1e-11) Fosl2(bZIP)/3T3L1-Fosl2-ChIP-Seq(GSE56872)/Homer
                15 of 440 (1e-10) Jun-AP1(bZIP)/K562-cJun-ChIP-Seq(GSE31477)/Homer
                16 of 440 (1e-9) IRF1(IRF)/PBMC-IRF1-ChIP-Seq(GSE43036)/Homer
                17 of 440 (1e-7) CEBP:AP1(bZIP)/ThioMac-CEBPb-ChIP-Seq(GSE21512)/Homer
                18 of 440 (1e-7) Rbpj1(?)/Panc1-Rbpj1-ChIP-Seq(GSE47459)/Homer
                19 of 440 (1e-7) TEAD1(TEAD)/HepG2-TEAD1-ChIP-Seq(Encode)/Homer
                20 of 440 (1e-6) PU.1:IRF8(ETS:IRF)/pDC-Irf8-ChIP-Seq(GSE66899)/Homer
                21 of 440 (1e-6) GABPA(ETS)/Jurkat-GABPa-ChIP-Seq(GSE17954)/Homer
                22 of 440 (1e-6) Zic3(Zf)/mES-Zic3-ChIP-Seq(GSE37889)/Homer
                23 of 440 (1e-5) Unknown-ESC-element(?)/mES-Nanog-ChIP-Seq(GSE11724)/Homer
                24 of 440 (1e-4) ETS1(ETS)/Jurkat-ETS1-ChIP-Seq(GSE17954)/Homer
                25 of 440 (1e-4) NF1-halfsite(CTF)/LNCaP-NF1-ChIP-Seq(Unpublished)/Homer
                26 of 440 (1e-4) Mef2c(MADS)/GM12878-Mef2c-ChIP-Seq(GSE32465)/Homer
                27 of 440 (1e-4) TEAD(TEA)/Fibroblast-PU.1-ChIP-Seq(Unpublished)/Homer
                28 of 440 (1e-4) Isl1(Homeobox)/Neuron-Isl1-ChIP-Seq(GSE31456)/Homer
                29 of 440 (1e-4) STAT5(Stat)/mCD4+-Stat5-ChIP-Seq(GSE12346)/Homer
                30 of 440 (1e-4) TEAD3(TEA)/HepG2-TEAD3-ChIP-Seq(Encode)/Homer
                31 of 440 (1e-4) CEBP:CEBP(bZIP)/MEF-Chop-ChIP-Seq(GSE35681)/Homer
                32 of 440 (1e-4) TEAD4(TEA)/Tropoblast-Tead4-ChIP-Seq(GSE37350)/Homer
                33 of 440 (1e-4) Pitx1:Ebox(Homeobox,bHLH)/Hindlimb-Pitx1-ChIP-Seq(GSE41591)/Homer
                34 of 440 (1e-4) PAX6(Paired,Homeobox)/Forebrain-Pax6-ChIP-Seq(GSE66961)/Homer
                35 of 440 (1e-4) Smad4(MAD)/ESC-SMAD4-ChIP-Seq(GSE29422)/Homer
                36 of 440 (1e-4) Dlx3(Homeobox)/Kerainocytes-Dlx3-ChIP-Seq(GSE89884)/Homer
                37 of 440 (1e-4) Fli1(ETS)/CD8-FLI-ChIP-Seq(GSE20898)/Homer
                38 of 440 (1e-4) CREB5(bZIP)/LNCaP-CREB5.V5-ChIP-Seq(GSE137775)/Homer
                39 of 440 (1e-3) EWS:FLI1-fusion(ETS)/SK_N_MC-EWS:FLI1-ChIP-Seq(SRA014231)/Homer
                40 of 440 (1e-3) Bach2(bZIP)/OCILy7-Bach2-ChIP-Seq(GSE44420)/Homer
                41 of 440 (1e-3) STAT4(Stat)/CD4-Stat4-ChIP-Seq(GSE22104)/Homer
                42 of 440 (1e-3) IRF4(IRF)/GM12878-IRF4-ChIP-Seq(GSE32465)/Homer
                43 of 440 (1e-3) Etv2(ETS)/ES-ER71-ChIP-Seq(GSE59402)/Homer
                44 of 440 (1e-3) HLF(bZIP)/HSC-HLF.Flag-ChIP-Seq(GSE69817)/Homer
                45 of 440 (1e-3) c-Jun-CRE(bZIP)/K562-cJun-ChIP-Seq(GSE31477)/Homer
                46 of 440 (1e-3) AMYB(HTH)/Testes-AMYB-ChIP-Seq(GSE44588)/Homer
                47 of 440 (1e-3) EWS:ERG-fusion(ETS)/CADO_ES1-EWS:ERG-ChIP-Seq(SRA014231)/Homer
                48 of 440 (1e-3) TEAD2(TEA)/Py2T-Tead2-ChIP-Seq(GSE55709)/Homer
                49 of 440 (1e-3) NFIL3(bZIP)/HepG2-NFIL3-ChIP-Seq(Encode)/Homer
                50 of 440 (1e-3) ETV1(ETS)/GIST48-ETV1-ChIP-Seq(GSE22441)/Homer
                51 of 440 (1e-3) Zic(Zf)/Cerebellum-ZIC1.2-ChIP-Seq(GSE60731)/Homer
                52 of 440 (1e-3) Mef2a(MADS)/HL1-Mef2a.biotin-ChIP-Seq(GSE21529)/Homer
                53 of 440 (1e-3) Elf4(ETS)/BMDM-Elf4-ChIP-Seq(GSE88699)/Homer
                54 of 440 (1e-3) CArG(MADS)/PUER-Srf-ChIP-Seq(Sullivan_et_al.)/Homer
                55 of 440 (1e-3) Elk1(ETS)/Hela-Elk1-ChIP-Seq(GSE31477)/Homer
                56 of 440 (1e-2) MafA(bZIP)/Islet-MafA-ChIP-Seq(GSE30298)/Homer
                57 of 440 (1e-2) MYB(HTH)/ERMYB-Myb-ChIPSeq(GSE22095)/Homer
                58 of 440 (1e-2) PU.1-IRF(ETS:IRF)/Bcell-PU.1-ChIP-Seq(GSE21512)/Homer
                59 of 440 (1e-2) EBF1(EBF)/Near-E2A-ChIP-Seq(GSE21512)/Homer
                60 of 440 (1e-2) DLX5(Homeobox)/BasalGanglia-Dlx5-ChIP-seq(GSE124936)/Homer
                61 of 440 (1e-2) Zic2(Zf)/ESC-Zic2-ChIP-Seq(SRP197560)/Homer
                62 of 440 (1e-2) DLX1(Homeobox)/BasalGanglia-Dlx1-ChIP-seq(GSE124936)/Homer
                63 of 440 (1e-2) Six2(Homeobox)/NephronProgenitor-Six2-ChIP-Seq(GSE39837)/Homer
                64 of 440 (1e-2) STAT6(Stat)/Macrophage-Stat6-ChIP-Seq(GSE38377)/Homer
                65 of 440 (1e-2) NFAT:AP1(RHD,bZIP)/Jurkat-NFATC1-ChIP-Seq(Jolma_et_al.)/Homer
                66 of 440 (1e-2) Fox:Ebox(Forkhead,bHLH)/Panc1-Foxa2-ChIP-Seq(GSE47459)/Homer
                67 of 440 (1e-2) Stat3(Stat)/mES-Stat3-ChIP-Seq(GSE11431)/Homer
                68 of 440 (1e-2) NFkB-p65-Rel(RHD)/ThioMac-LPS-Expression(GSE23622)/Homer
                69 of 440 (1e-2) EHF(ETS)/LoVo-EHF-ChIP-Seq(GSE49402)/Homer
                70 of 440 (1e-2) Hoxd11(Homeobox)/ChickenMSG-Hoxd11.Flag-ChIP-Seq(GSE86088)/Homer
                71 of 440 (1e-2) IRF:BATF(IRF:bZIP)/pDC-Irf8-ChIP-Seq(GSE66899)/Homer
                72 of 440 (1e-2) ETV4(ETS)/HepG2-ETV4-ChIP-Seq(ENCODE)/Homer
                73 of 440 (1e-2) Smad2(MAD)/ES-SMAD2-ChIP-Seq(GSE29422)/Homer
                74 of 440 (1e-2) STAT6(Stat)/CD4-Stat6-ChIP-Seq(GSE22104)/Homer
                75 of 440 (1e-2) Prop1(Homeobox)/GHFT1-PROP1.biotin-ChIP-Seq(GSE77302)/Homer
                76 of 440 (1e-2) Brn2(POU,Homeobox)/NPC-Brn2-ChIP-Seq(GSE35496)/Homer
                77 of 440 (1e-2) NFkB-p65(RHD)/GM12787-p65-ChIP-Seq(GSE19485)/Homer
                78 of 440 (1e-2) ETS:E-box(ETS,bHLH)/HPC7-Scl-ChIP-Seq(GSE22178)/Homer
                79 of 440 (1e-2) PU.1(ETS)/ThioMac-PU.1-ChIP-Seq(GSE21512)/Homer
                80 of 440 (1e-2) Nkx6.1(Homeobox)/Islet-Nkx6.1-ChIP-Seq(GSE40975)/Homer
                81 of 440 (1e-2) Mef2d(MADS)/Retina-Mef2d-ChIP-Seq(GSE61391)/Homer
        Skipping...
        Job finished - if results look good, please send beer to ..

        Cleaning up tmp files...


        Position file = temp_Homer_trial.bed
        Genome = mm10
        Output Directory = Homer/GBM_male
        Fragment size set to 1000
        Using 12 CPUs
        Will not run homer for de novo motifs
        Found mset for "mouse", will check against vertebrates motifs
        Peak/BED file conversion summary:
                BED/Header formatted lines: 585
                peakfile formatted lines: 0

        Peak File Statistics:
                Total Peaks: 585
                Redundant Peak IDs: 0
                Peaks lacking information: 0 (need at least 5 columns per peak)
                Peaks with misformatted coordinates: 0 (should be integer)
                Peaks with misformatted strand: 0 (should be either +/- or 0/1)

        Peak file looks good!

Finished!
There is no save_name, it will save to temp_Homer_trial.bed and then delete.

        Background files for 1000 bp fragments found.
        Custom genome sequence directory: /home/juanru/miniconda3/share/homer/.//data/genomes/mm10//

        Extracting sequences from directory: /home/juanru/miniconda3/share/homer/.//data/genomes/mm10//
        Extracting 36 sequences from chr1
        Extracting 40 sequences from chr2
        Extracting 33 sequences from chr3
        Extracting 20 sequences from chr4
        Extracting 28 sequences from chr5
        Extracting 35 sequences from chr6
        Extracting 18 sequences from chr7
        Extracting 25 sequences from chr8
        Extracting 21 sequences from chr9
        Extracting 52 sequences from chr10
        Extracting 39 sequences from chr11
        Extracting 26 sequences from chr12
        Extracting 23 sequences from chr13
        Extracting 36 sequences from chr14
        Extracting 43 sequences from chr15
        Extracting 18 sequences from chr16
        Extracting 17 sequences from chr17
        Extracting 28 sequences from chr18
        Extracting 22 sequences from chr19
        Extracting 21 sequences from chrX
        Extracting 4 sequences from chrY

        Not removing redundant sequences


        Sequences processed:
                Auto detected maximum sequence length of 1001 bp
                585 total

        Frequency Bins: 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.6 0.7 0.8
        Freq    Bin     Count
        0.3     2       2
        0.35    3       41
        0.4     4       134
        0.45    5       215
        0.5     6       133
        0.6     7       56
        0.7     8       4

        Total sequences set to 50000

        Choosing background that matches in CpG/GC content...
        Bin     # Targets       # Background    Background Weight
        2       2       169     1.000
        3       41      3463    1.000
        4       134     11319   1.000
        5       215     18161   1.000
        6       133     11235   1.000
        7       56      4730    1.000
        8       4       338     1.000
        Assembling sequence file...
        Normalizing lower order oligos using homer2

        Reading input files...
        50000 total sequences read
        Autonormalization: 1-mers (4 total)
                A       28.57%  28.51%  1.002
                C       21.43%  21.49%  0.997
                G       21.43%  21.49%  0.997
                T       28.57%  28.51%  1.002
        Autonormalization: 2-mers (16 total)
                AA      9.11%   8.75%   1.041
                CA      7.48%   7.54%   0.993
                GA      6.12%   6.21%   0.985
                TA      5.87%   6.02%   0.974
                AC      5.39%   5.41%   0.995
                CC      5.41%   5.51%   0.982
                GC      4.51%   4.35%   1.035
                TC      6.12%   6.21%   0.985
                AG      7.40%   7.49%   0.988
                CG      1.13%   0.95%   1.196
                GG      5.41%   5.51%   0.982
                TG      7.48%   7.54%   0.993
                AT      6.67%   6.86%   0.973
                CT      7.40%   7.49%   0.988
                GT      5.39%   5.41%   0.995
                TT      9.11%   8.75%   1.041
        Autonormalization: 3-mers (64 total)
        Normalization weights can be found in file: Homer/GBM_male/seq.autonorm.tsv
        Converging on autonormalization solution:
        ...............................................................................
        Final normalization:    Autonormalization: 1-mers (4 total)
                A       28.57%  28.55%  1.001
                C       21.43%  21.45%  0.999
                G       21.43%  21.45%  0.999
                T       28.57%  28.55%  1.001
        Autonormalization: 2-mers (16 total)
                AA      9.11%   8.97%   1.016
                CA      7.48%   7.47%   1.001
                GA      6.12%   6.15%   0.995
                TA      5.87%   5.97%   0.983
                AC      5.39%   5.40%   0.998
                CC      5.41%   5.46%   0.992
                GC      4.51%   4.44%   1.014
                TC      6.12%   6.15%   0.995
                AG      7.40%   7.42%   0.997
                CG      1.13%   1.10%   1.033
                GG      5.41%   5.46%   0.992
                TG      7.48%   7.47%   1.001
                AT      6.67%   6.77%   0.986
                CT      7.40%   7.42%   0.997
                GT      5.39%   5.40%   0.998
                TT      9.11%   8.97%   1.016
        Autonormalization: 3-mers (64 total)
        Finished preparing sequence/group files

        ----------------------------------------------------------
        Known motif enrichment

        Reading input files...
        50000 total sequences read
        440 motifs loaded
        Cache length = 11180
        Using binomial scoring
        Checking enrichment of 440 motif(s)
        |0%                                    50%                                  100%|
        =================================================================================

Finished!


        Preparing HTML output with sequence logos...
                1 of 440 (1e-42) Fosl2(bZIP)/3T3L1-Fosl2-ChIP-Seq(GSE56872)/Homer
                2 of 440 (1e-41) Jun-AP1(bZIP)/K562-cJun-ChIP-Seq(GSE31477)/Homer
                3 of 440 (1e-39) Fos(bZIP)/TSC-Fos-ChIP-Seq(GSE110950)/Homer
                4 of 440 (1e-38) JunB(bZIP)/DendriticCells-Junb-ChIP-Seq(GSE36099)/Homer
                5 of 440 (1e-37) Fra1(bZIP)/BT549-Fra1-ChIP-Seq(GSE46166)/Homer
                6 of 440 (1e-35) Fra2(bZIP)/Striatum-Fra2-ChIP-Seq(GSE43429)/Homer
                7 of 440 (1e-35) BATF(bZIP)/Th17-BATF-ChIP-Seq(GSE39756)/Homer
                8 of 440 (1e-34) Atf3(bZIP)/GBM-ATF3-ChIP-Seq(GSE33912)/Homer
                9 of 440 (1e-34) AP-1(bZIP)/ThioMac-PU.1-ChIP-Seq(GSE21512)/Homer
                10 of 440 (1e-22) Bach2(bZIP)/OCILy7-Bach2-ChIP-Seq(GSE44420)/Homer
                11 of 440 (1e-10) Nrf2(bZIP)/Lymphoblast-Nrf2-ChIP-Seq(GSE37589)/Homer
                12 of 440 (1e-9) NF-E2(bZIP)/K562-NFE2-ChIP-Seq(GSE31477)/Homer
                13 of 440 (1e-7) Bach1(bZIP)/K562-Bach1-ChIP-Seq(GSE31477)/Homer
                14 of 440 (1e-7) NFE2L2(bZIP)/HepG2-NFE2L2-ChIP-Seq(Encode)/Homer
                15 of 440 (1e-6) MafK(bZIP)/C2C12-MafK-ChIP-Seq(GSE36030)/Homer
                16 of 440 (1e-5) Ets1-distal(ETS)/CD4+-PolII-ChIP-Seq(Barski_et_al.)/Homer
                17 of 440 (1e-4) Isl1(Homeobox)/Neuron-Isl1-ChIP-Seq(GSE31456)/Homer
                18 of 440 (1e-4) Sox3(HMG)/NPC-Sox3-ChIP-Seq(GSE33059)/Homer
                19 of 440 (1e-3) Bapx1(Homeobox)/VertebralCol-Bapx1-ChIP-Seq(GSE36672)/Homer
                20 of 440 (1e-3) Fli1(ETS)/CD8-FLI-ChIP-Seq(GSE20898)/Homer
                21 of 440 (1e-3) MafA(bZIP)/Islet-MafA-ChIP-Seq(GSE30298)/Homer
                22 of 440 (1e-3) Fox:Ebox(Forkhead,bHLH)/Panc1-Foxa2-ChIP-Seq(GSE47459)/Homer
                23 of 440 (1e-3) Nkx2.2(Homeobox)/NPC-Nkx2.2-ChIP-Seq(GSE61673)/Homer
                24 of 440 (1e-3) Sox10(HMG)/SciaticNerve-Sox3-ChIP-Seq(GSE35132)/Homer
                25 of 440 (1e-3) LHX9(Homeobox)/Hct116-LHX9.V5-ChIP-Seq(GSE116822)/Homer
                26 of 440 (1e-3) TATA-Box(TBP)/Promoter/Homer
                27 of 440 (1e-3) KLF14(Zf)/HEK293-KLF14.GFP-ChIP-Seq(GSE58341)/Homer
                28 of 440 (1e-2) ZNF669(Zf)/HEK293-ZNF669.GFP-ChIP-Seq(GSE58341)/Homer
                29 of 440 (1e-2) Lhx2(Homeobox)/HFSC-Lhx2-ChIP-Seq(GSE48068)/Homer
                30 of 440 (1e-2) Sox4(HMG)/proB-Sox4-ChIP-Seq(GSE50066)/Homer
                31 of 440 (1e-2) DLX5(Homeobox)/BasalGanglia-Dlx5-ChIP-seq(GSE124936)/Homer
                32 of 440 (1e-2) RUNX-AML(Runt)/CD4+-PolII-ChIP-Seq(Barski_et_al.)/Homer
                33 of 440 (1e-2) Sp5(Zf)/mES-Sp5.Flag-ChIP-Seq(GSE72989)/Homer
                34 of 440 (1e-2) EWS:ERG-fusion(ETS)/CADO_ES1-EWS:ERG-ChIP-Seq(SRA014231)/Homer
                35 of 440 (1e-2) RUNX1(Runt)/Jurkat-RUNX1-ChIP-Seq(GSE29180)/Homer
                36 of 440 (1e-2) Etv2(ETS)/ES-ER71-ChIP-Seq(GSE59402)/Homer
                37 of 440 (1e-2) ETV4(ETS)/HepG2-ETV4-ChIP-Seq(ENCODE)/Homer
                38 of 440 (1e-2) Sox2(HMG)/mES-Sox2-ChIP-Seq(GSE11431)/Homer
                39 of 440 (1e-2) ETS1(ETS)/Jurkat-ETS1-ChIP-Seq(GSE17954)/Homer
                40 of 440 (1e-2) Brn1(POU,Homeobox)/NPC-Brn1-ChIP-Seq(GSE35496)/Homer
                41 of 440 (1e-2) CHR(?)/Hela-CellCycle-Expression/Homer
                42 of 440 (1e-2) Dlx3(Homeobox)/Kerainocytes-Dlx3-ChIP-Seq(GSE89884)/Homer
                43 of 440 (1e-2) EWS:FLI1-fusion(ETS)/SK_N_MC-EWS:FLI1-ChIP-Seq(SRA014231)/Homer
                44 of 440 (1e-2) KLF3(Zf)/MEF-Klf3-ChIP-Seq(GSE44748)/Homer
                45 of 440 (1e-2) STAT5(Stat)/mCD4+-Stat5-ChIP-Seq(GSE12346)/Homer
                46 of 440 (1e-2) Elk4(ETS)/Hela-Elk4-ChIP-Seq(GSE31477)/Homer
                47 of 440 (1e-2) Tbx20(T-box)/Heart-Tbx20-ChIP-Seq(GSE29636)/Homer
                48 of 440 (1e-2) Rfx6(HTH)/Min6b1-Rfx6.HA-ChIP-Seq(GSE62844)/Homer
                49 of 440 (1e-2) ETS:RUNX(ETS,Runt)/Jurkat-RUNX1-ChIP-Seq(GSE17954)/Homer
                50 of 440 (1e-2) AMYB(HTH)/Testes-AMYB-ChIP-Seq(GSE44588)/Homer
                51 of 440 (1e-2) Lhx1(Homeobox)/EmbryoCarcinoma-Lhx1-ChIP-Seq(GSE70957)/Homer
                52 of 440 (1e-2) Lhx3(Homeobox)/Neuron-Lhx3-ChIP-Seq(GSE31456)/Homer
                53 of 440 (1e-2) DLX2(Homeobox)/BasalGanglia-Dlx2-ChIP-seq(GSE124936)/Homer
                54 of 440 (1e-2) Nkx6.1(Homeobox)/Islet-Nkx6.1-ChIP-Seq(GSE40975)/Homer
                55 of 440 (1e-2) OCT4-SOX2-TCF-NANOG(POU,Homeobox,HMG)/mES-Oct4-ChIP-Seq(GSE11431)/Homer
                56 of 440 (1e-2) Sox7(HMG)/ESC-Sox7-ChIP-Seq(GSE133899)/Homer
                57 of 440 (1e-2) Sox6(HMG)/Myotubes-Sox6-ChIP-Seq(GSE32627)/Homer
                58 of 440 (1e-2) DLX1(Homeobox)/BasalGanglia-Dlx1-ChIP-seq(GSE124936)/Homer
                59 of 440 (1e-2) AP-2alpha(AP2)/Hela-AP2alpha-ChIP-Seq(GSE31477)/Homer
                60 of 440 (1e-2) GABPA(ETS)/Jurkat-GABPa-ChIP-Seq(GSE17954)/Homer
                61 of 440 (1e-2) Maz(Zf)/HepG2-Maz-ChIP-Seq(GSE31477)/Homer
        Skipping...
        Job finished - if results look good, please send beer to ..

        Cleaning up tmp files...

Find motifs under male peaks but not in female peaks.

[22]:

cc.tl.compare_motif("Homer/GBM_male", "Homer/GBM_female")

[22]:

	Motif Name	Consensus	P-value	Log P-value	q-value (Benjamini)	# of Target Sequences with Motif(of 585)	% of Target Sequences with Motif	# of Background Sequences with Motif(of 2133)	% of Background Sequences with Motif
10	Nrf2(bZIP)/Lymphoblast-Nrf2-ChIP-Seq(GSE37589)...	HTGCTGAGTCAT	1.000000e-13	-30.290	0.0000	39.0	6.67%	33.5	1.57%
11	NF-E2(bZIP)/K562-NFE2-ChIP-Seq(GSE31477)/Homer	GATGACTCAGCA	1.000000e-11	-26.520	0.0000	41.0	7.01%	41.6	1.95%
12	NFE2L2(bZIP)/HepG2-NFE2L2-ChIP-Seq(Encode)/Homer	AWWWTGCTGAGTCAT	1.000000e-11	-25.480	0.0000	48.0	8.21%	56.4	2.65%
13	Bach1(bZIP)/K562-Bach1-ChIP-Seq(GSE31477)/Homer	AWWNTGCTGAGTCAT	1.000000e-09	-21.840	0.0000	39.0	6.67%	44.3	2.07%
14	MafK(bZIP)/C2C12-MafK-ChIP-Seq(GSE36030)/Homer	GCTGASTCAGCA	1.000000e-09	-20.870	0.0000	97.0	16.58%	186.9	8.76%
15	Ets1-distal(ETS)/CD4+-PolII-ChIP-Seq(Barski_et...	MACAGGAAGT	1.000000e-05	-13.680	0.0000	116.0	19.83%	273.6	12.82%
17	Sox3(HMG)/NPC-Sox3-ChIP-Seq(GSE33059)/Homer	CCWTTGTY	1.000000e-04	-9.625	0.0016	434.0	74.19%	1425.7	66.82%
18	Nkx2.2(Homeobox)/NPC-Nkx2.2-ChIP-Seq(GSE61673)...	BTBRAGTGSN	1.000000e-04	-9.612	0.0016	419.0	71.62%	1367.5	64.09%
19	Bapx1(Homeobox)/VertebralCol-Bapx1-ChIP-Seq(GS...	TTRAGTGSYK	1.000000e-04	-9.553	0.0016	467.0	79.83%	1556.7	72.96%
20	RUNX1(Runt)/Jurkat-RUNX1-ChIP-Seq(GSE29180)/Homer	AAACCACARM	1.000000e-03	-8.988	0.0026	293.0	50.09%	906.6	42.49%
23	ZNF669(Zf)/HEK293-ZNF669.GFP-ChIP-Seq(GSE58341...	GARTGGTCATCGCCC	1.000000e-03	-8.215	0.0050	24.0	4.10%	39.7	1.86%
27	RUNX-AML(Runt)/CD4+-PolII-ChIP-Seq(Barski_et_a...	GCTGTGGTTW	1.000000e-03	-7.530	0.0086	212.0	36.24%	637.6	29.89%
29	Sox10(HMG)/SciaticNerve-Sox3-ChIP-Seq(GSE35132...	CCWTTGTYYB	1.000000e-03	-7.342	0.0095	407.0	69.57%	1347.5	63.15%
30	Sp1(Zf)/Promoter/Homer	GGCCCCGCCCCC	1.000000e-03	-7.220	0.0104	31.0	5.30%	60.7	2.85%
32	E2F3(E2F)/MEF-E2F3-ChIP-Seq(GSE71376)/Homer	BTKGGCGGGAAA	1.000000e-03	-7.074	0.0113	77.0	13.16%	195.8	9.18%
37	Sox7(HMG)/ESC-Sox7-ChIP-Seq(GSE133899)/Homer	VVRRAACAATGG	1.000000e-02	-6.569	0.0162	102.0	17.44%	278.9	13.07%
40	Brn1(POU,Homeobox)/NPC-Brn1-ChIP-Seq(GSE35496)...	TATGCWAATBAV	1.000000e-02	-6.345	0.0188	134.0	22.91%	385.4	18.06%
42	Sox4(HMG)/proB-Sox4-ChIP-Seq(GSE50066)/Homer	YCTTTGTTCC	1.000000e-02	-6.213	0.0205	255.0	43.59%	804.0	37.68%
43	ZNF317(Zf)/HEK293-ZNF317.GFP-ChIP-Seq(GSE58341...	GTCWGCTGTYYCTCT	1.000000e-02	-6.112	0.0222	29.0	4.96%	59.0	2.77%
45	OCT4-SOX2-TCF-NANOG(POU,Homeobox,HMG)/mES-Oct4...	ATTTGCATAACAATG	1.000000e-02	-5.636	0.0341	96.0	16.41%	267.1	12.52%
46	Isl1(Homeobox)/Neuron-Isl1-ChIP-Seq(GSE31456)/...	CTAATKGV	1.000000e-02	-5.578	0.0354	467.0	79.83%	1601.5	75.06%
48	Sox2(HMG)/mES-Sox2-ChIP-Seq(GSE11431)/Homer	BCCATTGTTC	1.000000e-02	-5.408	0.0402	272.0	46.50%	876.6	41.09%
49	Egr1(Zf)/K562-Egr1-ChIP-Seq(GSE32465)/Homer	TGCGTGGGYG	1.000000e-02	-5.339	0.0422	137.0	23.42%	406.2	19.04%
50	Tbx20(T-box)/Heart-Tbx20-ChIP-Seq(GSE29636)/Homer	GGTGYTGACAGS	1.000000e-02	-5.310	0.0426	67.0	11.45%	178.0	8.34%
51	Nkx3.1(Homeobox)/LNCaP-Nkx3.1-ChIP-Seq(GSE2826...	AAGCACTTAA	1.000000e-02	-5.277	0.0432	479.0	81.88%	1652.0	77.43%
52	E2F6(E2F)/Hela-E2F6-ChIP-Seq(GSE31477)/Homer	GGCGGGAARN	1.000000e-02	-5.256	0.0433	64.0	10.94%	168.7	7.91%
53	TATA-Box(TBP)/Promoter/Homer	CCTTTTAWAGSC	1.000000e-02	-5.201	0.0449	382.0	65.30%	1282.4	60.10%
55	Klf9(Zf)/GBM-Klf9-ChIP-Seq(GSE62211)/Homer	GCCACRCCCACY	1.000000e-02	-5.137	0.0462	85.0	14.53%	236.3	11.08%

Find motifs in female but not in male.

[23]:

cc.tl.compare_motif("Homer/GBM_female", "Homer/GBM_male")

[23]:

	Motif Name	Consensus	P-value	Log P-value	q-value (Benjamini)	# of Target Sequences with Motif(of 1009)	% of Target Sequences with Motif	# of Background Sequences with Motif(of 2327)	% of Background Sequences with Motif
0	Atf4(bZIP)/MEF-Atf4-ChIP-Seq(GSE35681)/Homer	MTGATGCAAT	1.000000e-14	-33.810	0.0000	228.0	22.60%	313.9	13.49%
1	Chop(bZIP)/MEF-Chop-ChIP-Seq(GSE35681)/Homer	ATTGCATCAT	1.000000e-14	-32.890	0.0000	197.0	19.52%	259.5	11.15%
2	IRF8(IRF)/BMDM-IRF8-ChIP-Seq(GSE77884)/Homer	GRAASTGAAAST	1.000000e-12	-28.550	0.0000	255.0	25.27%	381.5	16.40%
5	EBF1(EBF)/Near-E2A-ChIP-Seq(GSE21512)/Homer	GTCCCCWGGGGA	1.000000e-11	-27.250	0.0000	310.0	30.72%	495.0	21.27%
8	Zic3(Zf)/mES-Zic3-ChIP-Seq(GSE37889)/Homer	GGCCYCCTGCTGDGH	1.000000e-11	-25.440	0.0000	189.0	18.73%	266.5	11.45%
...	...	...	...	...	...	...	...	...	...
122	SPDEF(ETS)/VCaP-SPDEF-ChIP-Seq(SRA014231)/Homer	ASWTCCTGBT	1.000000e-02	-4.786	0.0299	420.0	41.63%	882.4	37.92%
123	Srebp2(bHLH)/HepG2-Srebp2-ChIP-Seq(GSE31477)/H...	CGGTCACSCCAC	1.000000e-02	-4.760	0.0304	45.0	4.46%	71.8	3.08%
124	ZNF519(Zf)/HEK293-ZNF519.GFP-ChIP-Seq(GSE58341...	GAGSCCGAGC	1.000000e-02	-4.720	0.0314	37.0	3.67%	56.1	2.41%
125	AP-2alpha(AP2)/Hela-AP2alpha-ChIP-Seq(GSE31477...	ATGCCCTGAGGC	1.000000e-02	-4.695	0.0319	167.0	16.55%	323.9	13.92%
126	Atf7(bZIP)/3T3L1-Atf7-ChIP-Seq(GSE56872)/Homer	NGRTGACGTCAY	1.000000e-02	-4.676	0.0323	192.0	19.03%	377.5	16.22%

98 rows × 9 columns

Next, we want to identify differentially bound peaks whose nearby genes are differentially expressed between males and females. To do so, we must first read in the bulk RNA-seq data. Note that this RNA-seq data is normalized by RPKM.

[24]:

rna =  cc.datasets.mouse_brd4_data(data = "RNA")
rna

[24]:

	sample.f6_dmso_3	sample.f6_dmso_2	sample.f6_dmso_1	sample.m6_dmso_3	sample.m6_dmso_2	sample.m6_dmso_1
mt-Tf	0.000000	0.639272	0.000000	1.156308	0.000000	0.687531
mt-Rnr1	127.907281	184.396768	172.544723	130.417054	192.566581	114.212201
mt-Tv	0.000000	0.000000	1.761150	0.000000	0.510777	0.000000
mt-Rnr2	247.267306	527.855824	324.345998	675.031187	684.376279	418.345603
mt-Tl1	44.418710	117.660098	70.481211	167.217607	115.129081	127.789076
...	...	...	...	...	...	...
Gm16897	0.281553	0.349020	0.463442	0.078913	0.070742	0.112610
A330023F24Rik	0.118678	0.055749	0.058441	0.029411	0.018833	0.049965
Cd46	0.127930	0.095151	0.075997	0.061467	0.055103	0.087715
Cr1l	9.993119	14.027801	13.717564	13.586101	13.240219	11.089902
Cr2	0.018832	0.056028	0.019578	0.012668	0.017034	0.015064

21430 rows × 6 columns

We now plot the heatmap for calling cards data and gene expression data.

The top lines are the relative Brd4 binding for each of the two groups. The bottom lines are the gene expressions of the top two nearest genes in the 3 RNA-seq replicates for males and females; each gene shown separately.

We can see a relationship between Brd4 binding and the expression of nearby genes. Genes that are strongly bound in females tend to be expressed in females and expressed at lower levels in males and vice versa. The effect is much more muted for the 2nd nearest gene to the peak.

[25]:

cc.pl.heatmap(adata_cc,rna, rnalabels = ["Female1", "Female2", "Female3","Male1", "Male2", "Male3"])

Please make sure that the samples in adata_cc and rna are in the same order.

/home/juanru/Desktop/pycallingcards/pycallingcards/pycallingcards/plotting/_heatmap_ccrna.py:193: RuntimeWarning: invalid value encountered in true_divide
  data[groupnumber:secondnum, :]
/home/juanru/Desktop/pycallingcards/pycallingcards/pycallingcards/plotting/_heatmap_ccrna.py:195: RuntimeWarning: invalid value encountered in true_divide
  data[secondnum:, :]

../../_images/tutorials_notebooks_Brd4_bulk_45_2.png

Find out the co-differential peak-gene pairs. Look into all the differential peaks and then see if the annotated genes are also significantly expressed. We can set the pvalue and score/log foldchange cutoff easily.

[26]:

result = cc.tl.pair_peak_gene_bulk(adata_cc, "https://github.com/The-Mitra-Lab/pycallingcards_data/releases/download/data/deseq_MF.csv",
                                   name_cc = 'logfoldchanges', pvalue_cutoff_cc = 0.001, pvalue_cutoff_rna = 0.001,
                                   lfc_cutoff_rna = 5, lfc_cutoff_cc = 5)
result

[26]:

	Peak	logfoldchanges_peak	Pvalue_peak	Pvalue_adj_peak	Gene	Score_gene	Pvalue_gene	Pvalue_adj_gene	Distance_peak_to_gene
0	chr1_174659622_174662521	5.652435	1.530047e-08	4.586591e-07	Grem2	-10.401831	1.280025e-67	1.252748e-65	171264
1	chr1_174917639_174921559	6.208699	2.711848e-12	1.562878e-10	Grem2	-10.401831	1.280025e-67	1.252748e-65	0
2	chr2_93645657_93648162	5.189615	2.360560e-06	4.324934e-05	Alx4	-10.292504	5.458739e-18	1.081356e-16	0
3	chr3_33140350_33142508	5.534632	6.455627e-08	1.724615e-06	Pex5l	-7.590236	2.351122e-24	6.841972e-23	0
4	chr3_126498429_126499612	5.337659	5.594815e-07	1.202787e-05	Arsj	-6.281259	8.667831e-27	2.783525e-25	-58056
5	chr3_126657669_126660188	-5.397757	5.546104e-07	1.202787e-05	Arsj	-6.281259	8.667831e-27	2.783525e-25	-217296
6	chr3_132085669_132092999	6.121213	1.509244e-105	2.850264e-102	Dkk2	-6.916087	9.389039e-178	5.513378e-175	0
7	chr3_132093815_132106295	5.072335	9.590057e-65	6.924867e-62	Dkk2	-6.916087	9.389039e-178	5.513378e-175	0
8	chr3_141699368_141702608	6.065565	2.403341e-29	5.413249e-27	Bmpr1b	-5.570116	6.893076e-40	3.655998e-38	134528
9	chr4_68535893_68538867	5.172571	1.974087e-15	1.648497e-13	Brinp1	-7.758307	5.384595e-10	5.726618e-09	222505
10	chr4_68545353_68551019	6.825275	1.516093e-18	1.684235e-16	Brinp1	-7.758307	5.384595e-10	5.726618e-09	210353
11	chr4_118022206_118024522	-5.332271	1.094775e-06	2.210347e-05	Artn	9.083650	8.043160e-84	1.066497e-81	-92444
12	chr5_13001869_13003460	5.957213	2.037009e-10	8.898685e-09	Sema3a	-5.771471	3.017136e-297	6.200968e-294	393324
13	chr5_13276479_13283560	7.727895	3.536779e-34	1.033708e-31	Sema3a	-5.771471	3.017136e-297	6.200968e-294	113224
14	chr5_17080302_17082934	5.024623	9.959617e-06	1.561421e-04	Hgf	-7.513756	2.543270e-09	2.523623e-08	-460864
15	chr5_28465271_28467276	-5.397757	5.546104e-07	1.202787e-05	Shh	5.565109	2.158809e-156	9.593279e-154	0
16	chr5_38081825_38086331	6.088429	2.350343e-11	1.192216e-09	Nsg1	-5.086635	2.233941e-111	4.963574e-109	50862
17	chr7_56523378_56528436	6.721266	2.699280e-17	2.609056e-15	Gabrg3	-5.960560	2.341610e-05	1.323050e-04	195806
18	chr7_111875691_111878313	5.109476	4.848742e-06	8.221096e-05	Galnt18	-5.959958	2.253335e-68	2.218523e-66	-95715
19	chr8_82329336_82331284	5.189615	2.360560e-06	4.324934e-05	Il15	-5.185933	8.096707e-04	3.267016e-03	340
20	chr8_102781569_102784308	5.761340	3.626346e-09	1.222945e-07	Cdh11	-6.729272	1.932941e-232	1.986339e-229	0
21	chr11_96341058_96343216	5.337659	5.594815e-07	1.202787e-05	Hoxb3	-6.433657	6.669080e-07	4.796719e-06	0
22	chr11_96354206_96355627	5.109476	4.848742e-06	8.221096e-05	Hoxb3	-6.433657	6.669080e-07	4.796719e-06	-6277
23	chr11_96354206_96355627	5.109476	4.848742e-06	8.221096e-05	Hoxb2	-6.351656	3.359549e-06	2.173857e-05	-193
24	chr12_56506841_56514912	-9.259195	2.083740e-95	3.009288e-92	Nkx2-1	9.134857	1.061953e-60	9.189803e-59	17023
25	chr13_117409406_117414817	-5.127606	3.397970e-15	2.771547e-13	Emb	-6.021510	3.917980e-06	2.506593e-05	-134992
26	chr14_66104432_66107303	6.455571	1.757505e-14	1.344190e-12	Ephx2	-6.872942	1.597944e-07	1.247549e-06	0
27	chr14_88460573_88463273	5.594736	3.142836e-08	8.899628e-07	Pcdh20	-6.097424	1.179837e-05	6.988068e-05	1474
28	chr14_88898125_88900211	-5.263671	2.162151e-06	4.039800e-05	Pcdh20	-6.097424	1.179837e-05	6.988068e-05	-426730
29	chr15_11692586_11694713	6.046002	4.827833e-11	2.333231e-09	Npr3	-8.356128	2.957230e-242	3.473055e-239	145183
30	chr15_11743634_11745456	6.922288	8.515240e-20	1.034939e-17	Npr3	-8.356128	2.957230e-242	3.473055e-239	94440
31	chr15_11782997_11784930	5.957213	2.037009e-10	8.898685e-09	Npr3	-8.356128	2.957230e-242	3.473055e-239	54966
32	chr18_53465649_53470088	5.761340	3.626346e-09	1.222945e-07	Prdm6	-9.029699	4.482978e-14	6.960257e-13	0
33	chr19_59159949_59161669	-6.094880	4.260481e-11	2.067175e-09	Vax1	7.636277	8.070915e-31	3.071805e-29	2816
34	chr19_59163812_59165938	-5.788022	4.812885e-09	1.603272e-07	Vax1	7.636277	8.070915e-31	3.071805e-29	0
35	chrX_93281556_93288401	5.093695	9.655833e-32	2.469379e-29	Arx	-5.380834	6.537245e-123	1.853196e-120	0
36	chrY_1009018_1011799	-5.686862	1.864054e-08	5.507147e-07	Eif2s3y	11.645099	9.414457e-23	2.468780e-21	0
37	chrY_1243715_1246316	-5.520433	1.425488e-07	3.468498e-06	Ddx3y	11.525927	2.291430e-22	5.823136e-21	14399
38	chrY_1243715_1246316	-5.520433	1.425488e-07	3.468498e-06	Uty	8.369064	1.224906e-22	3.191744e-21	0
39	chrY_1282482_1287504	-5.191646	4.272455e-06	7.350598e-05	Ddx3y	11.525927	2.291430e-22	5.823136e-21	0

Next, we visualize, for specific peaks, the magnitude of differential binding and the expression of nearby genes using the dotplot function. This function creates a dotplot with three columns (below). The first shows the relative binding of Brd4 at a chromsomal locus in males and females, the second and third columns show the expression levels of the nearest and second nearest genes in males and females.

[27]:

cc.pl.dotplot_bulk(adata_cc, rna, selected_list = list(np.unique(np.array(result["Peak"]))), num_list = [3,3], dotsize = 500,
                   cmap = 'plasma', figsize = (16, 16), sort_by_chrom = True, topspace = 0.96)

../../_images/tutorials_notebooks_Brd4_bulk_49_0.png

We can see from these results that all of the peaks visualized here display differential binding, and either the nearest, or second nearest gene has differential expression. In most cases, increased Brd4 binding is associated with an increase in gene expression.

Pycallingcards provides functionality to explore relatinoships between differentially bound regions and nearby GWAS identified SNPs. Since the data analyzed here were collected in murine cells, we must first lift over any interesting peaks to the human genome before we can look for nearby disease associated SNPs. We use liftover to get the resuls.

[28]:

result = cc.tl.result_mapping(result)
result

Start mapping the peaks to the new genome.

100%|██████████| 40/40 [00:00<00:00, 69.43it/s]

Start finding location of genes in the new genome.

100%|██████████| 40/40 [00:00<00:00, 287.78it/s]

[28]:

	Peak	logfoldchanges_peak	Pvalue_peak	Pvalue_adj_peak	Gene	Score_gene	Pvalue_gene	Pvalue_adj_gene	Distance_peak_to_gene	Chr_liftover	Start_liftover	End_liftover	Chr_hg38	Start_hg38	End_hg38
0	chr1_174659622_174662521	5.652435	1.530047e-08	4.586591e-07	Grem2	-10.401831	1.280025e-67	1.252748e-65	171264	chr1	240280756	240283835	chr1	240489572	240612162
1	chr1_174917639_174921559	6.208699	2.711848e-12	1.562878e-10	Grem2	-10.401831	1.280025e-67	1.252748e-65	0	chr1	240608636	240611870	chr1	240489572	240612162
2	chr2_93645657_93648162	5.189615	2.360560e-06	4.324934e-05	Alx4	-10.292504	5.458739e-18	1.081356e-16	0	chr11	44303696	44306415	chr11	44260727	44310166
3	chr3_33140350_33142508	5.534632	6.455627e-08	1.724615e-06	Pex5l	-7.590236	2.351122e-24	6.841972e-23	0	chr3	180033883	180036186	chr3	179794958	180037053
4	chr3_126498429_126499612	5.337659	5.594815e-07	1.202787e-05	Arsj	-6.281259	8.667831e-27	2.783525e-25	-58056	chr4	113839066	113844568	chr4	113900283	113979722
5	chr3_126657669_126660188	-5.397757	5.546104e-07	1.202787e-05	Arsj	-6.281259	8.667831e-27	2.783525e-25	-217296	chr4	113683844	113686758	chr4	113900283	113979722
6	chr3_132085669_132092999	6.121213	1.509244e-105	2.850264e-102	Dkk2	-6.916087	9.389039e-178	5.513378e-175	0	chr4	107029387	107035946	chr4	106921801	107036296
7	chr3_132093815_132106295	5.072335	9.590057e-65	6.924867e-62	Dkk2	-6.916087	9.389039e-178	5.513378e-175	0	chr4	107015456	107028826	chr4	106921801	107036296
8	chr3_141699368_141702608	6.065565	2.403341e-29	5.413249e-27	Bmpr1b	-5.570116	6.893076e-40	3.655998e-38	134528	chr4	95306647	95309431	chr4	94757976	95158450
9	chr4_68535893_68538867	5.172571	1.974087e-15	1.648497e-13	Brinp1	-7.758307	5.384595e-10	5.726618e-09	222505	chr9	118993297	118996641	chr9	119166629	119369461
10	chr4_68545353_68551019	6.825275	1.516093e-18	1.684235e-16	Brinp1	-7.758307	5.384595e-10	5.726618e-09	210353	chr9	119002823	119008509	chr9	119166629	119369461
11	chr4_118022206_118024522	-5.332271	1.094775e-06	2.210347e-05	Artn	9.083650	8.043160e-84	1.066497e-81	-92444	chr1	43824383	43825212	chr1	43933319	43937240
12	chr5_13001869_13003460	5.957213	2.037009e-10	8.898685e-09	Sema3a	-5.771471	3.017136e-297	6.200968e-294	393324				chr7	83958342	84194901
13	chr5_13276479_13283560	7.727895	3.536779e-34	1.033708e-31	Sema3a	-5.771471	3.017136e-297	6.200968e-294	113224	chr7	84314176	84318280	chr7	83958342	84194901
14	chr5_17080302_17082934	5.024623	9.959617e-06	1.561421e-04	Hgf	-7.513756	2.543270e-09	2.523623e-08	-460864	chr7	81335127	81335733	chr7	81699007	81770198
15	chr5_28465271_28467276	-5.397757	5.546104e-07	1.202787e-05	Shh	5.565109	2.158809e-156	9.593279e-154	0	chr7	155810616	155812624	chr7	155799983	155812273
16	chr5_38081825_38086331	6.088429	2.350343e-11	1.192216e-09	Nsg1	-5.086635	2.233941e-111	4.963574e-109	50862	chr4	4488485	4493470	chr4	4386255	4419058
17	chr7_56523378_56528436	6.721266	2.699280e-17	2.609056e-15	Gabrg3	-5.960560	2.341610e-05	1.323050e-04	195806				chr15	26971281	27533227
18	chr7_111875691_111878313	5.109476	4.848742e-06	8.221096e-05	Galnt18	-5.959958	2.253335e-68	2.218523e-66	-95715	chr11	11723801	11725419	chr11	11270873	11622014
19	chr8_82329336_82331284	5.189615	2.360560e-06	4.324934e-05	Il15	-5.185933	8.096707e-04	3.267016e-03	340	chr4	141733679	141736013	chr4	141636595	141733987
20	chr8_102781569_102784308	5.761340	3.626346e-09	1.222945e-07	Cdh11	-6.729272	1.932941e-232	1.986339e-229	0	chr16	65118621	65121296	chr16	64943752	65122137
21	chr11_96341058_96343216	5.337659	5.594815e-07	1.202787e-05	Hoxb3	-6.433657	6.669080e-07	4.796719e-06	0	chr17	48553538	48555622	chr17	48548869	48582622
22	chr11_96354206_96355627	5.109476	4.848742e-06	8.221096e-05	Hoxb3	-6.433657	6.669080e-07	4.796719e-06	-6277	chr17	48540790	48542479	chr17	48548869	48582622
23	chr11_96354206_96355627	5.109476	4.848742e-06	8.221096e-05	Hoxb2	-6.351656	3.359549e-06	2.173857e-05	-193	chr17	48540790	48542479	chr17	48542655	48545031
24	chr12_56506841_56514912	-9.259195	2.083740e-95	3.009288e-92	Nkx2-1	9.134857	1.061953e-60	9.189803e-59	17023	chr14	36491565	36499712	chr14	36516398	36520225
25	chr13_117409406_117414817	-5.127606	3.397970e-15	2.771547e-13	Emb	-6.021510	3.917980e-06	2.506593e-05	-134992				chr5	50396196	50441400
26	chr14_66104432_66107303	6.455571	1.757505e-14	1.344190e-12	Ephx2	-6.872942	1.597944e-07	1.247549e-06	0	chr8	27506903	27511135	chr8	27491001	27544922
27	chr14_88460573_88463273	5.594736	3.142836e-08	8.899628e-07	Pcdh20	-6.097424	1.179837e-05	6.988068e-05	1474	chr13	61406547	61407676	chr13	61409685	61415522
28	chr14_88898125_88900211	-5.263671	2.162151e-06	4.039800e-05	Pcdh20	-6.097424	1.179837e-05	6.988068e-05	-426730	chr13	61862674	61862983	chr13	61409685	61415522
29	chr15_11692586_11694713	6.046002	4.827833e-11	2.333231e-09	Npr3	-8.356128	2.957230e-242	3.473055e-239	145183	chr5	33023231	33024567	chr5	32710636	32791724
30	chr15_11743634_11745456	6.922288	8.515240e-20	1.034939e-17	Npr3	-8.356128	2.957230e-242	3.473055e-239	94440	chr5	32929090	32932829	chr5	32710636	32791724
31	chr15_11782997_11784930	5.957213	2.037009e-10	8.898685e-09	Npr3	-8.356128	2.957230e-242	3.473055e-239	54966	chr5	32864592	32866245	chr5	32710636	32791724
32	chr18_53465649_53470088	5.761340	3.626346e-09	1.222945e-07	Prdm6	-9.029699	4.482978e-14	6.960257e-13	0	chr5	123091173	123096087	chr5	123089102	123194266
33	chr19_59159949_59161669	-6.094880	4.260481e-11	2.067175e-09	Vax1	7.636277	8.070915e-31	3.071805e-29	2816	chr10	117127894	117129270	chr10	117128520	117138301
34	chr19_59163812_59165938	-5.788022	4.812885e-09	1.603272e-07	Vax1	7.636277	8.070915e-31	3.071805e-29	0	chr10	117131583	117133776	chr10	117128520	117138301
35	chrX_93281556_93288401	5.093695	9.655833e-32	2.469379e-29	Arx	-5.380834	6.537245e-123	1.853196e-120	0	chrX	25013967	25020980	chrX	25003695	25015948
36	chrY_1009018_1011799	-5.686862	1.864054e-08	5.507147e-07	Eif2s3y	11.645099	9.414457e-23	2.468780e-21	0	chrX	24054686	24057370
37	chrY_1243715_1246316	-5.520433	1.425488e-07	3.468498e-06	Ddx3y	11.525927	2.291430e-22	5.823136e-21	14399	chrY	13478375	13480067	chrY	12903998	12920478
38	chrY_1243715_1246316	-5.520433	1.425488e-07	3.468498e-06	Uty	8.369064	1.224906e-22	3.191744e-21	0	chrY	13478375	13480067	chrY	13248378	13480670
39	chrY_1282482_1287504	-5.191646	4.272455e-06	7.350598e-05	Ddx3y	11.525927	2.291430e-22	5.823136e-21	0	chrY	12903183	12909538	chrY	12903998	12920478

[29]:

GWAS_result = cc.tl.GWAS(result)
GWAS_result

[29]:

	Peak	logfoldchanges_peak	Pvalue_peak	Pvalue_adj_peak	Gene	Score_gene	Pvalue_gene	Pvalue_adj_gene	Distance_peak_to_gene	Chr_liftover	Start_liftover	End_liftover	Chr_hg38	Start_hg38	End_hg38	GWAS
0	chr1_174659622_174662521	5.652435	1.530047e-08	4.586591e-07	Grem2	-10.401831	1.280025e-67	1.252748e-65	171264	chr1	240280756	240283835	chr1	240489572	240612162	Coronary heart disease
1	chr1_174917639_174921559	6.208699	2.711848e-12	1.562878e-10	Grem2	-10.401831	1.280025e-67	1.252748e-65	0	chr1	240608636	240611870	chr1	240489572	240612162
2	chr2_93645657_93648162	5.189615	2.360560e-06	4.324934e-05	Alx4	-10.292504	5.458739e-18	1.081356e-16	0	chr11	44303696	44306415	chr11	44260727	44310166
3	chr3_33140350_33142508	5.534632	6.455627e-08	1.724615e-06	Pex5l	-7.590236	2.351122e-24	6.841972e-23	0	chr3	180033883	180036186	chr3	179794958	180037053
4	chr3_126498429_126499612	5.337659	5.594815e-07	1.202787e-05	Arsj	-6.281259	8.667831e-27	2.783525e-25	-58056	chr4	113839066	113844568	chr4	113900283	113979722
5	chr3_126657669_126660188	-5.397757	5.546104e-07	1.202787e-05	Arsj	-6.281259	8.667831e-27	2.783525e-25	-217296	chr4	113683844	113686758	chr4	113900283	113979722
6	chr3_132085669_132092999	6.121213	1.509244e-105	2.850264e-102	Dkk2	-6.916087	9.389039e-178	5.513378e-175	0	chr4	107029387	107035946	chr4	106921801	107036296
7	chr3_132093815_132106295	5.072335	9.590057e-65	6.924867e-62	Dkk2	-6.916087	9.389039e-178	5.513378e-175	0	chr4	107015456	107028826	chr4	106921801	107036296
8	chr3_141699368_141702608	6.065565	2.403341e-29	5.413249e-27	Bmpr1b	-5.570116	6.893076e-40	3.655998e-38	134528	chr4	95306647	95309431	chr4	94757976	95158450
9	chr4_68535893_68538867	5.172571	1.974087e-15	1.648497e-13	Brinp1	-7.758307	5.384595e-10	5.726618e-09	222505	chr9	118993297	118996641	chr9	119166629	119369461	Gut microbiota (bacterial taxa, hurdle binary ...
10	chr4_68545353_68551019	6.825275	1.516093e-18	1.684235e-16	Brinp1	-7.758307	5.384595e-10	5.726618e-09	210353	chr9	119002823	119008509	chr9	119166629	119369461
11	chr4_118022206_118024522	-5.332271	1.094775e-06	2.210347e-05	Artn	9.083650	8.043160e-84	1.066497e-81	-92444	chr1	43824383	43825212	chr1	43933319	43937240
12	chr5_13001869_13003460	5.957213	2.037009e-10	8.898685e-09	Sema3a	-5.771471	3.017136e-297	6.200968e-294	393324				chr7	83958342	84194901
13	chr5_13276479_13283560	7.727895	3.536779e-34	1.033708e-31	Sema3a	-5.771471	3.017136e-297	6.200968e-294	113224	chr7	84314176	84318280	chr7	83958342	84194901
14	chr5_17080302_17082934	5.024623	9.959617e-06	1.561421e-04	Hgf	-7.513756	2.543270e-09	2.523623e-08	-460864	chr7	81335127	81335733	chr7	81699007	81770198
15	chr5_28465271_28467276	-5.397757	5.546104e-07	1.202787e-05	Shh	5.565109	2.158809e-156	9.593279e-154	0	chr7	155810616	155812624	chr7	155799983	155812273	Urate levels
16	chr5_38081825_38086331	6.088429	2.350343e-11	1.192216e-09	Nsg1	-5.086635	2.233941e-111	4.963574e-109	50862	chr4	4488485	4493470	chr4	4386255	4419058
17	chr7_56523378_56528436	6.721266	2.699280e-17	2.609056e-15	Gabrg3	-5.960560	2.341610e-05	1.323050e-04	195806				chr15	26971281	27533227
18	chr7_111875691_111878313	5.109476	4.848742e-06	8.221096e-05	Galnt18	-5.959958	2.253335e-68	2.218523e-66	-95715	chr11	11723801	11725419	chr11	11270873	11622014
19	chr8_82329336_82331284	5.189615	2.360560e-06	4.324934e-05	Il15	-5.185933	8.096707e-04	3.267016e-03	340	chr4	141733679	141736013	chr4	141636595	141733987
20	chr8_102781569_102784308	5.761340	3.626346e-09	1.222945e-07	Cdh11	-6.729272	1.932941e-232	1.986339e-229	0	chr16	65118621	65121296	chr16	64943752	65122137
21	chr11_96341058_96343216	5.337659	5.594815e-07	1.202787e-05	Hoxb3	-6.433657	6.669080e-07	4.796719e-06	0	chr17	48553538	48555622	chr17	48548869	48582622	Migraine; Insomnia
22	chr11_96354206_96355627	5.109476	4.848742e-06	8.221096e-05	Hoxb3	-6.433657	6.669080e-07	4.796719e-06	-6277	chr17	48540790	48542479	chr17	48548869	48582622	Cystatin C levels
23	chr11_96354206_96355627	5.109476	4.848742e-06	8.221096e-05	Hoxb2	-6.351656	3.359549e-06	2.173857e-05	-193	chr17	48540790	48542479	chr17	48542655	48545031	Cystatin C levels
24	chr12_56506841_56514912	-9.259195	2.083740e-95	3.009288e-92	Nkx2-1	9.134857	1.061953e-60	9.189803e-59	17023	chr14	36491565	36499712	chr14	36516398	36520225	Nonsyndromic orofacial cleft x sex interaction...
25	chr13_117409406_117414817	-5.127606	3.397970e-15	2.771547e-13	Emb	-6.021510	3.917980e-06	2.506593e-05	-134992				chr5	50396196	50441400
26	chr14_66104432_66107303	6.455571	1.757505e-14	1.344190e-12	Ephx2	-6.872942	1.597944e-07	1.247549e-06	0	chr8	27506903	27511135	chr8	27491001	27544922	Childhood ear infection
27	chr14_88460573_88463273	5.594736	3.142836e-08	8.899628e-07	Pcdh20	-6.097424	1.179837e-05	6.988068e-05	1474	chr13	61406547	61407676	chr13	61409685	61415522
28	chr14_88898125_88900211	-5.263671	2.162151e-06	4.039800e-05	Pcdh20	-6.097424	1.179837e-05	6.988068e-05	-426730	chr13	61862674	61862983	chr13	61409685	61415522
29	chr15_11692586_11694713	6.046002	4.827833e-11	2.333231e-09	Npr3	-8.356128	2.957230e-242	3.473055e-239	145183	chr5	33023231	33024567	chr5	32710636	32791724
30	chr15_11743634_11745456	6.922288	8.515240e-20	1.034939e-17	Npr3	-8.356128	2.957230e-242	3.473055e-239	94440	chr5	32929090	32932829	chr5	32710636	32791724
31	chr15_11782997_11784930	5.957213	2.037009e-10	8.898685e-09	Npr3	-8.356128	2.957230e-242	3.473055e-239	54966	chr5	32864592	32866245	chr5	32710636	32791724
32	chr18_53465649_53470088	5.761340	3.626346e-09	1.222945e-07	Prdm6	-9.029699	4.482978e-14	6.960257e-13	0	chr5	123091173	123096087	chr5	123089102	123194266
33	chr19_59159949_59161669	-6.094880	4.260481e-11	2.067175e-09	Vax1	7.636277	8.070915e-31	3.071805e-29	2816	chr10	117127894	117129270	chr10	117128520	117138301
34	chr19_59163812_59165938	-5.788022	4.812885e-09	1.603272e-07	Vax1	7.636277	8.070915e-31	3.071805e-29	0	chr10	117131583	117133776	chr10	117128520	117138301	Age at first birth; Nonsyndromic cleft lip wit...
35	chrX_93281556_93288401	5.093695	9.655833e-32	2.469379e-29	Arx	-5.380834	6.537245e-123	1.853196e-120	0	chrX	25013967	25020980	chrX	25003695	25015948
36	chrY_1009018_1011799	-5.686862	1.864054e-08	5.507147e-07	Eif2s3y	11.645099	9.414457e-23	2.468780e-21	0	chrX	24054686	24057370
37	chrY_1243715_1246316	-5.520433	1.425488e-07	3.468498e-06	Ddx3y	11.525927	2.291430e-22	5.823136e-21	14399	chrY	13478375	13480067	chrY	12903998	12920478
38	chrY_1243715_1246316	-5.520433	1.425488e-07	3.468498e-06	Uty	8.369064	1.224906e-22	3.191744e-21	0	chrY	13478375	13480067	chrY	13248378	13480670
39	chrY_1282482_1287504	-5.191646	4.272455e-06	7.350598e-05	Ddx3y	11.525927	2.291430e-22	5.823136e-21	0	chrY	12903183	12909538	chrY	12903998	12920478

Save the file if needed.

[30]:

adata_cc.write("Brd4_bindings_bulk.h5ad")