Tutorial: Brd4 bindings for Male/Female mouse bulk calling cards data.#

We will analyze bulk calling cards datasets from Kfoury et al., PNAS. (2021), which can be downloaded from GEO. These experiments were designed to study differences in Brd4 binding between male and female cells in a genetically engineered cellular model of glioblastoma (GBM). These nearly-isogenic cells were derived from murine neocortical postnatal day 1 (p1) astrocytes (male and female GBM astrocytes) and were engineered to contain a combined loss of neurofibromin (NF1) and p53 function. These cells were previously shown to display sex differences in important GBM phenotypes in vivo, including proliferation, clonogenic stem-like frequency, in vivo tumorigenesis, cell cycle regulation, gene expression, and chemotherapy response, that mimic those observed in GBM patients. For more details see [Kfoury et al., PNAS. (2021)].

In this tutorial, we will call peaks, annotate these peaks, perform differential peak analysis, and pair differentially bound genomic regions with nearby genes that are also differentially expressed between the sexes. There are 668525 insertions in female qbed data and 613728 insertions in male qbed data. For bulk RNA-seq data, there are 6 samples in total: 3 female samples and 3 male samples. It creates a group by peak anndata for calling cards data. If you want to create a replicate by peak anndata only, please check Github

[1]:
import pycallingcards as cc
import numpy as np
import pandas as pd
import scanpy as sc
from matplotlib import pyplot as plt
plt.rcParams['figure.dpi'] = 150

We start by reading qbed data. In these data, each row provides information about one Brd4-directed insertion organized into columns to indicate the chromosome, start point, end point, number of reads supporting the insertion, and the orientation and barcode for each insertion. For example, in the file displayed below, the first row tells us that the first insertion occured into a TTAA on Chromosome 1, whose coordinates start location 3478112 and end at 3478115. There was 1 read supporting this insertion and the transposon is orientated so that the plus strand of the transposon is on the plus strand of the genome. The transposon barcode is TTTGTCCAA. We add another column indecating its respected group.

Use cc.rd.read_qbed(filename) to load your own qbed data.

[2]:
Female_Brd4 = cc.datasets.mouse_brd4_data(data = "Female_Brd4")
Female_Brd4['group'] = "Female_Brd4"
Female_Brd4
#load female CC data
[2]:
Chr Start End Reads Direction Barcodes group
0 chr1 3478111 3478115 1 + TTTGTCCAA Female_Brd4
1 chr1 3481743 3481747 1 + TATGTACAA Female_Brd4
2 chr1 3481957 3481961 1 - TATGTACAA Female_Brd4
3 chr1 3493161 3493165 16 + TATGTACAA Female_Brd4
4 chr1 3493702 3493706 1 + CGTTACACA Female_Brd4
... ... ... ... ... ... ... ...
1290214 chrY 90802841 90802845 1 - CGTTACACA Female_Brd4
1290215 chrY 90803579 90803583 1 - TATGTACAA Female_Brd4
1290216 chrY 90803579 90803583 1 - TGTCGTGCA Female_Brd4
1290217 chrY 90804429 90804433 1 - TATACTCTA Female_Brd4
1290218 chrY 90805130 90805134 1 - GACGGCTCC Female_Brd4

1290219 rows × 7 columns

[3]:
Male_Brd4 = cc.datasets.mouse_brd4_data(data = "Male_Brd4")
Male_Brd4['group'] = "Male_Brd4"
Male_Brd4
#load male CC data
[3]:
Chr Start End Reads Direction Barcodes group
0 chr1 3493161 3493165 1 - GAGGTACAG Male_Brd4
1 chr1 3493275 3493279 1 + TACTTTCCG Male_Brd4
2 chr1 3493341 3493345 2 + AGCACAGTG Male_Brd4
3 chr1 3493341 3493345 10 + TGTCCATTG Male_Brd4
4 chr1 3493341 3493345 8 + ATGAAAGCA Male_Brd4
... ... ... ... ... ... ... ...
1185697 chrY 90803733 90803737 13 - TACTTTCCG Male_Brd4
1185698 chrY 90803733 90803737 9 - CACGGCGGA Male_Brd4
1185699 chrY 90803733 90803737 17 - GATCTTATG Male_Brd4
1185700 chrY 90803774 90803778 4 - ATGAAAGCA Male_Brd4
1185701 chrY 90806562 90806566 1 - AGCACAGTG Male_Brd4

1185702 rows × 7 columns

Since we are interested in differential peak calling, we now combine the two datasets to call peaks on the joint dataset:

[4]:
Brd4 = cc.rd.combine_qbed([Female_Brd4, Male_Brd4])
Brd4
[4]:
Chr Start End Reads Direction Barcodes group
0 chr1 3478111 3478115 1 + TTTGTCCAA Female_Brd4
1 chr1 3481743 3481747 1 + TATGTACAA Female_Brd4
2 chr1 3481957 3481961 1 - TATGTACAA Female_Brd4
3 chr1 3493161 3493165 16 + TATGTACAA Female_Brd4
4 chr1 3493161 3493165 1 - GAGGTACAG Male_Brd4
... ... ... ... ... ... ... ...
2475916 chrY 90803733 90803737 17 - GATCTTATG Male_Brd4
2475917 chrY 90803774 90803778 4 - ATGAAAGCA Male_Brd4
2475918 chrY 90804429 90804433 1 - TATACTCTA Female_Brd4
2475919 chrY 90805130 90805134 1 - GACGGCTCC Female_Brd4
2475920 chrY 90806562 90806566 1 - AGCACAGTG Male_Brd4

2475921 rows × 7 columns

Optional: filter the insertions with reads greater than 2. Because we use strict quality control on read insertions, even insertions supported by 1 read are likely to be bona fide.

[5]:
Brd4 = Brd4[Brd4['Reads'] > 2]
Brd4
[5]:
Chr Start End Reads Direction Barcodes group
3 chr1 3493161 3493165 16 + TATGTACAA Female_Brd4
7 chr1 3493341 3493345 10 + TGTCCATTG Male_Brd4
8 chr1 3493341 3493345 8 + ATGAAAGCA Male_Brd4
10 chr1 3493643 3493647 3 + TCTACTGCC Male_Brd4
17 chr1 3514553 3514557 7 + TACTTTCCG Male_Brd4
... ... ... ... ... ... ... ...
2475905 chrY 42512546 42512550 3 - ACGCGCTGT Male_Brd4
2475914 chrY 90803733 90803737 13 - TACTTTCCG Male_Brd4
2475915 chrY 90803733 90803737 9 - CACGGCGGA Male_Brd4
2475916 chrY 90803733 90803737 17 - GATCTTATG Male_Brd4
2475917 chrY 90803774 90803778 4 - ATGAAAGCA Male_Brd4

1229664 rows × 7 columns

Because insertions are discrete, we now need to call peaks to deduce potential binding sites. Three different methods (CCcaller, MACCs, Blockify) are available along with three different species (hg38, mm10, sacCer3).

In this setting, we use CCcaller in mouse(‘mm10’) data. maxbetween is the most important parameter for CCcaller. It controls the maximum distance between two nearby insertions, or in another words, the minimum distance between two peaks. We recommend using a value between 800-1200 to set maxbetween. pvalue_cutoff is also an important parameter and a number 0.0001-0.01 is strongly advised. The setting of pseudocounts is largely influenced by library size. For the first time of trial, it can be adjusted to \(10^{6}-10^{-5} \times\) the number of insertions.

[6]:
peak_data = cc.pp.call_peaks(Brd4, method = "CCcaller", reference = "mm10", pvalue_cutoff = 0.01,
                             maxbetween = 1100, lam_win_size = 1000000, pseudocounts = 0.1,
                             record = True, save = "peak.bed")
peak_data
For the CCcaller method without background, [expdata, reference, pvalue_cutoff, lam_win_size, pseudocounts, minlen, extend, maxbetween, test_method, min_insertions, record] would be utilized.
100%|██████████| 21/21 [00:21<00:00,  1.00s/it]
[6]:
Chr Start End Experiment Insertions Reference Insertions Expected Insertions pvalue pvalue_adj
0 chr1 4196775 4198714 16 12 0.142389 0.000000e+00 0.000000e+00
1 chr1 4785068 4786551 17 15 0.572576 0.000000e+00 0.000000e+00
2 chr1 4806673 4807730 21 12 0.477937 0.000000e+00 0.000000e+00
3 chr1 4856928 4862019 104 44 1.490080 0.000000e+00 0.000000e+00
4 chr1 4912814 4916593 7 22 0.791756 1.900897e-06 1.024407e-04
... ... ... ... ... ... ... ... ...
24546 chrX 169936677 169937745 7 8 1.189526 3.479200e-05 1.673495e-03
24547 chrY 897430 899700 11 24 0.528165 6.046275e-13 5.162843e-11
24548 chrY 1009018 1011799 27 25 0.489073 0.000000e+00 0.000000e+00
24549 chrY 1243715 1246316 24 29 0.504932 0.000000e+00 0.000000e+00
24550 chrY 1282482 1287504 19 60 0.945958 0.000000e+00 0.000000e+00

24551 rows × 8 columns

In order to tune parameters for peak calling, we advise looking at the data and evaluating the validity of the called peaks. The default settings are what we recommend, but for some TFs adjacent peaks may be merged that should not be, or, alternatively, peaks that should be joined may be called separately.

Below, we plot the combined calling card data in males and females for a region in chromosome 1. The top track displays insertion locations and their read counts. Each dot is an insertion and the height is log(reads+1). The middle track plots the insertion density. The thrid track represents the reference genes and peaks. Finally, the last track represents peak calls. Below you can see that regions with high densities of insertions are accurately called as Brd4 binding sites.

[7]:
cc.pl.draw_area("chr1", 4856929, 4863861, 100000, peak_data, Brd4, "mm10", font_size = 2,
                plotsize = [1,1,5],
                figsize = (30,10), peak_line = 2, save = False, example_length = 5000)
../../_images/tutorials_notebooks_Brd4_bulk_13_0.png
[8]:
cc.pl.draw_area("chr2", 6136575, 6144043, 75000, peak_data, Brd4, "mm10", font_size = 2, plotsize = [1,1,5],
                figsize = (30,8), peak_line = 1, save = False, example_length = 5000)
../../_images/tutorials_notebooks_Brd4_bulk_14_0.png

We can also visualize our data directly in the WashU Epigenome Browser. This can be useful for overlaying your data with other published datasets. Notice that this link only valid for 24hrs, so please rerun it if you want to use it.

[9]:
qbed = {"Brd4": Brd4}
bed = {"peak": peak_data}
cc.pl.WashU_browser_url(qbed = qbed, bed = bed, genome = 'mm10')
All qbed addressed
All bed addressed
Uploading files
Please click the following link to see the data on WashU Epigenome Browser directly.
https://epigenomegateway.wustl.edu/browser/?genome=mm10&hub=https://companion.epigenomegateway.org//task/544b112245ab68df385f75da9d4bfd8b/output//datahub.json

Pycallingcards can be used to visual peak locations acorss the genome to see if the distribution of peaks is unbiased and all chromosomes are represented.

[10]:
cc.pl.whole_peaks(peak_data, reference = "mm10", linewidth = 1)
../../_images/tutorials_notebooks_Brd4_bulk_18_0.png

For differential peak calling, we recommend first combining the data and calling peaks together (and then splitting the data and looking for enrichments under the jointly called peaks). However, it is often of interest to call peaks in each sample to analyze them separately or to then merge these peak calls using pybedtools. Below is the code to do this:

import pybedtools
peak_data1 = cc.pp.call_peaks(Female_Brd4, method = "CCcaller",
                              reference = "mm10", pvalue_cutoffbg = 0.1,
                              maxbetween = 2000, pvalue_cutoffTTAA = 0.001,
                              lam_win_size = 1000000,  pseudocounts = 0.1, record = True)
peak_data2 = cc.pp.call_peaks(Male_Brd4, method = "CCcaller",
                              reference = "mm10",  pvalue_cutoffbg = 0.1,
                              maxbetween = 2000, pvalue_cutoffTTAA = 0.001,
                              lam_win_size = 1000000, pseudocounts = 0.1, record = True)
peak = cc.rd.combine_qbed([peak_data1, peak_data2])
peak = pybedtools.BedTool.from_dataframe(peak).merge().to_dataframe()
peak_data = peak.rename(columns={"chrom":"Chr", "start":"Start", "end":"End"})

In the next step, we annote each peak to annotate using bedtools and pybedtools. Make sure they are all previously installed before using.

[11]:
peak_annotation = cc.pp.annotation(peak_data, reference = "mm10")
peak_annotation = cc.pp.combine_annotation(peak_data, peak_annotation)
peak_annotation
In the bedtools method, we would use bedtools in the default path. Set bedtools path by 'bedtools_path' if needed.
[11]:
Chr Start End Experiment Insertions Reference Insertions Expected Insertions pvalue pvalue_adj Nearest Refseq1 Gene Name1 Direction1 Distance1 Nearest Refseq2 Gene Name2 Direction2 Distance2
0 chr1 4196775 4198714 16 12 0.142389 0.000000e+00 0.000000e+00 NM_001195662 Rp1 - 92132 NM_011283 Rp1 - 144793
1 chr1 4785068 4786551 17 15 0.572576 0.000000e+00 0.000000e+00 NR_033530 Mrpl15 - 0 NM_008866 Lypla1 + 21342
2 chr1 4806673 4807730 21 12 0.477937 0.000000e+00 0.000000e+00 NM_008866 Lypla1 + 163 NR_033530 Mrpl15 - -20948
3 chr1 4856928 4862019 104 44 1.490080 0.000000e+00 0.000000e+00 NM_011541 Tcea1 + 0 NM_008866 Lypla1 + -10194
4 chr1 4912814 4916593 7 22 0.791756 1.900897e-06 1.024407e-04 NM_001290372 Rgs20 - 0 NM_011541 Tcea1 + -14906
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
24546 chrX 169936677 169937745 7 8 1.189526 3.479200e-05 1.673495e-03 NM_010797 Mid1 + 0 NR_029457 G530011O06Rik - 37298
24547 chrY 897430 899700 11 24 0.528165 6.046275e-13 5.162843e-11 NM_011419 Kdm5d + 0 NM_011667 Uba1y + -53207
24548 chrY 1009018 1011799 27 25 0.489073 0.000000e+00 0.000000e+00 NM_012011 Eif2s3y + 0 NR_027507 Tspy-ps - 43965
24549 chrY 1243715 1246316 24 29 0.504932 0.000000e+00 0.000000e+00 NM_009484 Uty - 0 NM_012008 Ddx3y - 14399
24550 chrY 1282482 1287504 19 60 0.945958 0.000000e+00 0.000000e+00 NM_012008 Ddx3y - 0 NM_148943 Usp9y - 11457

24551 rows × 16 columns

Use qbed data, peak data, and barcode data to make a group by peak Anndata object.

[12]:
adata_cc = cc.pp.make_Anndata(Brd4, peak_annotation, ["Female_Brd4", "Male_Brd4"], key = 'group')
adata_cc
100%|██████████| 21/21 [00:07<00:00,  2.64it/s]
[12]:
AnnData object with n_obs × n_vars = 2 × 24551
    var: 'Chr', 'Start', 'End', 'Experiment Insertions', 'Reference Insertions', 'Expected Insertions', 'pvalue', 'pvalue_adj', 'Nearest Refseq1', 'Gene Name1', 'Direction1', 'Distance1', 'Nearest Refseq2', 'Gene Name2', 'Direction2', 'Distance2'
[13]:
adata_cc.obs
[13]:
Index
Female_Brd4
Male_Brd4
[14]:
adata_cc
[14]:
AnnData object with n_obs × n_vars = 2 × 24551
    var: 'Chr', 'Start', 'End', 'Experiment Insertions', 'Reference Insertions', 'Expected Insertions', 'pvalue', 'pvalue_adj', 'Nearest Refseq1', 'Gene Name1', 'Direction1', 'Distance1', 'Nearest Refseq2', 'Gene Name2', 'Direction2', 'Distance2'
[15]:
adata_cc = cc.tl.liftover(adata_cc)
adata_cc
100%|██████████| 24551/24551 [04:44<00:00, 86.27it/s]
[15]:
AnnData object with n_obs × n_vars = 2 × 24551
    var: 'Chr', 'Start', 'End', 'Experiment Insertions', 'Reference Insertions', 'Expected Insertions', 'pvalue', 'pvalue_adj', 'Nearest Refseq1', 'Gene Name1', 'Direction1', 'Distance1', 'Nearest Refseq2', 'Gene Name2', 'Direction2', 'Distance2', 'Chr_liftover', 'Start_liftover', 'End_liftover'

Differential peak analysis will find out the significant binding for each group. In this example, we use the Fisher’s exact test to find out.

[16]:
cc.tl.rank_peak_groups(adata_cc, 'Index', method = 'fisher_exact', key_added = 'fisher_exact')
100%|██████████| 2/2 [01:43<00:00, 51.87s/it]

Plot the results for differential peak analysis.

cc.tl.rank_peak_groups(adata_cc, 'Index', method = 'fisher_exact', key_added = 'fisher_exact',
                       rankby = 'logfoldchanges')
cc.pl.rank_peak_groups(adata_cc, key = 'fisher_exact',rankby = 'logfoldchanges')
[17]:
cc.pl.rank_peak_groups(adata_cc, key = 'fisher_exact', rankby = 'pvalues')
../../_images/tutorials_notebooks_Brd4_bulk_30_0.png

We can use Pycallingcards to look at the insertion densities at peaks that are differentially bound. In the plot below, we first plot female peaks as blue points and male as grey points, and then we plot male peaks as red and female peaks as grey. From the plots below we can see there is much more Brd4 binding at this locus in females than in males.

[18]:
cc.pl.draw_area("chr9", 91366534, 91425373, 50000, peak_data, Brd4, "mm10", adata = adata_cc, font_size=2,
                plotsize = [1,1,4], name = "Female_Brd4",key = "Index",insertionkey = "group",
                name_insertion1 = 'Female Insertions', name_density1 = 'Female Insertion Density',
                name_insertion2 = 'Total Insertions', name_density2 = 'Total Insertion Density',
                bins = 400, figsize = (30,12), peak_line = 6, title = "Female")
cc.pl.draw_area("chr9", 91366534, 91425373, 50000, peak_data, Brd4, "mm10",  adata = adata_cc, font_size=2,
                plotsize = [1,1,4], name = "Male_Brd4",key = "Index", insertionkey = "group",
                name_insertion1 = 'Male Insertions', name_density1 = 'Male Insertion Density',
                name_insertion2 = 'Total Insertions', name_density2 = 'Total Insertion Density',
                bins = 400, figsize = (30,12), peak_line = 6, color = "blue", title = "Male")
../../_images/tutorials_notebooks_Brd4_bulk_32_0.png
../../_images/tutorials_notebooks_Brd4_bulk_32_1.png
[19]:
cc.pl.draw_area("chr12", 56516453, 56538107, 50000, peak_data, Brd4, "mm10", adata = adata_cc, font_size=2,
                plotsize = [1,1,3], name = "Female_Brd4", key = "Index",insertionkey = "group", bins = 200,
                name_insertion1 = 'Female Insertions', name_density1 = 'Female Insertion Density',
                name_insertion2 = 'Total Insertions', name_density2 = 'Total Insertion Density',
                figsize = (30,8), peak_line = 2, title = "Female")
cc.pl.draw_area("chr12", 56516453, 56538107, 50000, peak_data, Brd4, "mm10", adata = adata_cc, font_size=2,
                plotsize = [1,1,3], name = "Male_Brd4", key = "Index", insertionkey = "group", bins = 200,
                name_insertion1 = 'Male Insertions', name_density1 = 'Male Insertion Density',
                name_insertion2 = 'Total Insertions', name_density2 = 'Total Insertion Density',
                figsize = (30,8), peak_line = 2, color = "blue", title = "Male")
../../_images/tutorials_notebooks_Brd4_bulk_33_0.png
../../_images/tutorials_notebooks_Brd4_bulk_33_1.png

We can now plot a volcano plot to visualize the distribution of differentially bound peaks. labelright and labellest are the location points of the labels.

[20]:
cc.pl.volcano_plot(adata_cc, pvalue_name = 'pvalues_adj', pvalue_cutoff = 0.01, lfc_cutoff = 4,
                   figsize = (5,4), labelright = (3,250), labelleft = (-9,250))
/home/juanru/Desktop/pycallingcards/pycallingcards/pycallingcards/plotting/_volcano.py:70: RuntimeWarning: divide by zero encountered in log10
  pva = -np.log10(np.array(adata_cc.uns[name][pvalue_name].tolist())[:, 1])
../../_images/tutorials_notebooks_Brd4_bulk_35_1.png

We can now use HOMER to call motifs for both female and male bound peaks separately to find out potential motif binding difference.

[31]:
cc.tl.call_motif(peaks_frame=cc.tl.rank_peak_groups_df(adata_cc, key = 'fisher_exact', pval_cutoff = 0.01, logfc_min = 3, group = ['Female_Brd4'])["names"].str.split("_", expand=True),
                 reference ="mm10",save_homer = "Homer/GBM_female",
                 homer_path = "/home/juanru/miniconda3/bin/", num_cores=12)
cc.tl.call_motif(peaks_frame=cc.tl.rank_peak_groups_df(adata_cc, key = 'fisher_exact', pval_cutoff = 0.01, logfc_min = 3, group = ['Male_Brd4'])["names"].str.split("_", expand=True),
                 reference ="mm10",save_homer = "Homer/GBM_male",
                 homer_path = "/home/juanru/miniconda3/bin/", num_cores=12)
There is no save_name, it will save to temp_Homer_trial.bed and then delete.

        Position file = temp_Homer_trial.bed
        Genome = mm10
        Output Directory = Homer/GBM_female
        Fragment size set to 1000
        Using 12 CPUs
        Will not run homer for de novo motifs
        Found mset for "mouse", will check against vertebrates motifs
        Peak/BED file conversion summary:
                BED/Header formatted lines: 1009
                peakfile formatted lines: 0

        Peak File Statistics:
                Total Peaks: 1009
                Redundant Peak IDs: 0
                Peaks lacking information: 0 (need at least 5 columns per peak)
                Peaks with misformatted coordinates: 0 (should be integer)
                Peaks with misformatted strand: 0 (should be either +/- or 0/1)

        Peak file looks good!

        Background files for 1000 bp fragments found.
        Custom genome sequence directory: /home/juanru/miniconda3/share/homer/.//data/genomes/mm10//

        Extracting sequences from directory: /home/juanru/miniconda3/share/homer/.//data/genomes/mm10//
        Extracting 96 sequences from chr1
        Extracting 93 sequences from chr2
        Extracting 62 sequences from chr3
        Extracting 39 sequences from chr4
        Extracting 67 sequences from chr5
        Extracting 66 sequences from chr6
        Extracting 62 sequences from chr7
        Extracting 45 sequences from chr8
        Extracting 59 sequences from chr9
        Extracting 50 sequences from chr10
        Extracting 41 sequences from chr11
        Extracting 41 sequences from chr12
        Extracting 46 sequences from chr13
        Extracting 25 sequences from chr14
        Extracting 26 sequences from chr15
        Extracting 40 sequences from chr16
        Extracting 40 sequences from chr17
        Extracting 32 sequences from chr18
        Extracting 32 sequences from chr19
        Extracting 47 sequences from chrX

        Not removing redundant sequences


        Sequences processed:
                Auto detected maximum sequence length of 1001 bp
                1009 total

        Frequency Bins: 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.6 0.7 0.8
        Freq    Bin     Count
        0.3     2       14
        0.35    3       101
        0.4     4       317
        0.45    5       309
        0.5     6       201
        0.6     7       66
        0.7     8       1

        Total sequences set to 50000

        Choosing background that matches in CpG/GC content...
        Bin     # Targets       # Background    Background Weight
        2       14      680     1.000
        3       101     4904    1.000
        4       317     15392   1.000
        5       309     15003   1.000
        6       201     9759    1.000
        7       66      3205    1.000
        8       1       49      0.991
        Assembling sequence file...
        Normalizing lower order oligos using homer2

        Reading input files...
        50001 total sequences read
        Autonormalization: 1-mers (4 total)
                A       29.26%  29.21%  1.001
                C       20.74%  20.79%  0.998
                G       20.74%  20.79%  0.998
                T       29.26%  29.21%  1.001
        Autonormalization: 2-mers (16 total)
                AA      9.53%   9.22%   1.034
                CA      7.34%   7.44%   0.987
                GA      6.15%   6.15%   1.001
                TA      6.23%   6.41%   0.972
                AC      5.27%   5.38%   0.980
                CC      5.08%   5.17%   0.983
                GC      4.24%   4.08%   1.037
                TC      6.15%   6.15%   1.001
                AG      7.36%   7.34%   1.003
                CG      0.96%   0.84%   1.148
                GG      5.08%   5.17%   0.983
                TG      7.34%   7.44%   0.987
                AT      7.09%   7.28%   0.974
                CT      7.36%   7.34%   1.003
                GT      5.27%   5.38%   0.980
                TT      9.53%   9.22%   1.034
        Autonormalization: 3-mers (64 total)
        Normalization weights can be found in file: Homer/GBM_female/seq.autonorm.tsv
        Converging on autonormalization solution:
        ...............................................................................
        Final normalization:    Autonormalization: 1-mers (4 total)
                A       29.26%  29.24%  1.001
                C       20.74%  20.76%  0.999
                G       20.74%  20.76%  0.999
                T       29.26%  29.24%  1.001
        Autonormalization: 2-mers (16 total)
                AA      9.53%   9.41%   1.013
                CA      7.34%   7.35%   0.998
                GA      6.15%   6.14%   1.001
                TA      6.23%   6.33%   0.985
                AC      5.27%   5.32%   0.991
                CC      5.08%   5.14%   0.989
                GC      4.24%   4.16%   1.018
                TC      6.15%   6.14%   1.001
                AG      7.36%   7.34%   1.002
                CG      0.96%   0.93%   1.033
                GG      5.08%   5.14%   0.989
                TG      7.34%   7.35%   0.998
                AT      7.09%   7.16%   0.990
                CT      7.36%   7.34%   1.002
                GT      5.27%   5.32%   0.991
                TT      9.53%   9.41%   1.013
        Autonormalization: 3-mers (64 total)
        Finished preparing sequence/group files

        ----------------------------------------------------------
        Known motif enrichment

        Reading input files...
        50001 total sequences read
        440 motifs loaded
        Cache length = 11180
        Using binomial scoring
        Checking enrichment of 440 motif(s)
        |0%                                    50%                                  100%|
        =================================================================================
        Preparing HTML output with sequence logos...
                1 of 440 (1e-18) Chop(bZIP)/MEF-Chop-ChIP-Seq(GSE35681)/Homer
                2 of 440 (1e-17) Atf4(bZIP)/MEF-Atf4-ChIP-Seq(GSE35681)/Homer
                3 of 440 (1e-17) ISRE(IRF)/ThioMac-LPS-Expression(GSE23622)/Homer
                4 of 440 (1e-15) Atf3(bZIP)/GBM-ATF3-ChIP-Seq(GSE33912)/Homer
                5 of 440 (1e-15) Fos(bZIP)/TSC-Fos-ChIP-Seq(GSE110950)/Homer
                6 of 440 (1e-15) AP-1(bZIP)/ThioMac-PU.1-ChIP-Seq(GSE21512)/Homer
                7 of 440 (1e-15) JunB(bZIP)/DendriticCells-Junb-ChIP-Seq(GSE36099)/Homer
                8 of 440 (1e-14) Fra1(bZIP)/BT549-Fra1-ChIP-Seq(GSE46166)/Homer
                9 of 440 (1e-13) BATF(bZIP)/Th17-BATF-ChIP-Seq(GSE39756)/Homer
                10 of 440 (1e-13) IRF2(IRF)/Erythroblas-IRF2-ChIP-Seq(GSE36985)/Homer
                11 of 440 (1e-12) IRF3(IRF)/BMDM-Irf3-ChIP-Seq(GSE67343)/Homer
                12 of 440 (1e-12) IRF8(IRF)/BMDM-IRF8-ChIP-Seq(GSE77884)/Homer
                13 of 440 (1e-11) Fra2(bZIP)/Striatum-Fra2-ChIP-Seq(GSE43429)/Homer
                14 of 440 (1e-11) Fosl2(bZIP)/3T3L1-Fosl2-ChIP-Seq(GSE56872)/Homer
                15 of 440 (1e-10) Jun-AP1(bZIP)/K562-cJun-ChIP-Seq(GSE31477)/Homer
                16 of 440 (1e-9) IRF1(IRF)/PBMC-IRF1-ChIP-Seq(GSE43036)/Homer
                17 of 440 (1e-7) CEBP:AP1(bZIP)/ThioMac-CEBPb-ChIP-Seq(GSE21512)/Homer
                18 of 440 (1e-7) Rbpj1(?)/Panc1-Rbpj1-ChIP-Seq(GSE47459)/Homer
                19 of 440 (1e-7) TEAD1(TEAD)/HepG2-TEAD1-ChIP-Seq(Encode)/Homer
                20 of 440 (1e-6) PU.1:IRF8(ETS:IRF)/pDC-Irf8-ChIP-Seq(GSE66899)/Homer
                21 of 440 (1e-6) GABPA(ETS)/Jurkat-GABPa-ChIP-Seq(GSE17954)/Homer
                22 of 440 (1e-6) Zic3(Zf)/mES-Zic3-ChIP-Seq(GSE37889)/Homer
                23 of 440 (1e-5) Unknown-ESC-element(?)/mES-Nanog-ChIP-Seq(GSE11724)/Homer
                24 of 440 (1e-4) ETS1(ETS)/Jurkat-ETS1-ChIP-Seq(GSE17954)/Homer
                25 of 440 (1e-4) NF1-halfsite(CTF)/LNCaP-NF1-ChIP-Seq(Unpublished)/Homer
                26 of 440 (1e-4) Mef2c(MADS)/GM12878-Mef2c-ChIP-Seq(GSE32465)/Homer
                27 of 440 (1e-4) TEAD(TEA)/Fibroblast-PU.1-ChIP-Seq(Unpublished)/Homer
                28 of 440 (1e-4) Isl1(Homeobox)/Neuron-Isl1-ChIP-Seq(GSE31456)/Homer
                29 of 440 (1e-4) STAT5(Stat)/mCD4+-Stat5-ChIP-Seq(GSE12346)/Homer
                30 of 440 (1e-4) TEAD3(TEA)/HepG2-TEAD3-ChIP-Seq(Encode)/Homer
                31 of 440 (1e-4) CEBP:CEBP(bZIP)/MEF-Chop-ChIP-Seq(GSE35681)/Homer
                32 of 440 (1e-4) TEAD4(TEA)/Tropoblast-Tead4-ChIP-Seq(GSE37350)/Homer
                33 of 440 (1e-4) Pitx1:Ebox(Homeobox,bHLH)/Hindlimb-Pitx1-ChIP-Seq(GSE41591)/Homer
                34 of 440 (1e-4) PAX6(Paired,Homeobox)/Forebrain-Pax6-ChIP-Seq(GSE66961)/Homer
                35 of 440 (1e-4) Smad4(MAD)/ESC-SMAD4-ChIP-Seq(GSE29422)/Homer
                36 of 440 (1e-4) Dlx3(Homeobox)/Kerainocytes-Dlx3-ChIP-Seq(GSE89884)/Homer
                37 of 440 (1e-4) Fli1(ETS)/CD8-FLI-ChIP-Seq(GSE20898)/Homer
                38 of 440 (1e-4) CREB5(bZIP)/LNCaP-CREB5.V5-ChIP-Seq(GSE137775)/Homer
                39 of 440 (1e-3) EWS:FLI1-fusion(ETS)/SK_N_MC-EWS:FLI1-ChIP-Seq(SRA014231)/Homer
                40 of 440 (1e-3) Bach2(bZIP)/OCILy7-Bach2-ChIP-Seq(GSE44420)/Homer
                41 of 440 (1e-3) STAT4(Stat)/CD4-Stat4-ChIP-Seq(GSE22104)/Homer
                42 of 440 (1e-3) IRF4(IRF)/GM12878-IRF4-ChIP-Seq(GSE32465)/Homer
                43 of 440 (1e-3) Etv2(ETS)/ES-ER71-ChIP-Seq(GSE59402)/Homer
                44 of 440 (1e-3) HLF(bZIP)/HSC-HLF.Flag-ChIP-Seq(GSE69817)/Homer
                45 of 440 (1e-3) c-Jun-CRE(bZIP)/K562-cJun-ChIP-Seq(GSE31477)/Homer
                46 of 440 (1e-3) AMYB(HTH)/Testes-AMYB-ChIP-Seq(GSE44588)/Homer
                47 of 440 (1e-3) EWS:ERG-fusion(ETS)/CADO_ES1-EWS:ERG-ChIP-Seq(SRA014231)/Homer
                48 of 440 (1e-3) TEAD2(TEA)/Py2T-Tead2-ChIP-Seq(GSE55709)/Homer
                49 of 440 (1e-3) NFIL3(bZIP)/HepG2-NFIL3-ChIP-Seq(Encode)/Homer
                50 of 440 (1e-3) ETV1(ETS)/GIST48-ETV1-ChIP-Seq(GSE22441)/Homer
                51 of 440 (1e-3) Zic(Zf)/Cerebellum-ZIC1.2-ChIP-Seq(GSE60731)/Homer
                52 of 440 (1e-3) Mef2a(MADS)/HL1-Mef2a.biotin-ChIP-Seq(GSE21529)/Homer
                53 of 440 (1e-3) Elf4(ETS)/BMDM-Elf4-ChIP-Seq(GSE88699)/Homer
                54 of 440 (1e-3) CArG(MADS)/PUER-Srf-ChIP-Seq(Sullivan_et_al.)/Homer
                55 of 440 (1e-3) Elk1(ETS)/Hela-Elk1-ChIP-Seq(GSE31477)/Homer
                56 of 440 (1e-2) MafA(bZIP)/Islet-MafA-ChIP-Seq(GSE30298)/Homer
                57 of 440 (1e-2) MYB(HTH)/ERMYB-Myb-ChIPSeq(GSE22095)/Homer
                58 of 440 (1e-2) PU.1-IRF(ETS:IRF)/Bcell-PU.1-ChIP-Seq(GSE21512)/Homer
                59 of 440 (1e-2) EBF1(EBF)/Near-E2A-ChIP-Seq(GSE21512)/Homer
                60 of 440 (1e-2) DLX5(Homeobox)/BasalGanglia-Dlx5-ChIP-seq(GSE124936)/Homer
                61 of 440 (1e-2) Zic2(Zf)/ESC-Zic2-ChIP-Seq(SRP197560)/Homer
                62 of 440 (1e-2) DLX1(Homeobox)/BasalGanglia-Dlx1-ChIP-seq(GSE124936)/Homer
                63 of 440 (1e-2) Six2(Homeobox)/NephronProgenitor-Six2-ChIP-Seq(GSE39837)/Homer
                64 of 440 (1e-2) STAT6(Stat)/Macrophage-Stat6-ChIP-Seq(GSE38377)/Homer
                65 of 440 (1e-2) NFAT:AP1(RHD,bZIP)/Jurkat-NFATC1-ChIP-Seq(Jolma_et_al.)/Homer
                66 of 440 (1e-2) Fox:Ebox(Forkhead,bHLH)/Panc1-Foxa2-ChIP-Seq(GSE47459)/Homer
                67 of 440 (1e-2) Stat3(Stat)/mES-Stat3-ChIP-Seq(GSE11431)/Homer
                68 of 440 (1e-2) NFkB-p65-Rel(RHD)/ThioMac-LPS-Expression(GSE23622)/Homer
                69 of 440 (1e-2) EHF(ETS)/LoVo-EHF-ChIP-Seq(GSE49402)/Homer
                70 of 440 (1e-2) Hoxd11(Homeobox)/ChickenMSG-Hoxd11.Flag-ChIP-Seq(GSE86088)/Homer
                71 of 440 (1e-2) IRF:BATF(IRF:bZIP)/pDC-Irf8-ChIP-Seq(GSE66899)/Homer
                72 of 440 (1e-2) ETV4(ETS)/HepG2-ETV4-ChIP-Seq(ENCODE)/Homer
                73 of 440 (1e-2) Smad2(MAD)/ES-SMAD2-ChIP-Seq(GSE29422)/Homer
                74 of 440 (1e-2) STAT6(Stat)/CD4-Stat6-ChIP-Seq(GSE22104)/Homer
                75 of 440 (1e-2) Prop1(Homeobox)/GHFT1-PROP1.biotin-ChIP-Seq(GSE77302)/Homer
                76 of 440 (1e-2) Brn2(POU,Homeobox)/NPC-Brn2-ChIP-Seq(GSE35496)/Homer
                77 of 440 (1e-2) NFkB-p65(RHD)/GM12787-p65-ChIP-Seq(GSE19485)/Homer
                78 of 440 (1e-2) ETS:E-box(ETS,bHLH)/HPC7-Scl-ChIP-Seq(GSE22178)/Homer
                79 of 440 (1e-2) PU.1(ETS)/ThioMac-PU.1-ChIP-Seq(GSE21512)/Homer
                80 of 440 (1e-2) Nkx6.1(Homeobox)/Islet-Nkx6.1-ChIP-Seq(GSE40975)/Homer
                81 of 440 (1e-2) Mef2d(MADS)/Retina-Mef2d-ChIP-Seq(GSE61391)/Homer
        Skipping...
        Job finished - if results look good, please send beer to ..

        Cleaning up tmp files...


        Position file = temp_Homer_trial.bed
        Genome = mm10
        Output Directory = Homer/GBM_male
        Fragment size set to 1000
        Using 12 CPUs
        Will not run homer for de novo motifs
        Found mset for "mouse", will check against vertebrates motifs
        Peak/BED file conversion summary:
                BED/Header formatted lines: 585
                peakfile formatted lines: 0

        Peak File Statistics:
                Total Peaks: 585
                Redundant Peak IDs: 0
                Peaks lacking information: 0 (need at least 5 columns per peak)
                Peaks with misformatted coordinates: 0 (should be integer)
                Peaks with misformatted strand: 0 (should be either +/- or 0/1)

        Peak file looks good!

Finished!
There is no save_name, it will save to temp_Homer_trial.bed and then delete.
        Background files for 1000 bp fragments found.
        Custom genome sequence directory: /home/juanru/miniconda3/share/homer/.//data/genomes/mm10//

        Extracting sequences from directory: /home/juanru/miniconda3/share/homer/.//data/genomes/mm10//
        Extracting 36 sequences from chr1
        Extracting 40 sequences from chr2
        Extracting 33 sequences from chr3
        Extracting 20 sequences from chr4
        Extracting 28 sequences from chr5
        Extracting 35 sequences from chr6
        Extracting 18 sequences from chr7
        Extracting 25 sequences from chr8
        Extracting 21 sequences from chr9
        Extracting 52 sequences from chr10
        Extracting 39 sequences from chr11
        Extracting 26 sequences from chr12
        Extracting 23 sequences from chr13
        Extracting 36 sequences from chr14
        Extracting 43 sequences from chr15
        Extracting 18 sequences from chr16
        Extracting 17 sequences from chr17
        Extracting 28 sequences from chr18
        Extracting 22 sequences from chr19
        Extracting 21 sequences from chrX
        Extracting 4 sequences from chrY

        Not removing redundant sequences


        Sequences processed:
                Auto detected maximum sequence length of 1001 bp
                585 total

        Frequency Bins: 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.6 0.7 0.8
        Freq    Bin     Count
        0.3     2       2
        0.35    3       41
        0.4     4       134
        0.45    5       215
        0.5     6       133
        0.6     7       56
        0.7     8       4

        Total sequences set to 50000

        Choosing background that matches in CpG/GC content...
        Bin     # Targets       # Background    Background Weight
        2       2       169     1.000
        3       41      3463    1.000
        4       134     11319   1.000
        5       215     18161   1.000
        6       133     11235   1.000
        7       56      4730    1.000
        8       4       338     1.000
        Assembling sequence file...
        Normalizing lower order oligos using homer2

        Reading input files...
        50000 total sequences read
        Autonormalization: 1-mers (4 total)
                A       28.57%  28.51%  1.002
                C       21.43%  21.49%  0.997
                G       21.43%  21.49%  0.997
                T       28.57%  28.51%  1.002
        Autonormalization: 2-mers (16 total)
                AA      9.11%   8.75%   1.041
                CA      7.48%   7.54%   0.993
                GA      6.12%   6.21%   0.985
                TA      5.87%   6.02%   0.974
                AC      5.39%   5.41%   0.995
                CC      5.41%   5.51%   0.982
                GC      4.51%   4.35%   1.035
                TC      6.12%   6.21%   0.985
                AG      7.40%   7.49%   0.988
                CG      1.13%   0.95%   1.196
                GG      5.41%   5.51%   0.982
                TG      7.48%   7.54%   0.993
                AT      6.67%   6.86%   0.973
                CT      7.40%   7.49%   0.988
                GT      5.39%   5.41%   0.995
                TT      9.11%   8.75%   1.041
        Autonormalization: 3-mers (64 total)
        Normalization weights can be found in file: Homer/GBM_male/seq.autonorm.tsv
        Converging on autonormalization solution:
        ...............................................................................
        Final normalization:    Autonormalization: 1-mers (4 total)
                A       28.57%  28.55%  1.001
                C       21.43%  21.45%  0.999
                G       21.43%  21.45%  0.999
                T       28.57%  28.55%  1.001
        Autonormalization: 2-mers (16 total)
                AA      9.11%   8.97%   1.016
                CA      7.48%   7.47%   1.001
                GA      6.12%   6.15%   0.995
                TA      5.87%   5.97%   0.983
                AC      5.39%   5.40%   0.998
                CC      5.41%   5.46%   0.992
                GC      4.51%   4.44%   1.014
                TC      6.12%   6.15%   0.995
                AG      7.40%   7.42%   0.997
                CG      1.13%   1.10%   1.033
                GG      5.41%   5.46%   0.992
                TG      7.48%   7.47%   1.001
                AT      6.67%   6.77%   0.986
                CT      7.40%   7.42%   0.997
                GT      5.39%   5.40%   0.998
                TT      9.11%   8.97%   1.016
        Autonormalization: 3-mers (64 total)
        Finished preparing sequence/group files

        ----------------------------------------------------------
        Known motif enrichment

        Reading input files...
        50000 total sequences read
        440 motifs loaded
        Cache length = 11180
        Using binomial scoring
        Checking enrichment of 440 motif(s)
        |0%                                    50%                                  100%|
        =================================================================================
Finished!

        Preparing HTML output with sequence logos...
                1 of 440 (1e-42) Fosl2(bZIP)/3T3L1-Fosl2-ChIP-Seq(GSE56872)/Homer
                2 of 440 (1e-41) Jun-AP1(bZIP)/K562-cJun-ChIP-Seq(GSE31477)/Homer
                3 of 440 (1e-39) Fos(bZIP)/TSC-Fos-ChIP-Seq(GSE110950)/Homer
                4 of 440 (1e-38) JunB(bZIP)/DendriticCells-Junb-ChIP-Seq(GSE36099)/Homer
                5 of 440 (1e-37) Fra1(bZIP)/BT549-Fra1-ChIP-Seq(GSE46166)/Homer
                6 of 440 (1e-35) Fra2(bZIP)/Striatum-Fra2-ChIP-Seq(GSE43429)/Homer
                7 of 440 (1e-35) BATF(bZIP)/Th17-BATF-ChIP-Seq(GSE39756)/Homer
                8 of 440 (1e-34) Atf3(bZIP)/GBM-ATF3-ChIP-Seq(GSE33912)/Homer
                9 of 440 (1e-34) AP-1(bZIP)/ThioMac-PU.1-ChIP-Seq(GSE21512)/Homer
                10 of 440 (1e-22) Bach2(bZIP)/OCILy7-Bach2-ChIP-Seq(GSE44420)/Homer
                11 of 440 (1e-10) Nrf2(bZIP)/Lymphoblast-Nrf2-ChIP-Seq(GSE37589)/Homer
                12 of 440 (1e-9) NF-E2(bZIP)/K562-NFE2-ChIP-Seq(GSE31477)/Homer
                13 of 440 (1e-7) Bach1(bZIP)/K562-Bach1-ChIP-Seq(GSE31477)/Homer
                14 of 440 (1e-7) NFE2L2(bZIP)/HepG2-NFE2L2-ChIP-Seq(Encode)/Homer
                15 of 440 (1e-6) MafK(bZIP)/C2C12-MafK-ChIP-Seq(GSE36030)/Homer
                16 of 440 (1e-5) Ets1-distal(ETS)/CD4+-PolII-ChIP-Seq(Barski_et_al.)/Homer
                17 of 440 (1e-4) Isl1(Homeobox)/Neuron-Isl1-ChIP-Seq(GSE31456)/Homer
                18 of 440 (1e-4) Sox3(HMG)/NPC-Sox3-ChIP-Seq(GSE33059)/Homer
                19 of 440 (1e-3) Bapx1(Homeobox)/VertebralCol-Bapx1-ChIP-Seq(GSE36672)/Homer
                20 of 440 (1e-3) Fli1(ETS)/CD8-FLI-ChIP-Seq(GSE20898)/Homer
                21 of 440 (1e-3) MafA(bZIP)/Islet-MafA-ChIP-Seq(GSE30298)/Homer
                22 of 440 (1e-3) Fox:Ebox(Forkhead,bHLH)/Panc1-Foxa2-ChIP-Seq(GSE47459)/Homer
                23 of 440 (1e-3) Nkx2.2(Homeobox)/NPC-Nkx2.2-ChIP-Seq(GSE61673)/Homer
                24 of 440 (1e-3) Sox10(HMG)/SciaticNerve-Sox3-ChIP-Seq(GSE35132)/Homer
                25 of 440 (1e-3) LHX9(Homeobox)/Hct116-LHX9.V5-ChIP-Seq(GSE116822)/Homer
                26 of 440 (1e-3) TATA-Box(TBP)/Promoter/Homer
                27 of 440 (1e-3) KLF14(Zf)/HEK293-KLF14.GFP-ChIP-Seq(GSE58341)/Homer
                28 of 440 (1e-2) ZNF669(Zf)/HEK293-ZNF669.GFP-ChIP-Seq(GSE58341)/Homer
                29 of 440 (1e-2) Lhx2(Homeobox)/HFSC-Lhx2-ChIP-Seq(GSE48068)/Homer
                30 of 440 (1e-2) Sox4(HMG)/proB-Sox4-ChIP-Seq(GSE50066)/Homer
                31 of 440 (1e-2) DLX5(Homeobox)/BasalGanglia-Dlx5-ChIP-seq(GSE124936)/Homer
                32 of 440 (1e-2) RUNX-AML(Runt)/CD4+-PolII-ChIP-Seq(Barski_et_al.)/Homer
                33 of 440 (1e-2) Sp5(Zf)/mES-Sp5.Flag-ChIP-Seq(GSE72989)/Homer
                34 of 440 (1e-2) EWS:ERG-fusion(ETS)/CADO_ES1-EWS:ERG-ChIP-Seq(SRA014231)/Homer
                35 of 440 (1e-2) RUNX1(Runt)/Jurkat-RUNX1-ChIP-Seq(GSE29180)/Homer
                36 of 440 (1e-2) Etv2(ETS)/ES-ER71-ChIP-Seq(GSE59402)/Homer
                37 of 440 (1e-2) ETV4(ETS)/HepG2-ETV4-ChIP-Seq(ENCODE)/Homer
                38 of 440 (1e-2) Sox2(HMG)/mES-Sox2-ChIP-Seq(GSE11431)/Homer
                39 of 440 (1e-2) ETS1(ETS)/Jurkat-ETS1-ChIP-Seq(GSE17954)/Homer
                40 of 440 (1e-2) Brn1(POU,Homeobox)/NPC-Brn1-ChIP-Seq(GSE35496)/Homer
                41 of 440 (1e-2) CHR(?)/Hela-CellCycle-Expression/Homer
                42 of 440 (1e-2) Dlx3(Homeobox)/Kerainocytes-Dlx3-ChIP-Seq(GSE89884)/Homer
                43 of 440 (1e-2) EWS:FLI1-fusion(ETS)/SK_N_MC-EWS:FLI1-ChIP-Seq(SRA014231)/Homer
                44 of 440 (1e-2) KLF3(Zf)/MEF-Klf3-ChIP-Seq(GSE44748)/Homer
                45 of 440 (1e-2) STAT5(Stat)/mCD4+-Stat5-ChIP-Seq(GSE12346)/Homer
                46 of 440 (1e-2) Elk4(ETS)/Hela-Elk4-ChIP-Seq(GSE31477)/Homer
                47 of 440 (1e-2) Tbx20(T-box)/Heart-Tbx20-ChIP-Seq(GSE29636)/Homer
                48 of 440 (1e-2) Rfx6(HTH)/Min6b1-Rfx6.HA-ChIP-Seq(GSE62844)/Homer
                49 of 440 (1e-2) ETS:RUNX(ETS,Runt)/Jurkat-RUNX1-ChIP-Seq(GSE17954)/Homer
                50 of 440 (1e-2) AMYB(HTH)/Testes-AMYB-ChIP-Seq(GSE44588)/Homer
                51 of 440 (1e-2) Lhx1(Homeobox)/EmbryoCarcinoma-Lhx1-ChIP-Seq(GSE70957)/Homer
                52 of 440 (1e-2) Lhx3(Homeobox)/Neuron-Lhx3-ChIP-Seq(GSE31456)/Homer
                53 of 440 (1e-2) DLX2(Homeobox)/BasalGanglia-Dlx2-ChIP-seq(GSE124936)/Homer
                54 of 440 (1e-2) Nkx6.1(Homeobox)/Islet-Nkx6.1-ChIP-Seq(GSE40975)/Homer
                55 of 440 (1e-2) OCT4-SOX2-TCF-NANOG(POU,Homeobox,HMG)/mES-Oct4-ChIP-Seq(GSE11431)/Homer
                56 of 440 (1e-2) Sox7(HMG)/ESC-Sox7-ChIP-Seq(GSE133899)/Homer
                57 of 440 (1e-2) Sox6(HMG)/Myotubes-Sox6-ChIP-Seq(GSE32627)/Homer
                58 of 440 (1e-2) DLX1(Homeobox)/BasalGanglia-Dlx1-ChIP-seq(GSE124936)/Homer
                59 of 440 (1e-2) AP-2alpha(AP2)/Hela-AP2alpha-ChIP-Seq(GSE31477)/Homer
                60 of 440 (1e-2) GABPA(ETS)/Jurkat-GABPa-ChIP-Seq(GSE17954)/Homer
                61 of 440 (1e-2) Maz(Zf)/HepG2-Maz-ChIP-Seq(GSE31477)/Homer
        Skipping...
        Job finished - if results look good, please send beer to ..

        Cleaning up tmp files...

Find motifs under male peaks but not in female peaks.

[22]:
cc.tl.compare_motif("Homer/GBM_male", "Homer/GBM_female")
[22]:
Motif Name Consensus P-value Log P-value q-value (Benjamini) # of Target Sequences with Motif(of 585) % of Target Sequences with Motif # of Background Sequences with Motif(of 2133) % of Background Sequences with Motif
10 Nrf2(bZIP)/Lymphoblast-Nrf2-ChIP-Seq(GSE37589)... HTGCTGAGTCAT 1.000000e-13 -30.290 0.0000 39.0 6.67% 33.5 1.57%
11 NF-E2(bZIP)/K562-NFE2-ChIP-Seq(GSE31477)/Homer GATGACTCAGCA 1.000000e-11 -26.520 0.0000 41.0 7.01% 41.6 1.95%
12 NFE2L2(bZIP)/HepG2-NFE2L2-ChIP-Seq(Encode)/Homer AWWWTGCTGAGTCAT 1.000000e-11 -25.480 0.0000 48.0 8.21% 56.4 2.65%
13 Bach1(bZIP)/K562-Bach1-ChIP-Seq(GSE31477)/Homer AWWNTGCTGAGTCAT 1.000000e-09 -21.840 0.0000 39.0 6.67% 44.3 2.07%
14 MafK(bZIP)/C2C12-MafK-ChIP-Seq(GSE36030)/Homer GCTGASTCAGCA 1.000000e-09 -20.870 0.0000 97.0 16.58% 186.9 8.76%
15 Ets1-distal(ETS)/CD4+-PolII-ChIP-Seq(Barski_et... MACAGGAAGT 1.000000e-05 -13.680 0.0000 116.0 19.83% 273.6 12.82%
17 Sox3(HMG)/NPC-Sox3-ChIP-Seq(GSE33059)/Homer CCWTTGTY 1.000000e-04 -9.625 0.0016 434.0 74.19% 1425.7 66.82%
18 Nkx2.2(Homeobox)/NPC-Nkx2.2-ChIP-Seq(GSE61673)... BTBRAGTGSN 1.000000e-04 -9.612 0.0016 419.0 71.62% 1367.5 64.09%
19 Bapx1(Homeobox)/VertebralCol-Bapx1-ChIP-Seq(GS... TTRAGTGSYK 1.000000e-04 -9.553 0.0016 467.0 79.83% 1556.7 72.96%
20 RUNX1(Runt)/Jurkat-RUNX1-ChIP-Seq(GSE29180)/Homer AAACCACARM 1.000000e-03 -8.988 0.0026 293.0 50.09% 906.6 42.49%
23 ZNF669(Zf)/HEK293-ZNF669.GFP-ChIP-Seq(GSE58341... GARTGGTCATCGCCC 1.000000e-03 -8.215 0.0050 24.0 4.10% 39.7 1.86%
27 RUNX-AML(Runt)/CD4+-PolII-ChIP-Seq(Barski_et_a... GCTGTGGTTW 1.000000e-03 -7.530 0.0086 212.0 36.24% 637.6 29.89%
29 Sox10(HMG)/SciaticNerve-Sox3-ChIP-Seq(GSE35132... CCWTTGTYYB 1.000000e-03 -7.342 0.0095 407.0 69.57% 1347.5 63.15%
30 Sp1(Zf)/Promoter/Homer GGCCCCGCCCCC 1.000000e-03 -7.220 0.0104 31.0 5.30% 60.7 2.85%
32 E2F3(E2F)/MEF-E2F3-ChIP-Seq(GSE71376)/Homer BTKGGCGGGAAA 1.000000e-03 -7.074 0.0113 77.0 13.16% 195.8 9.18%
37 Sox7(HMG)/ESC-Sox7-ChIP-Seq(GSE133899)/Homer VVRRAACAATGG 1.000000e-02 -6.569 0.0162 102.0 17.44% 278.9 13.07%
40 Brn1(POU,Homeobox)/NPC-Brn1-ChIP-Seq(GSE35496)... TATGCWAATBAV 1.000000e-02 -6.345 0.0188 134.0 22.91% 385.4 18.06%
42 Sox4(HMG)/proB-Sox4-ChIP-Seq(GSE50066)/Homer YCTTTGTTCC 1.000000e-02 -6.213 0.0205 255.0 43.59% 804.0 37.68%
43 ZNF317(Zf)/HEK293-ZNF317.GFP-ChIP-Seq(GSE58341... GTCWGCTGTYYCTCT 1.000000e-02 -6.112 0.0222 29.0 4.96% 59.0 2.77%
45 OCT4-SOX2-TCF-NANOG(POU,Homeobox,HMG)/mES-Oct4... ATTTGCATAACAATG 1.000000e-02 -5.636 0.0341 96.0 16.41% 267.1 12.52%
46 Isl1(Homeobox)/Neuron-Isl1-ChIP-Seq(GSE31456)/... CTAATKGV 1.000000e-02 -5.578 0.0354 467.0 79.83% 1601.5 75.06%
48 Sox2(HMG)/mES-Sox2-ChIP-Seq(GSE11431)/Homer BCCATTGTTC 1.000000e-02 -5.408 0.0402 272.0 46.50% 876.6 41.09%
49 Egr1(Zf)/K562-Egr1-ChIP-Seq(GSE32465)/Homer TGCGTGGGYG 1.000000e-02 -5.339 0.0422 137.0 23.42% 406.2 19.04%
50 Tbx20(T-box)/Heart-Tbx20-ChIP-Seq(GSE29636)/Homer GGTGYTGACAGS 1.000000e-02 -5.310 0.0426 67.0 11.45% 178.0 8.34%
51 Nkx3.1(Homeobox)/LNCaP-Nkx3.1-ChIP-Seq(GSE2826... AAGCACTTAA 1.000000e-02 -5.277 0.0432 479.0 81.88% 1652.0 77.43%
52 E2F6(E2F)/Hela-E2F6-ChIP-Seq(GSE31477)/Homer GGCGGGAARN 1.000000e-02 -5.256 0.0433 64.0 10.94% 168.7 7.91%
53 TATA-Box(TBP)/Promoter/Homer CCTTTTAWAGSC 1.000000e-02 -5.201 0.0449 382.0 65.30% 1282.4 60.10%
55 Klf9(Zf)/GBM-Klf9-ChIP-Seq(GSE62211)/Homer GCCACRCCCACY 1.000000e-02 -5.137 0.0462 85.0 14.53% 236.3 11.08%

Find motifs in female but not in male.

[23]:
cc.tl.compare_motif("Homer/GBM_female", "Homer/GBM_male")
[23]:
Motif Name Consensus P-value Log P-value q-value (Benjamini) # of Target Sequences with Motif(of 1009) % of Target Sequences with Motif # of Background Sequences with Motif(of 2327) % of Background Sequences with Motif
0 Atf4(bZIP)/MEF-Atf4-ChIP-Seq(GSE35681)/Homer MTGATGCAAT 1.000000e-14 -33.810 0.0000 228.0 22.60% 313.9 13.49%
1 Chop(bZIP)/MEF-Chop-ChIP-Seq(GSE35681)/Homer ATTGCATCAT 1.000000e-14 -32.890 0.0000 197.0 19.52% 259.5 11.15%
2 IRF8(IRF)/BMDM-IRF8-ChIP-Seq(GSE77884)/Homer GRAASTGAAAST 1.000000e-12 -28.550 0.0000 255.0 25.27% 381.5 16.40%
5 EBF1(EBF)/Near-E2A-ChIP-Seq(GSE21512)/Homer GTCCCCWGGGGA 1.000000e-11 -27.250 0.0000 310.0 30.72% 495.0 21.27%
8 Zic3(Zf)/mES-Zic3-ChIP-Seq(GSE37889)/Homer GGCCYCCTGCTGDGH 1.000000e-11 -25.440 0.0000 189.0 18.73% 266.5 11.45%
... ... ... ... ... ... ... ... ... ...
122 SPDEF(ETS)/VCaP-SPDEF-ChIP-Seq(SRA014231)/Homer ASWTCCTGBT 1.000000e-02 -4.786 0.0299 420.0 41.63% 882.4 37.92%
123 Srebp2(bHLH)/HepG2-Srebp2-ChIP-Seq(GSE31477)/H... CGGTCACSCCAC 1.000000e-02 -4.760 0.0304 45.0 4.46% 71.8 3.08%
124 ZNF519(Zf)/HEK293-ZNF519.GFP-ChIP-Seq(GSE58341... GAGSCCGAGC 1.000000e-02 -4.720 0.0314 37.0 3.67% 56.1 2.41%
125 AP-2alpha(AP2)/Hela-AP2alpha-ChIP-Seq(GSE31477... ATGCCCTGAGGC 1.000000e-02 -4.695 0.0319 167.0 16.55% 323.9 13.92%
126 Atf7(bZIP)/3T3L1-Atf7-ChIP-Seq(GSE56872)/Homer NGRTGACGTCAY 1.000000e-02 -4.676 0.0323 192.0 19.03% 377.5 16.22%

98 rows × 9 columns

Next, we want to identify differentially bound peaks whose nearby genes are differentially expressed between males and females. To do so, we must first read in the bulk RNA-seq data. Note that this RNA-seq data is normalized by RPKM.

[24]:
rna =  cc.datasets.mouse_brd4_data(data = "RNA")
rna
[24]:
sample.f6_dmso_3 sample.f6_dmso_2 sample.f6_dmso_1 sample.m6_dmso_3 sample.m6_dmso_2 sample.m6_dmso_1
mt-Tf 0.000000 0.639272 0.000000 1.156308 0.000000 0.687531
mt-Rnr1 127.907281 184.396768 172.544723 130.417054 192.566581 114.212201
mt-Tv 0.000000 0.000000 1.761150 0.000000 0.510777 0.000000
mt-Rnr2 247.267306 527.855824 324.345998 675.031187 684.376279 418.345603
mt-Tl1 44.418710 117.660098 70.481211 167.217607 115.129081 127.789076
... ... ... ... ... ... ...
Gm16897 0.281553 0.349020 0.463442 0.078913 0.070742 0.112610
A330023F24Rik 0.118678 0.055749 0.058441 0.029411 0.018833 0.049965
Cd46 0.127930 0.095151 0.075997 0.061467 0.055103 0.087715
Cr1l 9.993119 14.027801 13.717564 13.586101 13.240219 11.089902
Cr2 0.018832 0.056028 0.019578 0.012668 0.017034 0.015064

21430 rows × 6 columns

We now plot the heatmap for calling cards data and gene expression data.

The top lines are the relative Brd4 binding for each of the two groups. The bottom lines are the gene expressions of the top two nearest genes in the 3 RNA-seq replicates for males and females; each gene shown separately.

We can see a relationship between Brd4 binding and the expression of nearby genes. Genes that are strongly bound in females tend to be expressed in females and expressed at lower levels in males and vice versa. The effect is much more muted for the 2nd nearest gene to the peak.

[25]:
cc.pl.heatmap(adata_cc,rna, rnalabels = ["Female1", "Female2", "Female3","Male1", "Male2", "Male3"])
Please make sure that the samples in adata_cc and rna are in the same order.
/home/juanru/Desktop/pycallingcards/pycallingcards/pycallingcards/plotting/_heatmap_ccrna.py:193: RuntimeWarning: invalid value encountered in true_divide
  data[groupnumber:secondnum, :]
/home/juanru/Desktop/pycallingcards/pycallingcards/pycallingcards/plotting/_heatmap_ccrna.py:195: RuntimeWarning: invalid value encountered in true_divide
  data[secondnum:, :]
../../_images/tutorials_notebooks_Brd4_bulk_45_2.png

Find out the co-differential peak-gene pairs. Look into all the differential peaks and then see if the annotated genes are also significantly expressed. We can set the pvalue and score/log foldchange cutoff easily.

[26]:
result = cc.tl.pair_peak_gene_bulk(adata_cc, "https://github.com/The-Mitra-Lab/pycallingcards_data/releases/download/data/deseq_MF.csv",
                                   name_cc = 'logfoldchanges', pvalue_cutoff_cc = 0.001, pvalue_cutoff_rna = 0.001,
                                   lfc_cutoff_rna = 5, lfc_cutoff_cc = 5)
result
[26]:
Peak logfoldchanges_peak Pvalue_peak Pvalue_adj_peak Gene Score_gene Pvalue_gene Pvalue_adj_gene Distance_peak_to_gene
0 chr1_174659622_174662521 5.652435 1.530047e-08 4.586591e-07 Grem2 -10.401831 1.280025e-67 1.252748e-65 171264
1 chr1_174917639_174921559 6.208699 2.711848e-12 1.562878e-10 Grem2 -10.401831 1.280025e-67 1.252748e-65 0
2 chr2_93645657_93648162 5.189615 2.360560e-06 4.324934e-05 Alx4 -10.292504 5.458739e-18 1.081356e-16 0
3 chr3_33140350_33142508 5.534632 6.455627e-08 1.724615e-06 Pex5l -7.590236 2.351122e-24 6.841972e-23 0
4 chr3_126498429_126499612 5.337659 5.594815e-07 1.202787e-05 Arsj -6.281259 8.667831e-27 2.783525e-25 -58056
5 chr3_126657669_126660188 -5.397757 5.546104e-07 1.202787e-05 Arsj -6.281259 8.667831e-27 2.783525e-25 -217296
6 chr3_132085669_132092999 6.121213 1.509244e-105 2.850264e-102 Dkk2 -6.916087 9.389039e-178 5.513378e-175 0
7 chr3_132093815_132106295 5.072335 9.590057e-65 6.924867e-62 Dkk2 -6.916087 9.389039e-178 5.513378e-175 0
8 chr3_141699368_141702608 6.065565 2.403341e-29 5.413249e-27 Bmpr1b -5.570116 6.893076e-40 3.655998e-38 134528
9 chr4_68535893_68538867 5.172571 1.974087e-15 1.648497e-13 Brinp1 -7.758307 5.384595e-10 5.726618e-09 222505
10 chr4_68545353_68551019 6.825275 1.516093e-18 1.684235e-16 Brinp1 -7.758307 5.384595e-10 5.726618e-09 210353
11 chr4_118022206_118024522 -5.332271 1.094775e-06 2.210347e-05 Artn 9.083650 8.043160e-84 1.066497e-81 -92444
12 chr5_13001869_13003460 5.957213 2.037009e-10 8.898685e-09 Sema3a -5.771471 3.017136e-297 6.200968e-294 393324
13 chr5_13276479_13283560 7.727895 3.536779e-34 1.033708e-31 Sema3a -5.771471 3.017136e-297 6.200968e-294 113224
14 chr5_17080302_17082934 5.024623 9.959617e-06 1.561421e-04 Hgf -7.513756 2.543270e-09 2.523623e-08 -460864
15 chr5_28465271_28467276 -5.397757 5.546104e-07 1.202787e-05 Shh 5.565109 2.158809e-156 9.593279e-154 0
16 chr5_38081825_38086331 6.088429 2.350343e-11 1.192216e-09 Nsg1 -5.086635 2.233941e-111 4.963574e-109 50862
17 chr7_56523378_56528436 6.721266 2.699280e-17 2.609056e-15 Gabrg3 -5.960560 2.341610e-05 1.323050e-04 195806
18 chr7_111875691_111878313 5.109476 4.848742e-06 8.221096e-05 Galnt18 -5.959958 2.253335e-68 2.218523e-66 -95715
19 chr8_82329336_82331284 5.189615 2.360560e-06 4.324934e-05 Il15 -5.185933 8.096707e-04 3.267016e-03 340
20 chr8_102781569_102784308 5.761340 3.626346e-09 1.222945e-07 Cdh11 -6.729272 1.932941e-232 1.986339e-229 0
21 chr11_96341058_96343216 5.337659 5.594815e-07 1.202787e-05 Hoxb3 -6.433657 6.669080e-07 4.796719e-06 0
22 chr11_96354206_96355627 5.109476 4.848742e-06 8.221096e-05 Hoxb3 -6.433657 6.669080e-07 4.796719e-06 -6277
23 chr11_96354206_96355627 5.109476 4.848742e-06 8.221096e-05 Hoxb2 -6.351656 3.359549e-06 2.173857e-05 -193
24 chr12_56506841_56514912 -9.259195 2.083740e-95 3.009288e-92 Nkx2-1 9.134857 1.061953e-60 9.189803e-59 17023
25 chr13_117409406_117414817 -5.127606 3.397970e-15 2.771547e-13 Emb -6.021510 3.917980e-06 2.506593e-05 -134992
26 chr14_66104432_66107303 6.455571 1.757505e-14 1.344190e-12 Ephx2 -6.872942 1.597944e-07 1.247549e-06 0
27 chr14_88460573_88463273 5.594736 3.142836e-08 8.899628e-07 Pcdh20 -6.097424 1.179837e-05 6.988068e-05 1474
28 chr14_88898125_88900211 -5.263671 2.162151e-06 4.039800e-05 Pcdh20 -6.097424 1.179837e-05 6.988068e-05 -426730
29 chr15_11692586_11694713 6.046002 4.827833e-11 2.333231e-09 Npr3 -8.356128 2.957230e-242 3.473055e-239 145183
30 chr15_11743634_11745456 6.922288 8.515240e-20 1.034939e-17 Npr3 -8.356128 2.957230e-242 3.473055e-239 94440
31 chr15_11782997_11784930 5.957213 2.037009e-10 8.898685e-09 Npr3 -8.356128 2.957230e-242 3.473055e-239 54966
32 chr18_53465649_53470088 5.761340 3.626346e-09 1.222945e-07 Prdm6 -9.029699 4.482978e-14 6.960257e-13 0
33 chr19_59159949_59161669 -6.094880 4.260481e-11 2.067175e-09 Vax1 7.636277 8.070915e-31 3.071805e-29 2816
34 chr19_59163812_59165938 -5.788022 4.812885e-09 1.603272e-07 Vax1 7.636277 8.070915e-31 3.071805e-29 0
35 chrX_93281556_93288401 5.093695 9.655833e-32 2.469379e-29 Arx -5.380834 6.537245e-123 1.853196e-120 0
36 chrY_1009018_1011799 -5.686862 1.864054e-08 5.507147e-07 Eif2s3y 11.645099 9.414457e-23 2.468780e-21 0
37 chrY_1243715_1246316 -5.520433 1.425488e-07 3.468498e-06 Ddx3y 11.525927 2.291430e-22 5.823136e-21 14399
38 chrY_1243715_1246316 -5.520433 1.425488e-07 3.468498e-06 Uty 8.369064 1.224906e-22 3.191744e-21 0
39 chrY_1282482_1287504 -5.191646 4.272455e-06 7.350598e-05 Ddx3y 11.525927 2.291430e-22 5.823136e-21 0

Next, we visualize, for specific peaks, the magnitude of differential binding and the expression of nearby genes using the dotplot function. This function creates a dotplot with three columns (below). The first shows the relative binding of Brd4 at a chromsomal locus in males and females, the second and third columns show the expression levels of the nearest and second nearest genes in males and females.

[27]:
cc.pl.dotplot_bulk(adata_cc, rna, selected_list = list(np.unique(np.array(result["Peak"]))), num_list = [3,3], dotsize = 500,
                   cmap = 'plasma', figsize = (16, 16), sort_by_chrom = True, topspace = 0.96)
../../_images/tutorials_notebooks_Brd4_bulk_49_0.png

We can see from these results that all of the peaks visualized here display differential binding, and either the nearest, or second nearest gene has differential expression. In most cases, increased Brd4 binding is associated with an increase in gene expression.

Pycallingcards provides functionality to explore relatinoships between differentially bound regions and nearby GWAS identified SNPs. Since the data analyzed here were collected in murine cells, we must first lift over any interesting peaks to the human genome before we can look for nearby disease associated SNPs. We use liftover to get the resuls.

[28]:
result = cc.tl.result_mapping(result)
result
Start mapping the peaks to the new genome.
100%|██████████| 40/40 [00:00<00:00, 69.43it/s]
Start finding location of genes in the new genome.
100%|██████████| 40/40 [00:00<00:00, 287.78it/s]
[28]:
Peak logfoldchanges_peak Pvalue_peak Pvalue_adj_peak Gene Score_gene Pvalue_gene Pvalue_adj_gene Distance_peak_to_gene Chr_liftover Start_liftover End_liftover Chr_hg38 Start_hg38 End_hg38
0 chr1_174659622_174662521 5.652435 1.530047e-08 4.586591e-07 Grem2 -10.401831 1.280025e-67 1.252748e-65 171264 chr1 240280756 240283835 chr1 240489572 240612162
1 chr1_174917639_174921559 6.208699 2.711848e-12 1.562878e-10 Grem2 -10.401831 1.280025e-67 1.252748e-65 0 chr1 240608636 240611870 chr1 240489572 240612162
2 chr2_93645657_93648162 5.189615 2.360560e-06 4.324934e-05 Alx4 -10.292504 5.458739e-18 1.081356e-16 0 chr11 44303696 44306415 chr11 44260727 44310166
3 chr3_33140350_33142508 5.534632 6.455627e-08 1.724615e-06 Pex5l -7.590236 2.351122e-24 6.841972e-23 0 chr3 180033883 180036186 chr3 179794958 180037053
4 chr3_126498429_126499612 5.337659 5.594815e-07 1.202787e-05 Arsj -6.281259 8.667831e-27 2.783525e-25 -58056 chr4 113839066 113844568 chr4 113900283 113979722
5 chr3_126657669_126660188 -5.397757 5.546104e-07 1.202787e-05 Arsj -6.281259 8.667831e-27 2.783525e-25 -217296 chr4 113683844 113686758 chr4 113900283 113979722
6 chr3_132085669_132092999 6.121213 1.509244e-105 2.850264e-102 Dkk2 -6.916087 9.389039e-178 5.513378e-175 0 chr4 107029387 107035946 chr4 106921801 107036296
7 chr3_132093815_132106295 5.072335 9.590057e-65 6.924867e-62 Dkk2 -6.916087 9.389039e-178 5.513378e-175 0 chr4 107015456 107028826 chr4 106921801 107036296
8 chr3_141699368_141702608 6.065565 2.403341e-29 5.413249e-27 Bmpr1b -5.570116 6.893076e-40 3.655998e-38 134528 chr4 95306647 95309431 chr4 94757976 95158450
9 chr4_68535893_68538867 5.172571 1.974087e-15 1.648497e-13 Brinp1 -7.758307 5.384595e-10 5.726618e-09 222505 chr9 118993297 118996641 chr9 119166629 119369461
10 chr4_68545353_68551019 6.825275 1.516093e-18 1.684235e-16 Brinp1 -7.758307 5.384595e-10 5.726618e-09 210353 chr9 119002823 119008509 chr9 119166629 119369461
11 chr4_118022206_118024522 -5.332271 1.094775e-06 2.210347e-05 Artn 9.083650 8.043160e-84 1.066497e-81 -92444 chr1 43824383 43825212 chr1 43933319 43937240
12 chr5_13001869_13003460 5.957213 2.037009e-10 8.898685e-09 Sema3a -5.771471 3.017136e-297 6.200968e-294 393324 chr7 83958342 84194901
13 chr5_13276479_13283560 7.727895 3.536779e-34 1.033708e-31 Sema3a -5.771471 3.017136e-297 6.200968e-294 113224 chr7 84314176 84318280 chr7 83958342 84194901
14 chr5_17080302_17082934 5.024623 9.959617e-06 1.561421e-04 Hgf -7.513756 2.543270e-09 2.523623e-08 -460864 chr7 81335127 81335733 chr7 81699007 81770198
15 chr5_28465271_28467276 -5.397757 5.546104e-07 1.202787e-05 Shh 5.565109 2.158809e-156 9.593279e-154 0 chr7 155810616 155812624 chr7 155799983 155812273
16 chr5_38081825_38086331 6.088429 2.350343e-11 1.192216e-09 Nsg1 -5.086635 2.233941e-111 4.963574e-109 50862 chr4 4488485 4493470 chr4 4386255 4419058
17 chr7_56523378_56528436 6.721266 2.699280e-17 2.609056e-15 Gabrg3 -5.960560 2.341610e-05 1.323050e-04 195806 chr15 26971281 27533227
18 chr7_111875691_111878313 5.109476 4.848742e-06 8.221096e-05 Galnt18 -5.959958 2.253335e-68 2.218523e-66 -95715 chr11 11723801 11725419 chr11 11270873 11622014
19 chr8_82329336_82331284 5.189615 2.360560e-06 4.324934e-05 Il15 -5.185933 8.096707e-04 3.267016e-03 340 chr4 141733679 141736013 chr4 141636595 141733987
20 chr8_102781569_102784308 5.761340 3.626346e-09 1.222945e-07 Cdh11 -6.729272 1.932941e-232 1.986339e-229 0 chr16 65118621 65121296 chr16 64943752 65122137
21 chr11_96341058_96343216 5.337659 5.594815e-07 1.202787e-05 Hoxb3 -6.433657 6.669080e-07 4.796719e-06 0 chr17 48553538 48555622 chr17 48548869 48582622
22 chr11_96354206_96355627 5.109476 4.848742e-06 8.221096e-05 Hoxb3 -6.433657 6.669080e-07 4.796719e-06 -6277 chr17 48540790 48542479 chr17 48548869 48582622
23 chr11_96354206_96355627 5.109476 4.848742e-06 8.221096e-05 Hoxb2 -6.351656 3.359549e-06 2.173857e-05 -193 chr17 48540790 48542479 chr17 48542655 48545031
24 chr12_56506841_56514912 -9.259195 2.083740e-95 3.009288e-92 Nkx2-1 9.134857 1.061953e-60 9.189803e-59 17023 chr14 36491565 36499712 chr14 36516398 36520225
25 chr13_117409406_117414817 -5.127606 3.397970e-15 2.771547e-13 Emb -6.021510 3.917980e-06 2.506593e-05 -134992 chr5 50396196 50441400
26 chr14_66104432_66107303 6.455571 1.757505e-14 1.344190e-12 Ephx2 -6.872942 1.597944e-07 1.247549e-06 0 chr8 27506903 27511135 chr8 27491001 27544922
27 chr14_88460573_88463273 5.594736 3.142836e-08 8.899628e-07 Pcdh20 -6.097424 1.179837e-05 6.988068e-05 1474 chr13 61406547 61407676 chr13 61409685 61415522
28 chr14_88898125_88900211 -5.263671 2.162151e-06 4.039800e-05 Pcdh20 -6.097424 1.179837e-05 6.988068e-05 -426730 chr13 61862674 61862983 chr13 61409685 61415522
29 chr15_11692586_11694713 6.046002 4.827833e-11 2.333231e-09 Npr3 -8.356128 2.957230e-242 3.473055e-239 145183 chr5 33023231 33024567 chr5 32710636 32791724
30 chr15_11743634_11745456 6.922288 8.515240e-20 1.034939e-17 Npr3 -8.356128 2.957230e-242 3.473055e-239 94440 chr5 32929090 32932829 chr5 32710636 32791724
31 chr15_11782997_11784930 5.957213 2.037009e-10 8.898685e-09 Npr3 -8.356128 2.957230e-242 3.473055e-239 54966 chr5 32864592 32866245 chr5 32710636 32791724
32 chr18_53465649_53470088 5.761340 3.626346e-09 1.222945e-07 Prdm6 -9.029699 4.482978e-14 6.960257e-13 0 chr5 123091173 123096087 chr5 123089102 123194266
33 chr19_59159949_59161669 -6.094880 4.260481e-11 2.067175e-09 Vax1 7.636277 8.070915e-31 3.071805e-29 2816 chr10 117127894 117129270 chr10 117128520 117138301
34 chr19_59163812_59165938 -5.788022 4.812885e-09 1.603272e-07 Vax1 7.636277 8.070915e-31 3.071805e-29 0 chr10 117131583 117133776 chr10 117128520 117138301
35 chrX_93281556_93288401 5.093695 9.655833e-32 2.469379e-29 Arx -5.380834 6.537245e-123 1.853196e-120 0 chrX 25013967 25020980 chrX 25003695 25015948
36 chrY_1009018_1011799 -5.686862 1.864054e-08 5.507147e-07 Eif2s3y 11.645099 9.414457e-23 2.468780e-21 0 chrX 24054686 24057370
37 chrY_1243715_1246316 -5.520433 1.425488e-07 3.468498e-06 Ddx3y 11.525927 2.291430e-22 5.823136e-21 14399 chrY 13478375 13480067 chrY 12903998 12920478
38 chrY_1243715_1246316 -5.520433 1.425488e-07 3.468498e-06 Uty 8.369064 1.224906e-22 3.191744e-21 0 chrY 13478375 13480067 chrY 13248378 13480670
39 chrY_1282482_1287504 -5.191646 4.272455e-06 7.350598e-05 Ddx3y 11.525927 2.291430e-22 5.823136e-21 0 chrY 12903183 12909538 chrY 12903998 12920478
[29]:
GWAS_result = cc.tl.GWAS(result)
GWAS_result
[29]:
Peak logfoldchanges_peak Pvalue_peak Pvalue_adj_peak Gene Score_gene Pvalue_gene Pvalue_adj_gene Distance_peak_to_gene Chr_liftover Start_liftover End_liftover Chr_hg38 Start_hg38 End_hg38 GWAS
0 chr1_174659622_174662521 5.652435 1.530047e-08 4.586591e-07 Grem2 -10.401831 1.280025e-67 1.252748e-65 171264 chr1 240280756 240283835 chr1 240489572 240612162 Coronary heart disease
1 chr1_174917639_174921559 6.208699 2.711848e-12 1.562878e-10 Grem2 -10.401831 1.280025e-67 1.252748e-65 0 chr1 240608636 240611870 chr1 240489572 240612162
2 chr2_93645657_93648162 5.189615 2.360560e-06 4.324934e-05 Alx4 -10.292504 5.458739e-18 1.081356e-16 0 chr11 44303696 44306415 chr11 44260727 44310166
3 chr3_33140350_33142508 5.534632 6.455627e-08 1.724615e-06 Pex5l -7.590236 2.351122e-24 6.841972e-23 0 chr3 180033883 180036186 chr3 179794958 180037053
4 chr3_126498429_126499612 5.337659 5.594815e-07 1.202787e-05 Arsj -6.281259 8.667831e-27 2.783525e-25 -58056 chr4 113839066 113844568 chr4 113900283 113979722
5 chr3_126657669_126660188 -5.397757 5.546104e-07 1.202787e-05 Arsj -6.281259 8.667831e-27 2.783525e-25 -217296 chr4 113683844 113686758 chr4 113900283 113979722
6 chr3_132085669_132092999 6.121213 1.509244e-105 2.850264e-102 Dkk2 -6.916087 9.389039e-178 5.513378e-175 0 chr4 107029387 107035946 chr4 106921801 107036296
7 chr3_132093815_132106295 5.072335 9.590057e-65 6.924867e-62 Dkk2 -6.916087 9.389039e-178 5.513378e-175 0 chr4 107015456 107028826 chr4 106921801 107036296
8 chr3_141699368_141702608 6.065565 2.403341e-29 5.413249e-27 Bmpr1b -5.570116 6.893076e-40 3.655998e-38 134528 chr4 95306647 95309431 chr4 94757976 95158450
9 chr4_68535893_68538867 5.172571 1.974087e-15 1.648497e-13 Brinp1 -7.758307 5.384595e-10 5.726618e-09 222505 chr9 118993297 118996641 chr9 119166629 119369461 Gut microbiota (bacterial taxa, hurdle binary ...
10 chr4_68545353_68551019 6.825275 1.516093e-18 1.684235e-16 Brinp1 -7.758307 5.384595e-10 5.726618e-09 210353 chr9 119002823 119008509 chr9 119166629 119369461
11 chr4_118022206_118024522 -5.332271 1.094775e-06 2.210347e-05 Artn 9.083650 8.043160e-84 1.066497e-81 -92444 chr1 43824383 43825212 chr1 43933319 43937240
12 chr5_13001869_13003460 5.957213 2.037009e-10 8.898685e-09 Sema3a -5.771471 3.017136e-297 6.200968e-294 393324 chr7 83958342 84194901
13 chr5_13276479_13283560 7.727895 3.536779e-34 1.033708e-31 Sema3a -5.771471 3.017136e-297 6.200968e-294 113224 chr7 84314176 84318280 chr7 83958342 84194901
14 chr5_17080302_17082934 5.024623 9.959617e-06 1.561421e-04 Hgf -7.513756 2.543270e-09 2.523623e-08 -460864 chr7 81335127 81335733 chr7 81699007 81770198
15 chr5_28465271_28467276 -5.397757 5.546104e-07 1.202787e-05 Shh 5.565109 2.158809e-156 9.593279e-154 0 chr7 155810616 155812624 chr7 155799983 155812273 Urate levels
16 chr5_38081825_38086331 6.088429 2.350343e-11 1.192216e-09 Nsg1 -5.086635 2.233941e-111 4.963574e-109 50862 chr4 4488485 4493470 chr4 4386255 4419058
17 chr7_56523378_56528436 6.721266 2.699280e-17 2.609056e-15 Gabrg3 -5.960560 2.341610e-05 1.323050e-04 195806 chr15 26971281 27533227
18 chr7_111875691_111878313 5.109476 4.848742e-06 8.221096e-05 Galnt18 -5.959958 2.253335e-68 2.218523e-66 -95715 chr11 11723801 11725419 chr11 11270873 11622014
19 chr8_82329336_82331284 5.189615 2.360560e-06 4.324934e-05 Il15 -5.185933 8.096707e-04 3.267016e-03 340 chr4 141733679 141736013 chr4 141636595 141733987
20 chr8_102781569_102784308 5.761340 3.626346e-09 1.222945e-07 Cdh11 -6.729272 1.932941e-232 1.986339e-229 0 chr16 65118621 65121296 chr16 64943752 65122137
21 chr11_96341058_96343216 5.337659 5.594815e-07 1.202787e-05 Hoxb3 -6.433657 6.669080e-07 4.796719e-06 0 chr17 48553538 48555622 chr17 48548869 48582622 Migraine; Insomnia
22 chr11_96354206_96355627 5.109476 4.848742e-06 8.221096e-05 Hoxb3 -6.433657 6.669080e-07 4.796719e-06 -6277 chr17 48540790 48542479 chr17 48548869 48582622 Cystatin C levels
23 chr11_96354206_96355627 5.109476 4.848742e-06 8.221096e-05 Hoxb2 -6.351656 3.359549e-06 2.173857e-05 -193 chr17 48540790 48542479 chr17 48542655 48545031 Cystatin C levels
24 chr12_56506841_56514912 -9.259195 2.083740e-95 3.009288e-92 Nkx2-1 9.134857 1.061953e-60 9.189803e-59 17023 chr14 36491565 36499712 chr14 36516398 36520225 Nonsyndromic orofacial cleft x sex interaction...
25 chr13_117409406_117414817 -5.127606 3.397970e-15 2.771547e-13 Emb -6.021510 3.917980e-06 2.506593e-05 -134992 chr5 50396196 50441400
26 chr14_66104432_66107303 6.455571 1.757505e-14 1.344190e-12 Ephx2 -6.872942 1.597944e-07 1.247549e-06 0 chr8 27506903 27511135 chr8 27491001 27544922 Childhood ear infection
27 chr14_88460573_88463273 5.594736 3.142836e-08 8.899628e-07 Pcdh20 -6.097424 1.179837e-05 6.988068e-05 1474 chr13 61406547 61407676 chr13 61409685 61415522
28 chr14_88898125_88900211 -5.263671 2.162151e-06 4.039800e-05 Pcdh20 -6.097424 1.179837e-05 6.988068e-05 -426730 chr13 61862674 61862983 chr13 61409685 61415522
29 chr15_11692586_11694713 6.046002 4.827833e-11 2.333231e-09 Npr3 -8.356128 2.957230e-242 3.473055e-239 145183 chr5 33023231 33024567 chr5 32710636 32791724
30 chr15_11743634_11745456 6.922288 8.515240e-20 1.034939e-17 Npr3 -8.356128 2.957230e-242 3.473055e-239 94440 chr5 32929090 32932829 chr5 32710636 32791724
31 chr15_11782997_11784930 5.957213 2.037009e-10 8.898685e-09 Npr3 -8.356128 2.957230e-242 3.473055e-239 54966 chr5 32864592 32866245 chr5 32710636 32791724
32 chr18_53465649_53470088 5.761340 3.626346e-09 1.222945e-07 Prdm6 -9.029699 4.482978e-14 6.960257e-13 0 chr5 123091173 123096087 chr5 123089102 123194266
33 chr19_59159949_59161669 -6.094880 4.260481e-11 2.067175e-09 Vax1 7.636277 8.070915e-31 3.071805e-29 2816 chr10 117127894 117129270 chr10 117128520 117138301
34 chr19_59163812_59165938 -5.788022 4.812885e-09 1.603272e-07 Vax1 7.636277 8.070915e-31 3.071805e-29 0 chr10 117131583 117133776 chr10 117128520 117138301 Age at first birth; Nonsyndromic cleft lip wit...
35 chrX_93281556_93288401 5.093695 9.655833e-32 2.469379e-29 Arx -5.380834 6.537245e-123 1.853196e-120 0 chrX 25013967 25020980 chrX 25003695 25015948
36 chrY_1009018_1011799 -5.686862 1.864054e-08 5.507147e-07 Eif2s3y 11.645099 9.414457e-23 2.468780e-21 0 chrX 24054686 24057370
37 chrY_1243715_1246316 -5.520433 1.425488e-07 3.468498e-06 Ddx3y 11.525927 2.291430e-22 5.823136e-21 14399 chrY 13478375 13480067 chrY 12903998 12920478
38 chrY_1243715_1246316 -5.520433 1.425488e-07 3.468498e-06 Uty 8.369064 1.224906e-22 3.191744e-21 0 chrY 13478375 13480067 chrY 13248378 13480670
39 chrY_1282482_1287504 -5.191646 4.272455e-06 7.350598e-05 Ddx3y 11.525927 2.291430e-22 5.823136e-21 0 chrY 12903183 12909538 chrY 12903998 12920478

Save the file if needed.

[30]:
adata_cc.write("Brd4_bindings_bulk.h5ad")