Tutorial: SP1 bindings in Cre-driver mouse lines.¶

In this tutorial, we will analyze the binding of the transcription factor Sp1, collected using cre-dependant calling cards from a Syn1::Cre-driver mouse line. Bulk unfused (Brd4 directed) data was also collected as backgound. This dataset contains two time points: day 10(P10) and day 28(P28). The dataset is from Cammack et al., PNAS. (2020), and it can be downloaded from GEO.

In this tutorial, we will call peaks, make annotation, and perfrom differential peak analysis. There are 271946 insertions in the SP1 P10 qbed file, 1083099 insertions in the SP1 P28 qbed file, and 5573110 insertions in the brd4 qbed file.

[1]:

import pycallingcards as cc
import numpy as np
import pandas as pd
import scanpy as sc
from matplotlib import pyplot as plt
plt.rcParams['figure.dpi'] = 150

We start by reading the qbed datafile. In this file, each row represents a Sp1-directed insertion and columns indicate the chromosome, start point and end point, read number, the direction and the sample barcode of each insertion. For example, the first row means one insertion is on Chromosome 1, and starts from 3095378 and ends on 3095382. The reads number is 7 with direction going from 3’ to 5’. The barcode of the cell is TAAGG. We give it the group column to distinguish between groups.

Use cc.rd.read_qbed(filename) to read your own qbed data.

[2]:

SP1_P10 = cc.datasets.SP1_Cre_data(data = "SP1_P10")
SP1_P10['group'] = 'P10'
SP1_P10

[2]:

	Chr	Start	End	Reads	Direction	Barcodes	group
0	chr1	3095378	3095382	7	+	TAAGG	P10
1	chr1	3120128	3120132	1	+	GTTAC	P10
2	chr1	3121275	3121279	10	-	GTTAC	P10
3	chr1	3121275	3121279	2	-	GTTAC	P10
4	chr1	3222947	3222951	1	-	GTTAC	P10
...	...	...	...	...	...	...	...
271941	chrY	1010004	1010008	1	-	GTTAC	P10
271942	chrY	1011155	1011159	12	-	GTTAC	P10
271943	chrY	1178766	1178770	10	+	GTTAC	P10
271944	chrY	1244787	1244791	11	+	GTTAC	P10
271945	chrY	5433055	5433059	2	+	CGAAA	P10

271946 rows × 7 columns

[3]:

SP1_P28 = cc.datasets.SP1_Cre_data(data = "SP1_P28")
SP1_P28['group'] = 'P28'
SP1_P28

[3]:

	Chr	Start	End	Reads	Direction	Barcodes	group
0	chr1	3071865	3071869	76	+	GTCAT	P28
1	chr1	3095378	3095382	7	+	ACTGC	P28
2	chr1	3102707	3102711	1	-	GTCAT	P28
3	chr1	3119905	3119909	4	+	GTCAT	P28
4	chr1	3120189	3120193	66	-	GTCAT	P28
...	...	...	...	...	...	...	...
1083094	chrY	90803579	90803583	14	-	GTCAT	P28
1083095	chrY	90805130	90805134	10	+	ACTGC	P28
1083096	chrY	90805130	90805134	1	+	CGAAA	P28
1083097	chrY	90806531	90806535	5	-	GTCAT	P28
1083098	chrY	90811001	90811005	63	+	ACTGC	P28

1083099 rows × 7 columns

To call differential peaks, we first combine the two qbed files together.

[4]:

SP1 = cc.rd.combine_qbed([SP1_P10, SP1_P28])
SP1

[4]:

	Chr	Start	End	Reads	Direction	Barcodes	group
0	chr1	3071865	3071869	76	+	GTCAT	P28
1	chr1	3095378	3095382	7	+	TAAGG	P10
2	chr1	3095378	3095382	7	+	ACTGC	P28
3	chr1	3102707	3102711	1	-	GTCAT	P28
4	chr1	3119905	3119909	4	+	GTCAT	P28
...	...	...	...	...	...	...	...
1355040	chrY	90803579	90803583	14	-	GTCAT	P28
1355041	chrY	90805130	90805134	10	+	ACTGC	P28
1355042	chrY	90805130	90805134	1	+	CGAAA	P28
1355043	chrY	90806531	90806535	5	-	GTCAT	P28
1355044	chrY	90811001	90811005	63	+	ACTGC	P28

1355045 rows × 7 columns

The insertions might be mapped unlocalized or unplaced, we now clean it here.

[5]:

SP1 = cc.pp.clean_qbed(SP1)
SP1

[5]:

	Chr	Start	End	Reads	Direction	Barcodes	group
0	chr1	3071865	3071869	76	+	GTCAT	P28
1	chr1	3095378	3095382	7	+	TAAGG	P10
2	chr1	3095378	3095382	7	+	ACTGC	P28
3	chr1	3102707	3102711	1	-	GTCAT	P28
4	chr1	3119905	3119909	4	+	GTCAT	P28
...	...	...	...	...	...	...	...
1355040	chrY	90803579	90803583	14	-	GTCAT	P28
1355041	chrY	90805130	90805134	10	+	ACTGC	P28
1355042	chrY	90805130	90805134	1	+	CGAAA	P28
1355043	chrY	90806531	90806535	5	-	GTCAT	P28
1355044	chrY	90811001	90811005	63	+	ACTGC	P28

1354844 rows × 7 columns

We then read the brd4 background file.

[6]:

bg = cc.datasets.SP1_Cre_data(data = "background")
bg

[6]:

	Chr	Start	End	Reads	Direction	Barcodes
0	chr1	3004272	3004276	5	+	ACTGC
1	chr1	3028063	3028067	6	-	ACTGC
2	chr1	3043241	3043245	1	-	ACTGC
3	chr1	3049117	3049121	1	-	CAGTG
4	chr1	3052152	3052156	1	+	ACTGC
...	...	...	...	...	...	...
5573105	chrY	90811001	90811005	2	+	CAGTG
5573106	chrY	90811001	90811005	1	+	CAGTG
5573107	chrY	90811001	90811005	1	+	CAGTG
5573108	chrY	90811001	90811005	2	+	TGACA
5573109	chrY	90811001	90811005	13	+	CAGTG

5573110 rows × 6 columns

[7]:

bg = cc.pp.clean_qbed(bg)
bg

[7]:

	Chr	Start	End	Reads	Direction	Barcodes
0	chr1	3004272	3004276	5	+	ACTGC
1	chr1	3028063	3028067	6	-	ACTGC
2	chr1	3043241	3043245	1	-	ACTGC
3	chr1	3049117	3049121	1	-	CAGTG
4	chr1	3052152	3052156	1	+	ACTGC
...	...	...	...	...	...	...
5573105	chrY	90811001	90811005	2	+	CAGTG
5573106	chrY	90811001	90811005	1	+	CAGTG
5573107	chrY	90811001	90811005	1	+	CAGTG
5573108	chrY	90811001	90811005	2	+	TGACA
5573109	chrY	90811001	90811005	13	+	CAGTG

5572856 rows × 6 columns

We next need to call peaks to identify potential Sp1 binding sites. Three different methods (‘CCcaller’, ‘MACCs’, ‘Blockify’) are available along with three different species (‘hg38’, ‘mm10’, ‘sacCer3’).

Here, we use MACCs in mouse(‘mm10’) data. window_size is the most important parameter for MACCs, and it is highly related to the length of a peak. 1000-1200 is a good fit for window_size. step_size is another important parameter and it controls how careful we are looking into the chromosomes. 500-800 is good for step_size. pvalue_cutoffTTAA is the pvalue cutoff for TTAA data and pvalue_cutoffbg is pvalue cutoff for the background qbed data. Normally, the setting for pvalue_cutoffbg is considerably higher than pvalue_cutoffTTAA. pvalue_cutoffbg is recommended from 0.00001 to 0.01 and pvalue_cutoffTTAA is recommended from 0.001 to 0.1. The setting of pseudocounts is largely influenced by library size. For the first time of trial, it can be adjusted to \(10^{6}-10^{-5} \times\) the number of insertions.

[8]:

peak_data = cc.pp.call_peaks(SP1, bg, method = "MACCs", reference = "mm10", pvalue_cutoffbg = 0.001,
                             window_size = 1000, step_size = 500, pvalue_cutoffTTAA = 0.00001,
                             lam_win_size = 1000000, pseudocounts = 0.1, record = True, save = "peak.bed")
peak_data

For the MACCs method with background, [expdata, background, reference, pvalue_cutoffbg, pvalue_cutoffTTAA, lam_win_size, window_size, step_size, extend, pseudocounts, test_method, min_insertions, record] would be utilized.

100%|██████████| 21/21 [04:43<00:00, 13.50s/it]

[8]:

	Chr	Start	End	Center	Experiment Insertions	Background insertions	Reference Insertions	pvalue Reference	pvalue Background	Fraction Experiment	TPH Experiment	Fraction background	TPH background	TPH background subtracted	pvalue_adj Reference
0	chr1	3399656	3400345	3400068.0	21	69	8	0.000000e+00	1.745466e-04	0.000015	1549.993948	1.238144e-05	1238.144320	311.849628	0.000000e+00
1	chr1	3672013	3673193	3672213.0	61	47	9	0.000000e+00	0.000000e+00	0.000045	4502.363372	8.433737e-06	843.373667	3658.989705	0.000000e+00
2	chr1	4773450	4774236	4773657.0	6	5	4	2.405425e-08	1.939280e-04	0.000004	442.855414	8.972060e-07	89.720603	353.134811	6.323087e-06
3	chr1	4785206	4786550	4785472.0	31	47	13	0.000000e+00	2.298502e-08	0.000023	2288.086304	8.433737e-06	843.373667	1444.712637	0.000000e+00
4	chr1	5016489	5017564	5017023.0	8	10	8	1.672281e-09	3.938477e-04	0.000006	590.473885	1.794412e-06	179.441206	411.032679	4.739980e-07
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
10739	chrX	169325223	169325838	169325423.0	11	16	9	5.107026e-15	1.876575e-04	0.000008	811.901592	2.871059e-06	287.105929	524.795662	1.893816e-12
10740	chrX	169799382	169801491	169799905.0	25	29	23	0.000000e+00	3.271381e-07	0.000018	1845.230890	5.203795e-06	520.379497	1324.851393	0.000000e+00
10741	chrX	169829316	169830343	169829658.0	9	9	10	2.746225e-09	1.659074e-04	0.000007	664.283120	1.614971e-06	161.497085	502.786035	7.671705e-07
10742	chrX	169878786	169879561	169879328.0	19	21	10	0.000000e+00	1.379092e-06	0.000014	1402.375476	3.768265e-06	376.826532	1025.548944	0.000000e+00
10743	chrY	1009804	1011442	1010850.0	17	2	16	0.000000e+00	7.771561e-16	0.000013	1254.757005	3.588824e-07	35.888241	1218.868764	0.000000e+00

10744 rows × 15 columns

Approach the above by first combining the data and then call peaks together.

Although not recommended, you could also try calling peaks seperately and then merging the peaks by pybedtools. Below is the code:

import pybedtools
peak_data1 = cc.pp.call_peaks(SP1_P10, bg, method = "MACCs", reference = "mm10",
                              pvalue_cutoffbg = 0.1, window_size=1500, step_size=500,
                              lam_win_size = None,  pseudocounts = 0.1, record = True)
peak_data2 = cc.pp.call_peaks(SP1_P28, bg, method = "MACCs", reference = "mm10",
                              pvalue_cutoffbg = 0.1, window_size=1500, step_size=500,
                              lam_win_size = None,  pseudocounts = 0.1, record = True)
peak = cc.rd.combine_qbed([peak_data1, peak_data2])
peak = pybedtools.BedTool.from_dataframe(peak).merge().to_dataframe()
peak_data = peak.rename(columns={"chrom":"Chr", "start":"Start", "end":"End"})

In order to choose the suitable method and parameters for peak calling, please take a look at genome areas. We stongly advise to adjust the parameters for cc.pp.call_peaks() to call better peaks.

In this plot, the colored ones are the experiment qbed data and the gray ones are the background data. The top section shows insertions and their read counts. One dot is an insertion and the height is log(reads+1). The middle section indicates the distribution of insertions. The bottom section represents reference genes and peaks.

[9]:

cc.pl.draw_area("chr1", 4856929, 4863861, 100000, peak_data, SP1, "mm10", bg, font_size=2, plotsize = [1,1,6],
                figsize = (30,12), peak_line = 2, save = False, bins = 500, example_length = 5000)

../../_images/tutorials_notebooks_SP1_bulk_17_0.png

[10]:

cc.pl.draw_area("chr2", 3116481, 3125943, 15000, peak_data, SP1, "mm10", bg, font_size=2, plotsize = [1,1,3],
                figsize = (30,10), peak_line = 3, save = False, bins = 500, example_length = 1000)

../../_images/tutorials_notebooks_SP1_bulk_18_0.png

We can also visualize our data directly in the WashU Epigenome Browser. This can be useful for overlaying your data with other published datasets. Please note that this link only valid for 24hrs, so you will have to rerun it if you want to use it after this time period.

[11]:

qbed= {"SP1":SP1, "bg":bg}
bed = {'PEAK':peak_data}
cc.pl.WashU_browser_url(qbed, bed, genome = 'mm10')

All qbed addressed
All bed addressed
Uploading files
Please click the following link to see the data on WashU Epigenome Browser directly.
https://epigenomegateway.wustl.edu/browser/?genome=mm10&hub=https://companion.epigenomegateway.org//task/a4f06270936d67281d4c9aa3065922e2/output//datahub.json

Pycallingcards can be used to visual peak locations acorss the genome to see that the distribution of peaks is unbiased and that all chromosomes are represented.

[12]:

cc.pl.whole_peaks(peak_data, linewidth = 1, reference = "mm10")

../../_images/tutorials_notebooks_SP1_bulk_22_0.png

We can next use HOMER to look for motifs that are overrepresented under the peaks. Here, we expect to find the SP1 motif.

[13]:

cc.tl.call_motif(peaks_frame = peak_data, reference = "mm10", save_homer = "Homer/peak",
                 homer_path = "/home/juanru/miniconda3/bin/", num_cores=12)

There is no save_name, it will save to temp_Homer_trial.bed and then delete.


        Position file = temp_Homer_trial.bed
        Genome = mm10
        Output Directory = Homer/peak
        Fragment size set to 1000
        Using 12 CPUs
        Will not run homer for de novo motifs
        Found mset for "mouse", will check against vertebrates motifs
        Peak/BED file conversion summary:
                BED/Header formatted lines: 10744
                peakfile formatted lines: 0

        Peak File Statistics:
                Total Peaks: 10744
                Redundant Peak IDs: 0
                Peaks lacking information: 0 (need at least 5 columns per peak)
                Peaks with misformatted coordinates: 0 (should be integer)
                Peaks with misformatted strand: 0 (should be either +/- or 0/1)

        Peak file looks good!

        Background files for 1000 bp fragments found.
        Custom genome sequence directory: /home/juanru/miniconda3/share/homer/.//data/genomes/mm10//

        Extracting sequences from directory: /home/juanru/miniconda3/share/homer/.//data/genomes/mm10//
        Extracting 680 sequences from chr1
        Extracting 873 sequences from chr2
        Extracting 561 sequences from chr3
        Extracting 698 sequences from chr4
        Extracting 682 sequences from chr5
        Extracting 540 sequences from chr6
        Extracting 670 sequences from chr7
        Extracting 535 sequences from chr8
        Extracting 609 sequences from chr9
        Extracting 557 sequences from chr10
        Extracting 768 sequences from chr11
        Extracting 469 sequences from chr12
        Extracting 427 sequences from chr13
        Extracting 413 sequences from chr14
        Extracting 422 sequences from chr15
        Extracting 396 sequences from chr16
        Extracting 453 sequences from chr17
        Extracting 366 sequences from chr18
        Extracting 325 sequences from chr19
        Extracting 299 sequences from chrX
        Extracting 1 sequences from chrY
                1 of 10744 removed because they had >70.00% Ns (i.e. masked repeats)

        Not removing redundant sequences


        Sequences processed:
                Auto detected maximum sequence length of 1001 bp
                10743 total

        Frequency Bins: 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.6 0.7 0.8
        Freq    Bin     Count
        0.3     2       6
        0.35    3       180
        0.4     4       932
        0.45    5       2120
        0.5     6       2862
        0.6     7       3871
        0.7     8       717
        0.8     9       55

        Total sequences set to 50000

        Choosing background that matches in CpG/GC content...
        Bin     # Targets       # Background    Background Weight
        2       6       22      0.997
        3       180     658     1.000
        4       932     3406    1.000
        5       2120    7747    1.000
        6       2862    10458   1.000
        7       3871    14145   1.000
        8       717     2620    1.000
        9       55      201     1.000
        Assembling sequence file...
        Normalizing lower order oligos using homer2

        Reading input files...
        50000 total sequences read
        Autonormalization: 1-mers (4 total)
                A       25.47%  25.72%  0.990
                C       24.53%  24.28%  1.010
                G       24.53%  24.28%  1.010
                T       25.47%  25.72%  0.990
        Autonormalization: 2-mers (16 total)
                AA      7.69%   7.08%   1.087
                CA      6.98%   7.64%   0.914
                GA      6.16%   6.34%   0.972
                TA      4.64%   4.66%   0.996
                AC      5.10%   5.36%   0.951
                CC      7.22%   7.02%   1.029
                GC      6.05%   5.57%   1.087
                TC      6.16%   6.34%   0.972
                AG      7.55%   7.89%   0.957
                CG      2.78%   1.74%   1.598
                GG      7.22%   7.02%   1.029
                TG      6.98%   7.64%   0.914
                AT      5.13%   5.38%   0.953
                CT      7.55%   7.89%   0.957
                GT      5.10%   5.36%   0.951
                TT      7.69%   7.08%   1.087
        Autonormalization: 3-mers (64 total)
        Normalization weights can be found in file: Homer/peak/seq.autonorm.tsv
        Converging on autonormalization solution:
        ...............................................................................
        Final normalization:    Autonormalization: 1-mers (4 total)
                A       25.47%  25.41%  1.002
                C       24.53%  24.59%  0.998
                G       24.53%  24.59%  0.998
                T       25.47%  25.41%  1.002
        Autonormalization: 2-mers (16 total)
                AA      7.69%   7.40%   1.040
                CA      6.98%   7.14%   0.977
                GA      6.16%   6.21%   0.992
                TA      4.64%   4.67%   0.995
                AC      5.10%   5.16%   0.989
                CC      7.22%   7.21%   1.001
                GC      6.05%   6.01%   1.007
                TC      6.16%   6.21%   0.992
                AG      7.55%   7.58%   0.996
                CG      2.78%   2.65%   1.049
                GG      7.22%   7.21%   1.001
                TG      6.98%   7.14%   0.977
                AT      5.13%   5.28%   0.972
                CT      7.55%   7.58%   0.996
                GT      5.10%   5.16%   0.989
                TT      7.69%   7.40%   1.040
        Autonormalization: 3-mers (64 total)
        Finished preparing sequence/group files

        ----------------------------------------------------------
        Known motif enrichment

        Reading input files...
        50000 total sequences read
        440 motifs loaded
        Cache length = 11180
        Using binomial scoring
        Checking enrichment of 440 motif(s)
        |0%                                    50%                                  100%|
        =================================================================================

Finished!


        Preparing HTML output with sequence logos...
                1 of 440 (1e-529) Sp5(Zf)/mES-Sp5.Flag-ChIP-Seq(GSE72989)/Homer
                2 of 440 (1e-371) Sp2(Zf)/HEK293-Sp2.eGFP-ChIP-Seq(Encode)/Homer
                3 of 440 (1e-365) KLF14(Zf)/HEK293-KLF14.GFP-ChIP-Seq(GSE58341)/Homer
                4 of 440 (1e-354) KLF1(Zf)/HUDEP2-KLF1-CutnRun(GSE136251)/Homer
                5 of 440 (1e-294) Sp1(Zf)/Promoter/Homer
                6 of 440 (1e-258) KLF3(Zf)/MEF-Klf3-ChIP-Seq(GSE44748)/Homer
                7 of 440 (1e-251) Maz(Zf)/HepG2-Maz-ChIP-Seq(GSE31477)/Homer
                8 of 440 (1e-229) KLF5(Zf)/LoVo-KLF5-ChIP-Seq(GSE49402)/Homer
                9 of 440 (1e-221) KLF6(Zf)/PDAC-KLF6-ChIP-Seq(GSE64557)/Homer
                10 of 440 (1e-154) Klf9(Zf)/GBM-Klf9-ChIP-Seq(GSE62211)/Homer
                11 of 440 (1e-110) Klf4(Zf)/mES-Klf4-ChIP-Seq(GSE11431)/Homer
                12 of 440 (1e-94) TATA-Box(TBP)/Promoter/Homer
                13 of 440 (1e-74) KLF10(Zf)/HEK293-KLF10.GFP-ChIP-Seq(GSE58341)/Homer
                14 of 440 (1e-63) Egr1(Zf)/K562-Egr1-ChIP-Seq(GSE32465)/Homer
                15 of 440 (1e-58) E2F3(E2F)/MEF-E2F3-ChIP-Seq(GSE71376)/Homer
                16 of 440 (1e-53) Zfp281(Zf)/ES-Zfp281-ChIP-Seq(GSE81042)/Homer
                17 of 440 (1e-52) WT1(Zf)/Kidney-WT1-ChIP-Seq(GSE90016)/Homer
                18 of 440 (1e-50) Lhx3(Homeobox)/Neuron-Lhx3-ChIP-Seq(GSE31456)/Homer
                19 of 440 (1e-47) Nkx6.1(Homeobox)/Islet-Nkx6.1-ChIP-Seq(GSE40975)/Homer
                20 of 440 (1e-47) ZNF467(Zf)/HEK293-ZNF467.GFP-ChIP-Seq(GSE58341)/Homer
                21 of 440 (1e-45) RFX(HTH)/K562-RFX3-ChIP-Seq(SRA012198)/Homer
                22 of 440 (1e-43) Znf263(Zf)/K562-Znf263-ChIP-Seq(GSE31477)/Homer
                23 of 440 (1e-42) Hoxa13(Homeobox)/ChickenMSG-Hoxa13.Flag-ChIP-Seq(GSE86088)/Homer
                24 of 440 (1e-40) Lhx1(Homeobox)/EmbryoCarcinoma-Lhx1-ChIP-Seq(GSE70957)/Homer
                25 of 440 (1e-39) Mef2c(MADS)/GM12878-Mef2c-ChIP-Seq(GSE32465)/Homer
                26 of 440 (1e-38) CHR(?)/Hela-CellCycle-Expression/Homer
                27 of 440 (1e-37) Rfx2(HTH)/LoVo-RFX2-ChIP-Seq(GSE49402)/Homer
                28 of 440 (1e-37) Isl1(Homeobox)/Neuron-Isl1-ChIP-Seq(GSE31456)/Homer
                29 of 440 (1e-37) Dlx3(Homeobox)/Kerainocytes-Dlx3-ChIP-Seq(GSE89884)/Homer
                30 of 440 (1e-36) En1(Homeobox)/SUM149-EN1-ChIP-Seq(GSE120957)/Homer
                31 of 440 (1e-36) X-box(HTH)/NPC-H3K4me1-ChIP-Seq(GSE16256)/Homer
                32 of 440 (1e-35) Mef2d(MADS)/Retina-Mef2d-ChIP-Seq(GSE61391)/Homer
                33 of 440 (1e-33) E2F6(E2F)/Hela-E2F6-ChIP-Seq(GSE31477)/Homer
                34 of 440 (1e-33) DLX5(Homeobox)/BasalGanglia-Dlx5-ChIP-seq(GSE124936)/Homer
                35 of 440 (1e-32) LHX9(Homeobox)/Hct116-LHX9.V5-ChIP-Seq(GSE116822)/Homer
                36 of 440 (1e-31) E2F4(E2F)/K562-E2F4-ChIP-Seq(GSE31477)/Homer
                37 of 440 (1e-31) EKLF(Zf)/Erythrocyte-Klf1-ChIP-Seq(GSE20478)/Homer
                38 of 440 (1e-31) BMYB(HTH)/Hela-BMYB-ChIP-Seq(GSE27030)/Homer
                39 of 440 (1e-30) DLX2(Homeobox)/BasalGanglia-Dlx2-ChIP-seq(GSE124936)/Homer
                40 of 440 (1e-30) Lhx2(Homeobox)/HFSC-Lhx2-ChIP-Seq(GSE48068)/Homer
                41 of 440 (1e-29) Pitx1(Homeobox)/Chicken-Pitx1-ChIP-Seq(GSE38910)/Homer
                42 of 440 (1e-28) Mef2b(MADS)/HEK293-Mef2b.V5-ChIP-Seq(GSE67450)/Homer
                43 of 440 (1e-28) Elk4(ETS)/Hela-Elk4-ChIP-Seq(GSE31477)/Homer
                44 of 440 (1e-27) DLX1(Homeobox)/BasalGanglia-Dlx1-ChIP-seq(GSE124936)/Homer
                45 of 440 (1e-25) Hoxa11(Homeobox)/ChickenMSG-Hoxa11.Flag-ChIP-Seq(GSE86088)/Homer
                46 of 440 (1e-25) Hoxd13(Homeobox)/ChickenMSG-Hoxd13.Flag-ChIP-Seq(GSE86088)/Homer
                47 of 440 (1e-23) Hoxd11(Homeobox)/ChickenMSG-Hoxd11.Flag-ChIP-Seq(GSE86088)/Homer
                48 of 440 (1e-23) Unknown(Homeobox)/Limb-p300-ChIP-Seq/Homer
                49 of 440 (1e-22) MYB(HTH)/ERMYB-Myb-ChIPSeq(GSE22095)/Homer
                50 of 440 (1e-21) Rfx1(HTH)/NPC-H3K4me1-ChIP-Seq(GSE16256)/Homer
                51 of 440 (1e-19) Nanog(Homeobox)/mES-Nanog-ChIP-Seq(GSE11724)/Homer
                52 of 440 (1e-18) Mef2a(MADS)/HL1-Mef2a.biotin-ChIP-Seq(GSE21529)/Homer
                53 of 440 (1e-18) Arnt:Ahr(bHLH)/MCF7-Arnt-ChIP-Seq(Lo_et_al.)/Homer
                54 of 440 (1e-18) HIF-1b(HLH)/T47D-HIF1b-ChIP-Seq(GSE59937)/Homer
                55 of 440 (1e-17) Gfi1b(Zf)/HPC7-Gfi1b-ChIP-Seq(GSE22178)/Homer
                56 of 440 (1e-17) Egr2(Zf)/Thymocytes-Egr2-ChIP-Seq(GSE34254)/Homer
                57 of 440 (1e-17) Npas4(bHLH)/Neuron-Npas4-ChIP-Seq(GSE127793)/Homer
                58 of 440 (1e-16) CTCF(Zf)/CD4+-CTCF-ChIP-Seq(Barski_et_al.)/Homer
                59 of 440 (1e-14) CEBP(bZIP)/ThioMac-CEBPb-ChIP-Seq(GSE21512)/Homer
                60 of 440 (1e-14) Tbr1(T-box)/Cortex-Tbr1-ChIP-Seq(GSE71384)/Homer
                61 of 440 (1e-13) Elk1(ETS)/Hela-Elk1-ChIP-Seq(GSE31477)/Homer
                62 of 440 (1e-13) Fli1(ETS)/CD8-FLI-ChIP-Seq(GSE20898)/Homer
                63 of 440 (1e-12) AMYB(HTH)/Testes-AMYB-ChIP-Seq(GSE44588)/Homer
                64 of 440 (1e-12) Tcf12(bHLH)/GM12878-Tcf12-ChIP-Seq(GSE32465)/Homer
                65 of 440 (1e-11) JunD(bZIP)/K562-JunD-ChIP-Seq/Homer
                66 of 440 (1e-11) Oct11(POU,Homeobox)/NCIH1048-POU2F3-ChIP-seq(GSE115123)/Homer
                67 of 440 (1e-11) ETV4(ETS)/HepG2-ETV4-ChIP-Seq(ENCODE)/Homer
                68 of 440 (1e-11) STAT6(Stat)/Macrophage-Stat6-ChIP-Seq(GSE38377)/Homer
                69 of 440 (1e-10) CRX(Homeobox)/Retina-Crx-ChIP-Seq(GSE20012)/Homer
                70 of 440 (1e-10) Ronin(THAP)/ES-Thap11-ChIP-Seq(GSE51522)/Homer
                71 of 440 (1e-9) Rbpj1(?)/Panc1-Rbpj1-ChIP-Seq(GSE47459)/Homer
                72 of 440 (1e-9) MYNN(Zf)/HEK293-MYNN.eGFP-ChIP-Seq(Encode)/Homer
                73 of 440 (1e-9) Hnf1(Homeobox)/Liver-Foxa2-Chip-Seq(GSE25694)/Homer
                74 of 440 (1e-9) Myf5(bHLH)/GM-Myf5-ChIP-Seq(GSE24852)/Homer
                75 of 440 (1e-8) Hoxa9(Homeobox)/ChickenMSG-Hoxa9.Flag-ChIP-Seq(GSE86088)/Homer
                76 of 440 (1e-8) HIF2a(bHLH)/785_O-HIF2a-ChIP-Seq(GSE34871)/Homer
                77 of 440 (1e-8) ELF1(ETS)/Jurkat-ELF1-ChIP-Seq(SRA014231)/Homer
                78 of 440 (1e-8) NFY(CCAAT)/Promoter/Homer
                79 of 440 (1e-8) Hoxc9(Homeobox)/Ainv15-Hoxc9-ChIP-Seq(GSE21812)/Homer
                80 of 440 (1e-8) Zfp57(Zf)/H1-ZFP57.HA-ChIP-Seq(GSE115387)/Homer
                81 of 440 (1e-8) NF1-halfsite(CTF)/LNCaP-NF1-ChIP-Seq(Unpublished)/Homer
                82 of 440 (1e-7) Nkx2.2(Homeobox)/NPC-Nkx2.2-ChIP-Seq(GSE61673)/Homer
                83 of 440 (1e-7) GFY(?)/Promoter/Homer
                84 of 440 (1e-7) MyoD(bHLH)/Myotube-MyoD-ChIP-Seq(GSE21614)/Homer
                85 of 440 (1e-7) Stat3+il21(Stat)/CD4-Stat3-ChIP-Seq(GSE19198)/Homer
                86 of 440 (1e-7) STAT6(Stat)/CD4-Stat6-ChIP-Seq(GSE22104)/Homer
                87 of 440 (1e-7) FoxD3(forkhead)/ZebrafishEmbryo-Foxd3.biotin-ChIP-seq(GSE106676)/Homer
                88 of 440 (1e-6) Rfx6(HTH)/Min6b1-Rfx6.HA-ChIP-Seq(GSE62844)/Homer
                89 of 440 (1e-6) E2F1(E2F)/Hela-E2F1-ChIP-Seq(GSE22478)/Homer
                90 of 440 (1e-6) Prop1(Homeobox)/GHFT1-PROP1.biotin-ChIP-Seq(GSE77302)/Homer
                91 of 440 (1e-6) STAT4(Stat)/CD4-Stat4-ChIP-Seq(GSE22104)/Homer
                92 of 440 (1e-6) Tcf21(bHLH)/ArterySmoothMuscle-Tcf21-ChIP-Seq(GSE61369)/Homer
                93 of 440 (1e-6) GLIS3(Zf)/Thyroid-Glis3.GFP-ChIP-Seq(GSE103297)/Homer
                94 of 440 (1e-6) Eomes(T-box)/H9-Eomes-ChIP-Seq(GSE26097)/Homer
                95 of 440 (1e-6) Stat3(Stat)/mES-Stat3-ChIP-Seq(GSE11431)/Homer
                96 of 440 (1e-6) NRF1(NRF)/MCF7-NRF1-ChIP-Seq(Unpublished)/Homer
                97 of 440 (1e-6) ZNF652/HepG2-ZNF652.Flag-ChIP-Seq(Encode)/Homer
                98 of 440 (1e-5) Rfx5(HTH)/GM12878-Rfx5-ChIP-Seq(GSE31477)/Homer
                99 of 440 (1e-5) RORg(NR)/Liver-Rorc-ChIP-Seq(GSE101115)/Homer
                100 of 440 (1e-5) NFAT(RHD)/Jurkat-NFATC1-ChIP-Seq(Jolma_et_al.)/Homer
                101 of 440 (1e-5) Ap4(bHLH)/AML-Tfap4-ChIP-Seq(GSE45738)/Homer
                102 of 440 (1e-5) Fox:Ebox(Forkhead,bHLH)/Panc1-Foxa2-ChIP-Seq(GSE47459)/Homer
                103 of 440 (1e-5) Pitx1:Ebox(Homeobox,bHLH)/Hindlimb-Pitx1-ChIP-Seq(GSE41591)/Homer
                104 of 440 (1e-5) MyoG(bHLH)/C2C12-MyoG-ChIP-Seq(GSE36024)/Homer
                105 of 440 (1e-5) ETV1(ETS)/GIST48-ETV1-ChIP-Seq(GSE22441)/Homer
                106 of 440 (1e-5) NFIL3(bZIP)/HepG2-NFIL3-ChIP-Seq(Encode)/Homer
                107 of 440 (1e-5) NRF(NRF)/Promoter/Homer
                108 of 440 (1e-5) Fos(bZIP)/TSC-Fos-ChIP-Seq(GSE110950)/Homer
                109 of 440 (1e-4) TEAD1(TEAD)/HepG2-TEAD1-ChIP-Seq(Encode)/Homer
                110 of 440 (1e-4) HLF(bZIP)/HSC-HLF.Flag-ChIP-Seq(GSE69817)/Homer
                111 of 440 (1e-4) Tbx5(T-box)/HL1-Tbx5.biotin-ChIP-Seq(GSE21529)/Homer
                112 of 440 (1e-4) NFkB-p65(RHD)/GM12787-p65-ChIP-Seq(GSE19485)/Homer
                113 of 440 (1e-4) HNF1b(Homeobox)/PDAC-HNF1B-ChIP-Seq(GSE64557)/Homer
                114 of 440 (1e-4) BORIS(Zf)/K562-CTCFL-ChIP-Seq(GSE32465)/Homer
                115 of 440 (1e-4) E2F7(E2F)/Hela-E2F7-ChIP-Seq(GSE32673)/Homer
                116 of 440 (1e-4) CRE(bZIP)/Promoter/Homer
                117 of 440 (1e-4) GABPA(ETS)/Jurkat-GABPa-ChIP-Seq(GSE17954)/Homer
                118 of 440 (1e-4) ERG(ETS)/VCaP-ERG-ChIP-Seq(GSE14097)/Homer
                119 of 440 (1e-4) Smad3(MAD)/NPC-Smad3-ChIP-Seq(GSE36673)/Homer
                120 of 440 (1e-4) HIC1(Zf)/Treg-ZBTB29-ChIP-Seq(GSE99889)/Homer
                121 of 440 (1e-4) HOXB13(Homeobox)/ProstateTumor-HOXB13-ChIP-Seq(GSE56288)/Homer
                122 of 440 (1e-4) Hoxd12(Homeobox)/ChickenMSG-Hoxd12.Flag-ChIP-Seq(GSE86088)/Homer
                123 of 440 (1e-4) c-Myc(bHLH)/LNCAP-cMyc-ChIP-Seq(Unpublished)/Homer
                124 of 440 (1e-4) CArG(MADS)/PUER-Srf-ChIP-Seq(Sullivan_et_al.)/Homer
                125 of 440 (1e-4) TEAD3(TEA)/HepG2-TEAD3-ChIP-Seq(Encode)/Homer
                126 of 440 (1e-4) Oct4:Sox17(POU,Homeobox,HMG)/F9-Sox17-ChIP-Seq(GSE44553)/Homer
                127 of 440 (1e-4) OCT:OCT(POU,Homeobox)/NPC-Brn1-ChIP-Seq(GSE35496)/Homer
                128 of 440 (1e-4) Foxo3(Forkhead)/U2OS-Foxo3-ChIP-Seq(E-MTAB-2701)/Homer
                129 of 440 (1e-4) STAT5(Stat)/mCD4+-Stat5-ChIP-Seq(GSE12346)/Homer
                130 of 440 (1e-4) FOXA1(Forkhead)/LNCAP-FOXA1-ChIP-Seq(GSE27824)/Homer
                131 of 440 (1e-4) HIF-1a(bHLH)/MCF7-HIF1a-ChIP-Seq(GSE28352)/Homer
                132 of 440 (1e-3) PBX2(Homeobox)/K562-PBX2-ChIP-Seq(Encode)/Homer
                133 of 440 (1e-3) Phox2a(Homeobox)/Neuron-Phox2a-ChIP-Seq(GSE31456)/Homer
                134 of 440 (1e-3) ZSCAN22(Zf)/HEK293-ZSCAN22.GFP-ChIP-Seq(GSE58341)/Homer
                135 of 440 (1e-3) Atoh1(bHLH)/Cerebellum-Atoh1-ChIP-Seq(GSE22111)/Homer
                136 of 440 (1e-3) RORa(NR)/Liver-Rora-ChIP-Seq(GSE101115)/Homer
                137 of 440 (1e-3) OCT:OCT(POU,Homeobox,IR1)/NPC-Brn2-ChIP-Seq(GSE35496)/Homer
                138 of 440 (1e-3) PRDM15(Zf)/ESC-Prdm15-ChIP-Seq(GSE73694)/Homer
                139 of 440 (1e-3) Fra2(bZIP)/Striatum-Fra2-ChIP-Seq(GSE43429)/Homer
                140 of 440 (1e-3) PU.1-IRF(ETS:IRF)/Bcell-PU.1-ChIP-Seq(GSE21512)/Homer
                141 of 440 (1e-3) Atf2(bZIP)/3T3L1-Atf2-ChIP-Seq(GSE56872)/Homer
                142 of 440 (1e-3) c-Jun-CRE(bZIP)/K562-cJun-ChIP-Seq(GSE31477)/Homer
                143 of 440 (1e-3) OCT:OCT-short(POU,Homeobox)/NPC-OCT6-ChIP-Seq(GSE43916)/Homer
                144 of 440 (1e-3) CEBP:AP1(bZIP)/ThioMac-CEBPb-ChIP-Seq(GSE21512)/Homer
                145 of 440 (1e-3) Oct4(POU,Homeobox)/mES-Oct4-ChIP-Seq(GSE11431)/Homer
                146 of 440 (1e-3) Fra1(bZIP)/BT549-Fra1-ChIP-Seq(GSE46166)/Homer
                147 of 440 (1e-3) Pit1(Homeobox)/GCrat-Pit1-ChIP-Seq(GSE58009)/Homer
                148 of 440 (1e-3) Atf1(bZIP)/K562-ATF1-ChIP-Seq(GSE31477)/Homer
                149 of 440 (1e-3) E-box(bHLH)/Promoter/Homer
                150 of 440 (1e-2) Smad4(MAD)/ESC-SMAD4-ChIP-Seq(GSE29422)/Homer
                151 of 440 (1e-2) STAT1(Stat)/HelaS3-STAT1-ChIP-Seq(GSE12782)/Homer
                152 of 440 (1e-2) Jun-AP1(bZIP)/K562-cJun-ChIP-Seq(GSE31477)/Homer
                153 of 440 (1e-2) FOXA1(Forkhead)/MCF7-FOXA1-ChIP-Seq(GSE26831)/Homer
                154 of 440 (1e-2) JunB(bZIP)/DendriticCells-Junb-ChIP-Seq(GSE36099)/Homer
                155 of 440 (1e-2) IRF3(IRF)/BMDM-Irf3-ChIP-Seq(GSE67343)/Homer
                156 of 440 (1e-2) ZNF675(Zf)/HEK293-ZNF675.GFP-ChIP-Seq(GSE58341)/Homer
                157 of 440 (1e-2) GSC(Homeobox)/FrogEmbryos-GSC-ChIP-Seq(DRA000576)/Homer
                158 of 440 (1e-2) Brn1(POU,Homeobox)/NPC-Brn1-ChIP-Seq(GSE35496)/Homer
                159 of 440 (1e-2) Fosl2(bZIP)/3T3L1-Fosl2-ChIP-Seq(GSE56872)/Homer
                160 of 440 (1e-2) bHLHE40(bHLH)/HepG2-BHLHE40-ChIP-Seq(GSE31477)/Homer
                161 of 440 (1e-2) E2F(E2F)/Hela-CellCycle-Expression/Homer
                162 of 440 (1e-2) Atf7(bZIP)/3T3L1-Atf7-ChIP-Seq(GSE56872)/Homer
                163 of 440 (1e-2) ETS1(ETS)/Jurkat-ETS1-ChIP-Seq(GSE17954)/Homer
                164 of 440 (1e-2) TEAD(TEA)/Fibroblast-PU.1-ChIP-Seq(Unpublished)/Homer
                165 of 440 (1e-2) bHLHE41(bHLH)/proB-Bhlhe41-ChIP-Seq(GSE93764)/Homer
                166 of 440 (1e-2) Brachyury(T-box)/Mesoendoderm-Brachyury-ChIP-exo(GSE54963)/Homer
                167 of 440 (1e-2) SPDEF(ETS)/VCaP-SPDEF-ChIP-Seq(SRA014231)/Homer
                168 of 440 (1e-2) Smad2(MAD)/ES-SMAD2-ChIP-Seq(GSE29422)/Homer
                169 of 440 (1e-2) IRF4(IRF)/GM12878-IRF4-ChIP-Seq(GSE32465)/Homer
                170 of 440 (1e-2) CDX4(Homeobox)/ZebrafishEmbryos-Cdx4.Myc-ChIP-Seq(GSE48254)/Homer
                171 of 440 (1e-2) Tbet(T-box)/CD8-Tbet-ChIP-Seq(GSE33802)/Homer
                172 of 440 (1e-2) Foxo1(Forkhead)/RAW-Foxo1-ChIP-Seq(Fan_et_al.)/Homer
                173 of 440 (1e-2) Tbx21(T-box)/GM12878-TBX21-ChIP-Seq(Encode)/Homer
                174 of 440 (1e-2) Atf3(bZIP)/GBM-ATF3-ChIP-Seq(GSE33912)/Homer
        Skipping...
        Job finished - if results look good, please send beer to ..

        Cleaning up tmp files...

The Sp1 motif is rank top, and the next motif is highly similar.

drawing

In the next step, we annotate the peaks by their closest genes using bedtools and pybedtools. Make sure they are all previously installed before using.

[14]:

peak_annotation = cc.pp.annotation(peak_data, reference = "mm10")
peak_annotation = cc.pp.combine_annotation(peak_data, peak_annotation)
peak_annotation

In the bedtools method, we would use bedtools in the default path. Set bedtools path by 'bedtools_path' if needed.

[14]:

	Chr	Start	End	Center	Experiment Insertions	Background insertions	Reference Insertions	pvalue Reference	pvalue Background	Fraction Experiment	...	TPH background subtracted	pvalue_adj Reference	Nearest Refseq1	Gene Name1	Direction1	Distance1	Nearest Refseq2	Gene Name2	Direction2	Distance2
0	chr1	3399656	3400345	3400068.0	21	69	8	0.000000e+00	1.745466e-04	0.000015	...	311.849628	0.000000e+00	NM_001011874	Xkr4	-	0	NM_001195662	Rp1	-	890501
1	chr1	3672013	3673193	3672213.0	61	47	9	0.000000e+00	0.000000e+00	0.000045	...	3658.989705	0.000000e+00	NM_001011874	Xkr4	-	-516	NM_001195662	Rp1	-	617653
2	chr1	4773450	4774236	4773657.0	6	5	4	2.405425e-08	1.939280e-04	0.000004	...	353.134811	6.323087e-06	NR_033530	Mrpl15	-	0	NM_008866	Lypla1	+	33657
3	chr1	4785206	4786550	4785472.0	31	47	13	0.000000e+00	2.298502e-08	0.000023	...	1444.712637	0.000000e+00	NR_033530	Mrpl15	-	0	NM_008866	Lypla1	+	21343
4	chr1	5016489	5017564	5017023.0	8	10	8	1.672281e-09	3.938477e-04	0.000006	...	411.032679	4.739980e-07	NM_001290372	Rgs20	-	0	NM_133826	Atp6v1h	+	65522
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
10739	chrX	169325223	169325838	169325423.0	11	16	9	5.107026e-15	1.876575e-04	0.000008	...	524.795662	1.893816e-12	NM_001331049	Hccs	-	-4852	NM_009707	Arhgap6	+	-20784
10740	chrX	169799382	169801491	169799905.0	25	29	23	0.000000e+00	3.271381e-07	0.000018	...	1324.851393	0.000000e+00	NM_010797	Mid1	+	0	NR_003635	4933400A11Rik	-	-19748
10741	chrX	169829316	169830343	169829658.0	9	9	10	2.746225e-09	1.659074e-04	0.000007	...	502.786035	7.671705e-07	NM_010797	Mid1	+	0	NM_001290506	Mid1	+	49276
10742	chrX	169878786	169879561	169879328.0	19	21	10	0.000000e+00	1.379092e-06	0.000014	...	1025.548944	0.000000e+00	NM_010797	Mid1	+	0	NM_001290506	Mid1	+	58
10743	chrY	1009804	1011442	1010850.0	17	2	16	0.000000e+00	7.771561e-16	0.000013	...	1218.868764	0.000000e+00	NM_012011	Eif2s3y	+	0	NR_027507	Tspy-ps	-	44322

10744 rows × 23 columns

Use qbed data, peak data and group names to make a group by peak anndata object.

[15]:

adata_cc = cc.pp.make_Anndata(SP1, peak_annotation, ["P10", "P28"], key = 'group')
adata_cc

100%|██████████| 21/21 [00:06<00:00,  3.31it/s]

[15]:

AnnData object with n_obs × n_vars = 2 × 10744
    var: 'Chr', 'Start', 'End', 'Center', 'Experiment Insertions', 'Background insertions', 'Reference Insertions', 'pvalue Reference', 'pvalue Background', 'Fraction Experiment', 'TPH Experiment', 'Fraction background', 'TPH background', 'TPH background subtracted', 'pvalue_adj Reference', 'Nearest Refseq1', 'Gene Name1', 'Direction1', 'Distance1', 'Nearest Refseq2', 'Gene Name2', 'Direction2', 'Distance2'

Filter peaks by minimum count.

[16]:

cc.pp.filter_peaks(adata_cc, min_counts=5)
adata_cc

[16]:

AnnData object with n_obs × n_vars = 2 × 10744
    var: 'Chr', 'Start', 'End', 'Center', 'Experiment Insertions', 'Background insertions', 'Reference Insertions', 'pvalue Reference', 'pvalue Background', 'Fraction Experiment', 'TPH Experiment', 'Fraction background', 'TPH background', 'TPH background subtracted', 'pvalue_adj Reference', 'Nearest Refseq1', 'Gene Name1', 'Direction1', 'Distance1', 'Nearest Refseq2', 'Gene Name2', 'Direction2', 'Distance2', 'n_counts'

Differential peak analysis will find out the significant binding for each group. In this example, we use the Fisher’s exact test to find out.

[17]:

cc.tl.rank_peak_groups(adata_cc, 'Index', method = 'fisher_exact', key_added = 'fisher_exact')

100%|██████████| 2/2 [00:50<00:00, 25.17s/it]

Plot the differential peak analysis results.

Currently, the peaks are ranked by pvalues. It could also be ranked by logfoldchanges by the following codes:

cc.tl.rank_peak_groups(adata_cc, 'Index', method = 'fisher_exact', key_added = 'fisher_exact',
                       rankby = 'logfoldchanges')
cc.pl.rank_peak_groups(adata_cc, key = 'fisher_exact', rankby = 'logfoldchanges')

[18]:

cc.pl.rank_peak_groups(adata_cc, key = 'fisher_exact')

../../_images/tutorials_notebooks_SP1_bulk_35_0.png

Next, we will visualize differentially bound peaks. The colored datapoints are the insertions specific to a cluster and the grey ones are the total insertions. If we input the backgound file, grey datapoints represent backgound insertions.

[19]:

cc.pl.draw_area("chr16", 42904181, 42922657, 40000, peak_data, SP1, "mm10", adata = adata_cc, font_size=2,
                name = "P10", key = "Index", insertionkey = "group", figsize = (30,10), plotsize = [1,1,4],
                name_insertion1 = 'P10 Insertions', name_density1 = 'P10 Insertion Density',
                name_insertion2 = 'Total Insertions', name_density2 = 'Total Insertion Density',
                peak_line = 3, bins = 600, example_length = 5000, color = "purple", title = "P10")
cc.pl.draw_area("chr16", 42904181, 42922657, 40000, peak_data, SP1, "mm10", adata = adata_cc, font_size=2,
                name = "P28",key = "Index", insertionkey = "group", figsize = (30,10), plotsize = [1,1,4],
                name_insertion1 = 'P28 Insertions', name_density1 = 'P28 Insertion Density',
                name_insertion2 = 'Total Insertions', name_density2 = 'Total Insertion Density',
                peak_line = 3, bins = 600, example_length = 5000, title = "P28")

../../_images/tutorials_notebooks_SP1_bulk_37_0.png

../../_images/tutorials_notebooks_SP1_bulk_37_1.png

[20]:

cc.pl.draw_area("chr8", 64630703, 64690703, 20000, peak_data, SP1, "mm10", adata = adata_cc, font_size=2,
                name = "P10", key = "Index", insertionkey = "group", figsize = (30,10), plotsize = [1,1,4],
                name_insertion1 = 'P10 Insertions', name_density1 = 'P10 Insertion Density',
                name_insertion2 = 'Total Insertions', name_density2 = 'Total Insertion Density',
                bins = 500, peak_line = 8, color = "purple", title = "P10")
cc.pl.draw_area("chr8", 64630703, 64690703, 20000, peak_data, SP1,"mm10", adata = adata_cc, font_size=2,
                name = "P28", key = "Index", insertionkey = "group", figsize = (30,10), plotsize = [1,1,4],
                name_insertion1 = 'P28 Insertions', name_density1 = 'P28 Insertion Density',
                name_insertion2 = 'Total Insertions', name_density2 = 'Total Insertion Density',
                bins = 500, peak_line = 8, title = "P28")

../../_images/tutorials_notebooks_SP1_bulk_38_0.png

../../_images/tutorials_notebooks_SP1_bulk_38_1.png

Plot the volcano plot for differential binding sites.

[21]:

cc.pl.volcano_plot(adata_cc, pvalue_cutoff = 0.05, lfc_cutoff = 2)

../../_images/tutorials_notebooks_SP1_bulk_40_0.png

This is the heatmap for relative calling cards bindings.

[22]:

cc.pl.heatmap(adata_cc, figsize=(15,1))

../../_images/tutorials_notebooks_SP1_bulk_42_0.png

We can see from this plot that P28 has a considerable amount of additional Sp1 binding, as most peaks show increased binding at this timepoint. This is consistent with the experiment design as P28 is the accumulated insertions from day1 to day 28 while P10 reflect the insertions until day 10.

Save the file if needed.

[23]:

adata_cc.write("SP1_qbed.h5ad")