{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "a7b23dec-b8d3-438f-84d7-3beb08e782cd", "metadata": {}, "source": [ "# Tutorial: Sp1 binding in Cre-driver mouse lines" ] }, { "attachments": {}, "cell_type": "markdown", "id": "23957f8d-186a-4d95-a223-dc67ddeba63b", "metadata": {}, "source": [ " In this tutorial, we will analyze the binding of the transcription factor Sp1, collected using Cre-dependent calling cards from a Syn1::Cre-driver mouse line. Bulk unfused (Brd4-directed) data was also collected as background. This dataset contains two time points: day 10 (P10) and day 28 (P28). The dataset is from [Cammack et al., PNAS (2020)](https://www.pnas.org/doi/10.1073/pnas.1918241117), and it can be downloaded from [GEO](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE128493).\n", " \n", " \n", " In this tutorial, we will call peaks, annotate them, and perform differential peak analysis. There are 271946 insertions in the Sp1 P10 qbed file, 1083099 insertions in the Sp1 P28 qbed file, and 5573110 insertions in the Brd4 qbed file." ] }, { "cell_type": "code", "execution_count": 1, "id": "9c51fe44-bebe-4dde-8007-5cd516f40f51", "metadata": {}, "outputs": [], "source": [ "import pycallingcards as cc\n", "import numpy as np\n", "import pandas as pd\n", "import scanpy as sc\n", "from matplotlib import pyplot as plt\n", "plt.rcParams['figure.dpi'] = 150" ] }, { "attachments": {}, "cell_type": "markdown", "id": "669744b0-c0b2-4e07-a421-b128fc03f2b2", "metadata": {}, "source": [ "We start by reading the qbed data files. In these files, each row represents an Sp1-directed insertion, and the columns give the chromosome, start and end positions, read count, direction, and sample barcode of each insertion. For example, the first row describes an insertion on chromosome 1 that starts at 3095378 and ends at 3095382. Its read count is 7, with direction going from 3' to 5'. The sample barcode is TAAGG. 
We add a `group` column to each table to distinguish the two time points. \n", "\n", "Use ```cc.rd.read_qbed(filename)``` to read your own qbed data." ] }, { "cell_type": "code", "execution_count": 2, "id": "e1c61320-1c1b-4919-8bc8-e7fd730bb6fa", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
 | Chr | Start | End | Reads | Direction | Barcodes | group |
---|---|---|---|---|---|---|---|
0 | chr1 | 3095378 | 3095382 | 7 | + | TAAGG | P10 |
1 | chr1 | 3120128 | 3120132 | 1 | + | GTTAC | P10 |
2 | chr1 | 3121275 | 3121279 | 10 | - | GTTAC | P10 |
3 | chr1 | 3121275 | 3121279 | 2 | - | GTTAC | P10 |
4 | chr1 | 3222947 | 3222951 | 1 | - | GTTAC | P10 |
... | ... | ... | ... | ... | ... | ... | ... |
271941 | chrY | 1010004 | 1010008 | 1 | - | GTTAC | P10 |
271942 | chrY | 1011155 | 1011159 | 12 | - | GTTAC | P10 |
271943 | chrY | 1178766 | 1178770 | 10 | + | GTTAC | P10 |
271944 | chrY | 1244787 | 1244791 | 11 | + | GTTAC | P10 |
271945 | chrY | 5433055 | 5433059 | 2 | + | CGAAA | P10 |
271946 rows × 7 columns
 | Chr | Start | End | Reads | Direction | Barcodes | group |
---|---|---|---|---|---|---|---|
0 | chr1 | 3071865 | 3071869 | 76 | + | GTCAT | P28 |
1 | chr1 | 3095378 | 3095382 | 7 | + | ACTGC | P28 |
2 | chr1 | 3102707 | 3102711 | 1 | - | GTCAT | P28 |
3 | chr1 | 3119905 | 3119909 | 4 | + | GTCAT | P28 |
4 | chr1 | 3120189 | 3120193 | 66 | - | GTCAT | P28 |
... | ... | ... | ... | ... | ... | ... | ... |
1083094 | chrY | 90803579 | 90803583 | 14 | - | GTCAT | P28 |
1083095 | chrY | 90805130 | 90805134 | 10 | + | ACTGC | P28 |
1083096 | chrY | 90805130 | 90805134 | 1 | + | CGAAA | P28 |
1083097 | chrY | 90806531 | 90806535 | 5 | - | GTCAT | P28 |
1083098 | chrY | 90811001 | 90811005 | 63 | + | ACTGC | P28 |
1083099 rows × 7 columns
 | Chr | Start | End | Reads | Direction | Barcodes | group |
---|---|---|---|---|---|---|---|
0 | chr1 | 3071865 | 3071869 | 76 | + | GTCAT | P28 |
1 | chr1 | 3095378 | 3095382 | 7 | + | TAAGG | P10 |
2 | chr1 | 3095378 | 3095382 | 7 | + | ACTGC | P28 |
3 | chr1 | 3102707 | 3102711 | 1 | - | GTCAT | P28 |
4 | chr1 | 3119905 | 3119909 | 4 | + | GTCAT | P28 |
... | ... | ... | ... | ... | ... | ... | ... |
1355040 | chrY | 90803579 | 90803583 | 14 | - | GTCAT | P28 |
1355041 | chrY | 90805130 | 90805134 | 10 | + | ACTGC | P28 |
1355042 | chrY | 90805130 | 90805134 | 1 | + | CGAAA | P28 |
1355043 | chrY | 90806531 | 90806535 | 5 | - | GTCAT | P28 |
1355044 | chrY | 90811001 | 90811005 | 63 | + | ACTGC | P28 |
1355045 rows × 7 columns
 | Chr | Start | End | Reads | Direction | Barcodes | group |
---|---|---|---|---|---|---|---|
0 | chr1 | 3071865 | 3071869 | 76 | + | GTCAT | P28 |
1 | chr1 | 3095378 | 3095382 | 7 | + | TAAGG | P10 |
2 | chr1 | 3095378 | 3095382 | 7 | + | ACTGC | P28 |
3 | chr1 | 3102707 | 3102711 | 1 | - | GTCAT | P28 |
4 | chr1 | 3119905 | 3119909 | 4 | + | GTCAT | P28 |
... | ... | ... | ... | ... | ... | ... | ... |
1355040 | chrY | 90803579 | 90803583 | 14 | - | GTCAT | P28 |
1355041 | chrY | 90805130 | 90805134 | 10 | + | ACTGC | P28 |
1355042 | chrY | 90805130 | 90805134 | 1 | + | CGAAA | P28 |
1355043 | chrY | 90806531 | 90806535 | 5 | - | GTCAT | P28 |
1355044 | chrY | 90811001 | 90811005 | 63 | + | ACTGC | P28 |
1354844 rows × 7 columns
 | Chr | Start | End | Reads | Direction | Barcodes |
---|---|---|---|---|---|---|
0 | chr1 | 3004272 | 3004276 | 5 | + | ACTGC |
1 | chr1 | 3028063 | 3028067 | 6 | - | ACTGC |
2 | chr1 | 3043241 | 3043245 | 1 | - | ACTGC |
3 | chr1 | 3049117 | 3049121 | 1 | - | CAGTG |
4 | chr1 | 3052152 | 3052156 | 1 | + | ACTGC |
... | ... | ... | ... | ... | ... | ... |
5573105 | chrY | 90811001 | 90811005 | 2 | + | CAGTG |
5573106 | chrY | 90811001 | 90811005 | 1 | + | CAGTG |
5573107 | chrY | 90811001 | 90811005 | 1 | + | CAGTG |
5573108 | chrY | 90811001 | 90811005 | 2 | + | TGACA |
5573109 | chrY | 90811001 | 90811005 | 13 | + | CAGTG |
5573110 rows × 6 columns
 | Chr | Start | End | Reads | Direction | Barcodes |
---|---|---|---|---|---|---|
0 | chr1 | 3004272 | 3004276 | 5 | + | ACTGC |
1 | chr1 | 3028063 | 3028067 | 6 | - | ACTGC |
2 | chr1 | 3043241 | 3043245 | 1 | - | ACTGC |
3 | chr1 | 3049117 | 3049121 | 1 | - | CAGTG |
4 | chr1 | 3052152 | 3052156 | 1 | + | ACTGC |
... | ... | ... | ... | ... | ... | ... |
5573105 | chrY | 90811001 | 90811005 | 2 | + | CAGTG |
5573106 | chrY | 90811001 | 90811005 | 1 | + | CAGTG |
5573107 | chrY | 90811001 | 90811005 | 1 | + | CAGTG |
5573108 | chrY | 90811001 | 90811005 | 2 | + | TGACA |
5573109 | chrY | 90811001 | 90811005 | 13 | + | CAGTG |
5572856 rows × 6 columns
 | Chr | Start | End | Center | Experiment Insertions | Background insertions | Reference Insertions | pvalue Reference | pvalue Background | Fraction Experiment | TPH Experiment | Fraction background | TPH background | TPH background subtracted | pvalue_adj Reference |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | chr1 | 3399656 | 3400345 | 3400068.0 | 21 | 69 | 8 | 0.000000e+00 | 1.745466e-04 | 0.000015 | 1549.993948 | 1.238144e-05 | 1238.144320 | 311.849628 | 0.000000e+00 |
1 | chr1 | 3672013 | 3673193 | 3672213.0 | 61 | 47 | 9 | 0.000000e+00 | 0.000000e+00 | 0.000045 | 4502.363372 | 8.433737e-06 | 843.373667 | 3658.989705 | 0.000000e+00 |
2 | chr1 | 4773450 | 4774236 | 4773657.0 | 6 | 5 | 4 | 2.405425e-08 | 1.939280e-04 | 0.000004 | 442.855414 | 8.972060e-07 | 89.720603 | 353.134811 | 6.323087e-06 |
3 | chr1 | 4785206 | 4786550 | 4785472.0 | 31 | 47 | 13 | 0.000000e+00 | 2.298502e-08 | 0.000023 | 2288.086304 | 8.433737e-06 | 843.373667 | 1444.712637 | 0.000000e+00 |
4 | chr1 | 5016489 | 5017564 | 5017023.0 | 8 | 10 | 8 | 1.672281e-09 | 3.938477e-04 | 0.000006 | 590.473885 | 1.794412e-06 | 179.441206 | 411.032679 | 4.739980e-07 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
10739 | chrX | 169325223 | 169325838 | 169325423.0 | 11 | 16 | 9 | 5.107026e-15 | 1.876575e-04 | 0.000008 | 811.901592 | 2.871059e-06 | 287.105929 | 524.795662 | 1.893816e-12 |
10740 | chrX | 169799382 | 169801491 | 169799905.0 | 25 | 29 | 23 | 0.000000e+00 | 3.271381e-07 | 0.000018 | 1845.230890 | 5.203795e-06 | 520.379497 | 1324.851393 | 0.000000e+00 |
10741 | chrX | 169829316 | 169830343 | 169829658.0 | 9 | 9 | 10 | 2.746225e-09 | 1.659074e-04 | 0.000007 | 664.283120 | 1.614971e-06 | 161.497085 | 502.786035 | 7.671705e-07 |
10742 | chrX | 169878786 | 169879561 | 169879328.0 | 19 | 21 | 10 | 0.000000e+00 | 1.379092e-06 | 0.000014 | 1402.375476 | 3.768265e-06 | 376.826532 | 1025.548944 | 0.000000e+00 |
10743 | chrY | 1009804 | 1011442 | 1010850.0 | 17 | 2 | 16 | 0.000000e+00 | 7.771561e-16 | 0.000013 | 1254.757005 | 3.588824e-07 | 35.888241 | 1218.868764 | 0.000000e+00 |
10744 rows × 15 columns
 | Chr | Start | End | Center | Experiment Insertions | Background insertions | Reference Insertions | pvalue Reference | pvalue Background | Fraction Experiment | ... | TPH background subtracted | pvalue_adj Reference | Nearest Refseq1 | Gene Name1 | Direction1 | Distance1 | Nearest Refseq2 | Gene Name2 | Direction2 | Distance2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | chr1 | 3399656 | 3400345 | 3400068.0 | 21 | 69 | 8 | 0.000000e+00 | 1.745466e-04 | 0.000015 | ... | 311.849628 | 0.000000e+00 | NM_001011874 | Xkr4 | - | 0 | NM_001195662 | Rp1 | - | 890501 |
1 | chr1 | 3672013 | 3673193 | 3672213.0 | 61 | 47 | 9 | 0.000000e+00 | 0.000000e+00 | 0.000045 | ... | 3658.989705 | 0.000000e+00 | NM_001011874 | Xkr4 | - | -516 | NM_001195662 | Rp1 | - | 617653 |
2 | chr1 | 4773450 | 4774236 | 4773657.0 | 6 | 5 | 4 | 2.405425e-08 | 1.939280e-04 | 0.000004 | ... | 353.134811 | 6.323087e-06 | NR_033530 | Mrpl15 | - | 0 | NM_008866 | Lypla1 | + | 33657 |
3 | chr1 | 4785206 | 4786550 | 4785472.0 | 31 | 47 | 13 | 0.000000e+00 | 2.298502e-08 | 0.000023 | ... | 1444.712637 | 0.000000e+00 | NR_033530 | Mrpl15 | - | 0 | NM_008866 | Lypla1 | + | 21343 |
4 | chr1 | 5016489 | 5017564 | 5017023.0 | 8 | 10 | 8 | 1.672281e-09 | 3.938477e-04 | 0.000006 | ... | 411.032679 | 4.739980e-07 | NM_001290372 | Rgs20 | - | 0 | NM_133826 | Atp6v1h | + | 65522 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
10739 | chrX | 169325223 | 169325838 | 169325423.0 | 11 | 16 | 9 | 5.107026e-15 | 1.876575e-04 | 0.000008 | ... | 524.795662 | 1.893816e-12 | NM_001331049 | Hccs | - | -4852 | NM_009707 | Arhgap6 | + | -20784 |
10740 | chrX | 169799382 | 169801491 | 169799905.0 | 25 | 29 | 23 | 0.000000e+00 | 3.271381e-07 | 0.000018 | ... | 1324.851393 | 0.000000e+00 | NM_010797 | Mid1 | + | 0 | NR_003635 | 4933400A11Rik | - | -19748 |
10741 | chrX | 169829316 | 169830343 | 169829658.0 | 9 | 9 | 10 | 2.746225e-09 | 1.659074e-04 | 0.000007 | ... | 502.786035 | 7.671705e-07 | NM_010797 | Mid1 | + | 0 | NM_001290506 | Mid1 | + | 49276 |
10742 | chrX | 169878786 | 169879561 | 169879328.0 | 19 | 21 | 10 | 0.000000e+00 | 1.379092e-06 | 0.000014 | ... | 1025.548944 | 0.000000e+00 | NM_010797 | Mid1 | + | 0 | NM_001290506 | Mid1 | + | 58 |
10743 | chrY | 1009804 | 1011442 | 1010850.0 | 17 | 2 | 16 | 0.000000e+00 | 7.771561e-16 | 0.000013 | ... | 1218.868764 | 0.000000e+00 | NM_012011 | Eif2s3y | + | 0 | NR_027507 | Tspy-ps | - | 44322 |
10744 rows × 23 columns
\n", "