pycallingcards.preprocessing.make_Anndata
- pycallingcards.preprocessing.make_Anndata(qbed, peaks, barcodes, reference='hg38', key='Barcodes')
Build a cell (sample) by peak AnnData object for calling cards data.
- Parameters:
qbed (DataFrame) – pd.DataFrame whose leading columns are chromosome, start, end, reads number, direction and barcodes. Only chromosome, start, end and barcodes are actually used.
peaks (DataFrame) – pd.DataFrame whose first three columns are chromosome, start and end. Any further columns carry additional peak information.
barcodes (Union[DataFrame, List]) – pd.DataFrame or list of all barcodes.
reference (Optional[Literal['hg38', 'mm10', 'sacCer3']] (default: 'hg38')) – Reference genome, one of 'hg38', 'mm10' or 'sacCer3'. It is used only to calculate the length of one insertion; hg38 and mm10 are handled the same way.
key (Union[str, int] (default: 'Barcodes')) – Name of the column in qbed that contains the barcode information.
- Returns:
Annotated data matrix, where observations (cells/samples) are named by their barcode and variables/peaks by Chr_Start_End. The matrix stores the following information.
anndata.AnnData.X - Where the data matrix is stored
anndata.AnnData.obs_names - Cell (sample) names
anndata.AnnData.var_names - Peak names
anndata.AnnData.var['peak_ids'] - Peak information from the original file
anndata.AnnData.var['feature_types'] - Feature types
- Return type:
AnnData
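To make the shape of the returned matrix concrete, here is a minimal pure-Python sketch of a cell-by-peak count table. It is not the library's implementation. The toy rows, the helper name `count_matrix`, the rule that an insertion belongs to a peak when it lies entirely inside the peak's interval, and the choice to count each qbed row once (ignoring the reads number) are all assumptions made for illustration.

```python
# Hypothetical toy inputs; the real function takes pandas DataFrames.
# qbed rows: (chromosome, start, end, reads, direction, barcode)
qbed_rows = [
    ("chr1", 100, 104, 5, "+", "AAAC"),
    ("chr1", 150, 154, 2, "-", "AAAC"),
    ("chr1", 120, 124, 1, "+", "GGTT"),
    ("chr2", 500, 504, 3, "+", "GGTT"),
    ("chr2", 900, 904, 7, "-", "AAAC"),  # lies outside every peak
]
# peak rows: (chromosome, start, end)
peak_rows = [("chr1", 90, 200), ("chr2", 450, 600)]
# Full barcode list; "TTTT" has no insertions but still gets a row.
barcodes = ["AAAC", "GGTT", "TTTT"]


def count_matrix(qbed_rows, peak_rows, barcodes):
    """Count insertions per (barcode, peak) under the assumptions above."""
    # Variables (columns) are named Chr_Start_End, as in the Returns text.
    var_names = [f"{chrom}_{start}_{end}" for chrom, start, end in peak_rows]
    row_of = {barcode: i for i, barcode in enumerate(barcodes)}

    # Group peaks by chromosome so each insertion is only compared
    # against peaks on its own chromosome.
    peaks_by_chrom = {}
    for j, (chrom, start, end) in enumerate(peak_rows):
        peaks_by_chrom.setdefault(chrom, []).append((start, end, j))

    matrix = [[0] * len(peak_rows) for _ in barcodes]
    for chrom, start, end, _reads, _direction, barcode in qbed_rows:
        if barcode not in row_of:
            continue  # barcode is not in the supplied list
        for peak_start, peak_end, j in peaks_by_chrom.get(chrom, []):
            # Assumed membership rule: insertion fully inside the peak.
            if peak_start <= start and end <= peak_end:
                matrix[row_of[barcode]][j] += 1
    return matrix, list(barcodes), var_names


matrix, obs_names, var_names = count_matrix(qbed_rows, peak_rows, barcodes)
for name, row in zip(obs_names, matrix):
    print(name, row)
```

In this sketch `matrix` plays the role of `X`, `obs_names` holds the barcodes and `var_names` holds the peak names. The real function returns an AnnData object rather than nested lists.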
- Example:
>>> import pycallingcards as cc
>>> cc_data = cc.datasets.mousecortex_data(data="qbed")
>>> peak_data = cc.pp.callpeaks(cc_data, method = "test", reference = "mm10", record = True)
>>> barcodes = cc.datasets.mousecortex_data(data="barcodes")
>>> adata_cc = cc.pp.make_Anndata(cc_data, peak_data, barcodes)