pycallingcards.preprocessing.make_Anndata

pycallingcards.preprocessing.make_Anndata(qbed, peaks, barcodes, reference='hg38', key='Barcodes')

Build a cell (sample) by peak AnnData object from calling cards data.

Parameters:

qbed (DataFrame) – pd.DataFrame whose leading columns are chromosome, start, end, read count, direction and barcodes. Only chromosome, start, end and barcodes are actually used.
peaks (DataFrame) – pd.DataFrame whose first three columns are chromosome, start and end. Any additional information follows in later columns.
barcodes (Union[DataFrame, List]) – pd.DataFrame or a list of all barcodes.
reference (Optional[Literal['hg38', 'mm10', 'sacCer3']] (default: 'hg38')) – One of 'hg38', 'mm10' or 'sacCer3'. This is used only to calculate the length of one insertion; 'hg38' and 'mm10' behave the same.
key (Union[str, int] (default: 'Barcodes')) – The name of the column in qbed that contains the barcodes.

Returns:

Annotated data matrix, where observations (cells/samples) are named by their barcode and variables/peaks by Chr_Start_End. The matrix stores the following information.

anndata.AnnData.X - The data matrix
anndata.AnnData.obs_names - Cell (sample) names
anndata.AnnData.var_names - Peak names
anndata.AnnData.var['peak_ids'] - Peak information from the original file
anndata.AnnData.var['feature_types'] - Feature types

Return type:

AnnData

Example:

>>> import pycallingcards as cc
>>> cc_data = cc.datasets.mousecortex_data(data="qbed")
>>> peak_data = cc.pp.callpeaks(cc_data, method = "test", reference = "mm10",  record = True)
>>> barcodes = cc.datasets.mousecortex_data(data="barcodes")
>>> adata_cc = cc.pp.make_Anndata(cc_data, peak_data, barcodes)
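
The cell-by-peak layout described under Returns can be illustrated with a small standalone sketch. It does not call pycallingcards and is not the library's implementation: the toy data, the helper `count_matrix`, and the rule that an insertion counts toward a peak only when it lies entirely inside the peak interval are all assumptions made for illustration. The real function may apply a different overlap rule and returns an AnnData object, not nested lists.

```python
from collections import defaultdict

# Toy qbed rows: (chromosome, start, end, reads, direction, barcode).
# All values are made up for illustration.
qbed = [
    ("chr1", 100, 104, 5, "+", "AAAC"),
    ("chr1", 150, 154, 2, "-", "AAAC"),
    ("chr1", 160, 164, 1, "+", "GGGT"),
    ("chr2", 500, 504, 3, "+", "GGGT"),
    ("chr2", 900, 904, 7, "-", "AAAC"),  # lies outside every peak
]

# Toy peaks: (chromosome, start, end).
peaks = [("chr1", 90, 200), ("chr2", 450, 600)]

# Barcodes define the rows; "TTTT" has no insertions at all.
barcodes = ["AAAC", "GGGT", "TTTT"]


def count_matrix(qbed, peaks, barcodes):
    """Hypothetical helper: count insertions per (barcode, peak).

    Assumption: an insertion belongs to a peak when it is on the same
    chromosome and fully contained in the peak interval.
    """
    # Peaks are named Chr_Start_End, matching the var_names convention.
    var_names = [f"{chrom}_{start}_{end}" for chrom, start, end in peaks]
    known = set(barcodes)
    counts = defaultdict(int)
    for chrom, start, end, _reads, _direction, barcode in qbed:
        if barcode not in known:
            continue  # ignore insertions from barcodes not requested
        for name, (p_chrom, p_start, p_end) in zip(var_names, peaks):
            if chrom == p_chrom and p_start <= start and end <= p_end:
                counts[(barcode, name)] += 1
    # Rows follow the barcode order, columns follow the peak order.
    matrix = [[counts[(b, n)] for n in var_names] for b in barcodes]
    return matrix, list(barcodes), var_names


matrix, obs_names, var_names = count_matrix(qbed, peaks, barcodes)
print(var_names)
for name, row in zip(obs_names, matrix):
    print(name, row)
```

Here the rows play the role of obs_names and the columns the role of var_names. Barcodes with no insertions in any peak still get an all-zero row in this sketch.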