pycallingcards.preprocessing.make_Anndata
- pycallingcards.preprocessing.make_Anndata(qbed, peaks, barcodes, reference='hg38', key='Barcodes')
Make a cell (sample) by peak AnnData object for calling cards data.
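To make the cell-by-peak idea concrete, here is a minimal sketch of how such a count matrix could be assembled with pandas and NumPy. It is not the pycallingcards implementation. The helper `count_matrix_sketch` is hypothetical, and several details are assumptions: an insertion counts toward a peak when it is on the same chromosome and its start lies in [peak start, peak end), insertions from barcodes missing from `barcodes` are dropped, the matrix is a dense integer array, and the `reference` parameter (insertion length) is ignored. Rows follow the order of `barcodes`, and columns are named Chr_Start_End as the Returns section describes.

```python
import numpy as np
import pandas as pd


def count_matrix_sketch(qbed, peaks, barcodes, key="Barcodes"):
    """Hypothetical helper, not part of pycallingcards.

    Counts insertions per (barcode, peak). Assumed overlap rule: an insertion
    belongs to a peak if it is on the same chromosome and its start lies in
    [peak start, peak end).
    """
    # barcodes may be a DataFrame (first column taken) or a list
    if isinstance(barcodes, pd.DataFrame):
        barcodes = barcodes.iloc[:, 0].tolist()
    barcodes = list(barcodes)
    row_of = {bc: i for i, bc in enumerate(barcodes)}

    # First two qbed columns are chromosome and start
    chrom, start = qbed.columns[0], qbed.columns[1]
    # First three peak columns are chromosome, start, end
    peak_coords = list(peaks.iloc[:, :3].itertuples(index=False, name=None))
    X = np.zeros((len(barcodes), len(peak_coords)), dtype=np.int64)

    for j, (pchrom, pstart, pend) in enumerate(peak_coords):
        inside = (qbed[chrom] == pchrom) & (qbed[start] >= pstart) & (qbed[start] < pend)
        for bc in qbed.loc[inside, key]:
            if bc in row_of:  # assumption: insertions from unlisted barcodes are dropped
                X[row_of[bc], j] += 1

    # Peaks are named Chr_Start_End
    var_names = [f"{c}_{s}_{e}" for c, s, e in peak_coords]
    return X, barcodes, var_names


# Hypothetical toy inputs: only chromosome, start, end and barcodes are used
qbed = pd.DataFrame({
    "Chr": ["chr1", "chr1", "chr1", "chr2"],
    "Start": [100, 150, 900, 500],
    "End": [104, 154, 904, 504],
    "Barcodes": ["AAAC", "AAAC", "GGTT", "GGTT"],
})
peaks = pd.DataFrame({"Chr": ["chr1", "chr2"], "Start": [90, 480], "End": [200, 520]})

X, obs_names, var_names = count_matrix_sketch(qbed, peaks, ["AAAC", "GGTT", "CCGA"])
print(X.tolist())  # [[2, 0], [0, 1], [0, 0]]
print(var_names)   # ['chr1_90_200', 'chr2_480_520']
```

A dense array keeps the sketch readable; real single-cell data is mostly zeros, so a sparse matrix would be the more economical choice at scale.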
- Parameters:
  - qbed (DataFrame) – pd.DataFrame with columns chromosome, start, end, read number, direction and barcodes. Only chromosome, start, end and barcodes are actually used.
  - peaks (DataFrame) – pd.DataFrame whose first three columns are chromosome, start and end. Any other information is stored in the columns that follow.
  - barcodes (Union[DataFrame, List]) – pd.DataFrame or list of all barcodes.
  - reference (Optional[Literal['hg38', 'mm10', 'sacCer3']] (default: 'hg38')) – One of 'hg38', 'mm10' or 'sacCer3'. This is only used to calculate the length of one insertion; hg38 and mm10 are treated the same.
  - key (Union[str, int] (default: 'Barcodes')) – Name of the column in the qbed data that contains the barcode information.
- Returns:
  Annotated data matrix, where observations (cells/samples) are named by their barcode and variables (peaks) by Chr_Start_End. The matrix stores the following information:
  - anndata.AnnData.X - the data matrix
  - anndata.AnnData.obs_names - cell (sample) names
  - anndata.AnnData.var_names - peak names
  - anndata.AnnData.var['peak_ids'] - peak information from the original file
  - anndata.AnnData.var['feature_types'] - feature types
- Return type:
  AnnData
- Example:
  >>> import pycallingcards as cc
  >>> cc_data = cc.datasets.mousecortex_data(data="qbed")
  >>> peak_data = cc.pp.callpeaks(cc_data, method="test", reference="mm10", record=True)
  >>> barcodes = cc.datasets.mousecortex_data(data="barcodes")
  >>> adata_cc = cc.pp.make_Anndata(cc_data, peak_data, barcodes)
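The example relies on bundled datasets. As a hedged illustration of the input layout the Parameters section describes, the following builds small hypothetical DataFrames and runs a hypothetical `check_inputs` helper (not part of pycallingcards) to catch layout problems before calling make_Anndata. The column names are illustrative assumptions; the documented requirements are the column order of qbed and peaks plus a barcode column named by `key`.

```python
import pandas as pd

# Hypothetical toy qbed: chromosome, start, end, read number, direction, barcodes
qbed = pd.DataFrame({
    "Chr": ["chr1", "chr1", "chr2"],
    "Start": [100, 150, 500],
    "End": [104, 154, 504],
    "Reads": [3, 1, 2],            # not needed per the parameter description
    "Direction": ["+", "-", "-"],  # not needed per the parameter description
    "Barcodes": ["AAAC", "AAAC", "GGTT"],
})
# Hypothetical toy peaks: chromosome, start, end, then any extra information
peaks = pd.DataFrame({
    "Chr": ["chr1", "chr2"],
    "Start": [90, 480],
    "End": [200, 520],
    "Note": ["peak A", "peak B"],
})
# barcodes may also be given as a plain list
barcodes = pd.DataFrame({"Barcodes": ["AAAC", "GGTT", "CCGA"]})


def check_inputs(qbed, peaks, barcodes, key="Barcodes"):
    """Hypothetical pre-flight check; returns a list of problem descriptions."""
    problems = []
    if key not in qbed.columns:
        problems.append(f"qbed has no barcode column {key!r}")
    if peaks.shape[1] < 3:
        problems.append("peaks needs at least chromosome, start and end columns")
    # Accept barcodes as a DataFrame (first column) or a list
    listed = set(barcodes.iloc[:, 0] if isinstance(barcodes, pd.DataFrame) else barcodes)
    if key in qbed.columns:
        missing = set(qbed[key]) - listed
        if missing:
            problems.append(f"{len(missing)} qbed barcode(s) not in barcodes")
    return problems


print(check_inputs(qbed, peaks, barcodes))  # []
```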