pycallingcards.preprocessing.make_Anndata

pycallingcards.preprocessing.make_Anndata(qbed, peaks, barcodes, reference='hg38', key='Barcodes')

Build a cell (sample) by peak AnnData object for calling cards data.

Parameters:
  • qbed (DataFrame) – pd.DataFrame whose leading columns are chromosome, start, end, reads number, direction and barcodes. Only chromosome, start, end and barcodes are actually needed.

  • peaks (DataFrame) – pd.DataFrame whose first three columns are chromosome, start and end. Any other information follows in later columns.

  • barcodes (Union[DataFrame, List]) – pd.DataFrame or a list of all barcodes.

  • reference (Optional[Literal['hg38', 'mm10', 'sacCer3']] (default: 'hg38')) – One of 'hg38', 'mm10' or 'sacCer3'. This is only used to calculate the length of one insertion; hg38 and mm10 are treated the same.

  • key (Union[str, int] (default: 'Barcodes')) – The name of the column in qbed that contains the barcodes.

Returns:

Annotated data matrix, where observations (cells/samples) are named by their barcode and variables/peaks by Chr_Start_End. The matrix stores the following information.

anndata.AnnData.X - Where the data matrix is stored
anndata.AnnData.obs_names - Cell(sample) names
anndata.AnnData.var_names - Peak names
anndata.AnnData.var['peak_ids'] - Peak information from the original file
anndata.AnnData.var['feature_types'] - Feature types
Return type:

AnnData
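
To make the shape of the returned matrix concrete, here is a minimal plain-Python sketch of a cell-by-peak count table. It is not the pycallingcards implementation: the helper name `count_insertions`, the toy records, and the overlap rule (an insertion counts toward a peak when its start lies within the peak's start and end, inclusive) are all assumptions made for illustration. The library returns an AnnData object rather than nested lists.

```python
def count_insertions(qbed_rows, peak_rows, barcodes):
    """Toy cell-by-peak count matrix (hypothetical helper, not the library's code).

    qbed_rows: list of (chrom, start, end, reads, direction, barcode) tuples
    peak_rows: list of (chrom, start, end) tuples
    barcodes:  list of barcodes; fixes the row (observation) order
    """
    # Variables are named Chr_Start_End, as described under Returns.
    var_names = [f"{c}_{s}_{e}" for c, s, e in peak_rows]
    row_of = {bc: i for i, bc in enumerate(barcodes)}
    matrix = [[0] * len(var_names) for _ in barcodes]

    # Group peaks by chromosome so each insertion only scans its own chromosome.
    peaks_by_chrom = {}
    for j, (c, s, e) in enumerate(peak_rows):
        peaks_by_chrom.setdefault(c, []).append((s, e, j))

    for chrom, start, _end, _reads, _direction, barcode in qbed_rows:
        i = row_of.get(barcode)
        if i is None:
            continue  # barcode not in the supplied list: ignored in this sketch
        for s, e, j in peaks_by_chrom.get(chrom, []):
            # Assumed overlap rule: insertion start inside [s, e].
            if s <= start <= e:
                matrix[i][j] += 1
    return list(barcodes), var_names, matrix


# Made-up records, only to show the layout.
qbed = [
    ("chr1", 100, 104, 3, "+", "AAAC"),
    ("chr1", 150, 154, 1, "-", "AAAC"),
    ("chr1", 900, 904, 2, "+", "TTTG"),
    ("chr2", 50, 54, 5, "+", "TTTG"),
    ("chr2", 60, 64, 1, "+", "GGGG"),  # not in `barcodes`, so skipped
]
peaks = [("chr1", 90, 200), ("chr1", 850, 950), ("chr2", 40, 70)]
barcodes = ["AAAC", "TTTG"]

obs_names, var_names, matrix = count_insertions(qbed, peaks, barcodes)
for name, row in zip(obs_names, matrix):
    print(name, row)
```

Rows correspond to obs_names (barcodes) and columns to var_names (peaks), mirroring the layout of anndata.AnnData.X described above.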

Example:

>>> import pycallingcards as cc
>>> cc_data = cc.datasets.mousecortex_data(data="qbed")
>>> peak_data = cc.pp.callpeaks(cc_data, method = "test", reference = "mm10",  record = True)
>>> barcodes = cc.datasets.mousecortex_data(data="barcodes")
>>> adata_cc = cc.pp.make_Anndata(cc_data, peak_data, barcodes)