
Processed data for X-Atlas/Orion: Genome-wide Perturb-seq Datasets via a Scalable Fix-Cryopreserve Platform for Training Dose-Dependent Biological Foundation Models

Version 2 2025-08-25, 21:53
Version 1 2025-06-12, 15:53
dataset
posted on 2025-08-25, 21:53, authored by Ann C Huang, Ci Chu
<p dir="ltr">This dataset (X-Atlas/Orion) contains processed data from two genome-wide Perturb-seq experiments in HCT116 and HEK293T cell lines, described in the <a href="https://www.biorxiv.org/content/10.1101/2025.06.11.659105v1" rel="noreferrer" target="_blank">manuscript</a> <b>X-Atlas/Orion: Genome-wide Perturb-seq Datasets via a Scalable Fix-Cryopreserve Platform for Training Dose-Dependent Biological Foundation Models</b>.</p>
<p dir="ltr">Dataset:</p><ul><li>HCT116:</li><li><ol><li><b>HCT116_filtered_dual_guide_cells.h5ad</b>: HCT116 cells that contain two sgRNAs targeting the same gene and from the same guide pair</li><li><b>HCT116_filtered_dual_guide_cells.h5ad.md5</b>: MD5 checksum for HCT116_filtered_dual_guide_cells.h5ad</li></ol></li><li>HEK293T:</li><li><ol><li><b>HEK293T_filtered_dual_guide_cells.h5ad</b>: HEK293T cells that contain two sgRNAs targeting the same gene and from the same guide pair</li><li><b>HEK293T_filtered_dual_guide_cells.h5ad.md5</b>: MD5 checksum for HEK293T_filtered_dual_guide_cells.h5ad</li></ol></li></ul>
<p dir="ltr">h5ad files containing all aligned cells will be released at a later date.</p>
<p dir="ltr">Description of h5ad files: each h5ad file is an <a href="https://anndata.readthedocs.io/en/latest/" rel="noreferrer" target="_blank">AnnData</a> object that contains the following metadata:</p><ul><li>cell-level (obs):</li><li><ul><li>sample: GEM batch</li><li>num_features: number of guides</li><li>guide_target: guide identity</li><li>gene_target: gene targeted by guide</li><li>n_genes_by_counts: number of genes with non-zero counts</li><li>total_counts: total UMIs</li><li>total_counts_mt: total UMIs from MT genes</li><li>pct_counts_mt: % UMIs from MT genes</li><li>pass_guide_filter: boolean indicating whether the cell contains two guides from the same guide pair</li></ul></li><li>gene-level (var):</li><li><ul><li>mt: boolean indicating whether the gene is an MT gene</li><li>n_cells_by_counts: number of cells in which the gene has non-zero UMIs</li><li>mean_counts: mean UMIs over all cells</li><li>pct_dropout_by_counts: % of cells in which the gene has zero UMIs</li><li>total_counts: sum of UMIs for a gene</li></ul></li></ul>
<p dir="ltr">Other files:</p><ul><li><b>guide_library.csv</b>: Table containing guide pairs in X-Atlas/Orion. Guide sequences are from <a href="https://elifesciences.org/articles/81856" rel="noreferrer" target="_blank">Replogle et al., eLife (2022)</a>. Description of columns:</li><li><ul><li>target_gene: gene symbol of target gene</li><li>target_gene_id: Ensembl ID of target gene</li><li>id_a: unique identifier for the first guide in the pair (Guide A). Used in .obs.guide_target</li><li>id_a (Replogle et al): original name of Guide A in Replogle et al., eLife (2022)</li><li>sequence_a: sequence of Guide A</li><li>id_b: unique identifier for the second guide in the pair (Guide B). Used in .obs.guide_target</li><li>id_b (Replogle et al): original name of Guide B in Replogle et al., eLife (2022)</li><li>sequence_b: sequence of Guide B</li><li>id_ab: unique identifier for guide pair (id_a | id_b)</li></ul></li></ul>
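Each h5ad file ships with an .md5 companion, so a download can be checked before use. The following is a minimal sketch of that check using only the Python standard library. The function names are hypothetical, and it assumes the .md5 file stores the hex digest as its first whitespace-separated token (md5sum-style output); the actual layout of the .md5 files is not described on this page.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_expected_md5(md5_path):
    """Read the expected digest from a .md5 file.

    Assumption: the digest is the first whitespace-separated token,
    as in md5sum-style output ("<digest>  <filename>").
    """
    return Path(md5_path).read_text().split()[0].lower()


def verify(data_path, md5_path):
    """Return True if the file's MD5 matches the digest stored in md5_path."""
    return md5_of_file(data_path) == read_expected_md5(md5_path)
```

With both files in the working directory, `verify("HCT116_filtered_dual_guide_cells.h5ad", "HCT116_filtered_dual_guide_cells.h5ad.md5")` should return True for an intact download. A verified file can then be opened with `anndata.read_h5ad`.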
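The cell-level and gene-level metadata columns listed above can be illustrated on a toy count matrix. The sketch below recomputes each quantity in plain Python from the definitions given in the description. The gene names and counts are invented, and flagging MT genes by an "MT-" symbol prefix is an assumption. The column names resemble those produced by scanpy's `calculate_qc_metrics`, but this page does not state which tool generated them.

```python
# Toy count matrix: rows are cells, columns are genes (UMI counts).
# Gene names and counts are invented; "MT-X" stands in for a mitochondrial gene.
genes = ["GENE1", "GENE2", "MT-X"]
counts = [
    [3, 0, 1],
    [0, 0, 4],
    [5, 2, 3],
    [2, 0, 0],
]
# Assumption: MT genes are identified by the "MT-" symbol prefix (var["mt"]).
is_mt = [name.startswith("MT-") for name in genes]

# Cell-level (obs) metrics, one dict per cell.
obs = []
for row in counts:
    total = sum(row)
    total_mt = sum(c for c, mt in zip(row, is_mt) if mt)
    obs.append({
        "n_genes_by_counts": sum(1 for c in row if c > 0),
        "total_counts": total,
        "total_counts_mt": total_mt,
        "pct_counts_mt": 100.0 * total_mt / total if total else 0.0,
    })

# Gene-level (var) metrics, one dict per gene.
n_cells = len(counts)
var = []
for j, name in enumerate(genes):
    column = [row[j] for row in counts]
    n_nonzero = sum(1 for c in column if c > 0)
    var.append({
        "mt": is_mt[j],
        "n_cells_by_counts": n_nonzero,
        "mean_counts": sum(column) / n_cells,
        "pct_dropout_by_counts": 100.0 * (1 - n_nonzero / n_cells),
        "total_counts": sum(column),
    })
```

Note that total_counts appears in both obs and var: per cell it sums over genes, per gene it sums over cells.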
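Because .obs.guide_target uses the id_a and id_b identifiers from guide_library.csv, a lookup from guide ID to target gene is a natural first step when working with the table. The sketch below builds one with the standard csv module. The sample rows (gene names, IDs, sequences, and the "|" separator in id_ab) are invented for illustration, and the helper names are hypothetical. Only the column names are taken from the description above.

```python
import csv
import io

# Invented rows for illustration only; real IDs and sequences come from guide_library.csv.
SAMPLE_CSV = """target_gene,target_gene_id,id_a,id_a (Replogle et al),sequence_a,id_b,id_b (Replogle et al),sequence_b,id_ab
GENE1,ENSG_EXAMPLE_1,g1_a,orig_g1_a,ACGTACGTACGTACGTACGT,g1_b,orig_g1_b,TGCATGCATGCATGCATGCA,g1_a|g1_b
GENE2,ENSG_EXAMPLE_2,g2_a,orig_g2_a,AAAACCCCGGGGTTTTAAAA,g2_b,orig_g2_b,TTTTGGGGCCCCAAAATTTT,g2_a|g2_b
"""


def load_guide_library(handle):
    """Parse guide_library.csv rows into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


def guide_to_gene(rows):
    """Map every individual guide ID (id_a and id_b) to its target gene symbol."""
    mapping = {}
    for row in rows:
        mapping[row["id_a"]] = row["target_gene"]
        mapping[row["id_b"]] = row["target_gene"]
    return mapping


rows = load_guide_library(io.StringIO(SAMPLE_CSV))
lookup = guide_to_gene(rows)
print(lookup["g1_b"])  # GENE1
```

For the real table, replace the in-memory sample with `open("guide_library.csv", newline="")` and pass the file handle to `load_guide_library`.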

Research Institution(s)

Xaira Therapeutics

Contact email

ci.chu@xaira.com

I confirm there is no human personally identifiable information in the files or description shared

  • Yes

I confirm the files and description shared may be publicly distributed under the license selected

  • Yes

Competing Interest Statement

The authors are current or former employees of Xaira Therapeutics (A.C.H., T.S.H., J.Z., S.K., E.M.R., S.K.L., I.A., P.W., M.A., K.Y., A.J.L., R.V.S., C.C.) or Foresite Labs (J.M., A.T., C.J.G., H.J.K., A.B.). All have an equity interest in Xaira Therapeutics.
