posted on 2024-05-30, 21:48, authored by Ryan Lynch, Lillian Padgitt-Cobb, Andrea R. Garfinkel, Brian Knaus, Nolan Hartwick, Nicholas Allsing, Anthony Aylward, Allen Mamerto, Justine Kipruto Kitony, Kelly Colt, Emily Murray, Tiffany Duong, Aaron Trippe, Seth Crawford, Kelly Vining, Todd Michael
<p dir="ltr"><b>Abstract</b></p><p dir="ltr"><i>Cannabis sativa</i> is a globally significant seed-oil, fiber, and drug-producing plant species. However, a century of prohibition has severely restricted legal breeding and germplasm resource development, leaving potential hemp-based nutritional and fiber applications unrealized. Existing cultivars are highly heterozygous and lack competitiveness in the overall fiber and grain markets, relegating hemp to less than 200,000 hectares globally<sup>1</sup>. The relaxation of drug laws in recent decades has generated widespread interest in expanding and reincorporating cannabis into agricultural systems, but progress has been impeded by the limited understanding of genomics and breeding potential. No studies to date have examined the genomic diversity and evolution of cannabis populations using haplotype-resolved, chromosome-scale assemblies from publicly available germplasm. Here we present a cannabis pangenome, constructed with 181 new and 12 previously released genomes from a total of 156 biological samples from both male (XY) and female (XX) plants, including 42 trio-phased and 36 haplotype-resolved, chromosome-scale assemblies. We discovered widespread regions of the cannabis pangenome that are surprisingly diverse for a single species, with high levels of genetic and structural variation, and propose a novel population structure and hybridization history. Conversely, the cannabinoid synthase genes contain very low levels of diversity, despite being embedded within a variable region containing multiple pseudogenized paralogs and distinct transposable element arrangements. Additionally, we identified variants of <i>acyl-lipid thioesterase</i> (<i>ALT</i>) genes<sup>2</sup> that are associated with fatty acid chain length variation and the production of the rare cannabinoids tetrahydrocannabivarin (THCV) and cannabidivarin (CBDV).
We conclude that the <i>Cannabis sativa</i> gene pool has only been partially characterized, and that the existence of wild relatives in Asia remains likely, while its potential as a crop species remains largely unrealized.</p><p dir="ltr">1. United Nations. Commodities at a glance: Special issue on industrial hemp. <i>Commod Glance</i> (2023) doi:10.18356/9789210019958.</p><p dir="ltr">2. Pulsifer, I. P. <i>et al.</i> Acyl-lipid thioesterase1-4 from Arabidopsis thaliana form a novel family of fatty acyl-acyl carrier protein thioesterases with divergent expression patterns and substrate specificities. <i>Plant Mol. Biol.</i> <b>84</b>, 549–563 (2014).</p><p dir="ltr"><b>Transposable element analysis</b></p><p dir="ltr">To identify transposable elements, we used the EDTA pipeline with default settings. EDTAOutput.tar.gz includes EDTA transposon annotations for 78 scaffolded, chromosome-level cannabis genomes.</p><p dir="ltr"><b>Structural variation analysis</b></p><p dir="ltr">The 78 fully scaffolded assembly haplotypes were each aligned to the EH23a assembly using minimap2 (Li 2018). SyRI was then used to call structural variants (SVs) from each alignment (Goel et al. 2019), and plotsr was used to visualize the alignments and SVs (Goel and Schneeberger 2022).</p><ul><li>DUP_query_coord.bed.tar.gz includes duplications for 78 assemblies with EH23a as reference</li><li>INVTR_query_coord.bed.tar.gz includes inverted translocations for 78 assemblies with EH23a as reference</li><li>INVs_query_coord.bed.tar.gz includes inversions for 78 assemblies with EH23a as reference</li><li>TRANS_query_coord.bed.tar.gz includes translocations for 78 assemblies with EH23a as reference</li></ul><ul><li>csat_orientations.tsv is a scaffold orientation file for 78 assemblies with EH23a as reference</li></ul>
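As a rough illustration of how the structural variation archives listed above might be inspected, the following Python sketch tallies the number of intervals and their total length per sequence for every file inside one of the .bed.tar.gz archives. It is not part of the authors' pipeline. It assumes each archive member is a tab-separated BED-like file whose first three columns are sequence name, start, and end; the actual column layout of these files is not described here and should be checked before relying on the output. The function names are hypothetical.

```python
import io
import tarfile
from collections import defaultdict


def parse_bed_lines(lines):
    """Yield (chrom, start, end) from BED-like lines.

    Assumption: tab-separated, first three columns are name, start, end.
    Blank lines, '#' comments, and rows with non-integer coordinates
    (for example a header row) are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            continue
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            continue
        yield fields[0], start, end


def summarize_intervals(lines):
    """Return {chrom: (interval_count, total_bp)} for the given lines."""
    totals = defaultdict(lambda: [0, 0])
    for chrom, start, end in parse_bed_lines(lines):
        totals[chrom][0] += 1
        # abs() guards against rows where start > end, which may occur
        # if inverted regions are reported in query orientation.
        totals[chrom][1] += abs(end - start)
    return {chrom: (n, bp) for chrom, (n, bp) in totals.items()}


def summarize_archive(path):
    """Summarize every regular file inside a .tar.gz archive of BED files."""
    summary = {}
    with tarfile.open(path, "r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue
            handle = tar.extractfile(member)
            text = io.TextIOWrapper(handle, encoding="utf-8", errors="replace")
            summary[member.name] = summarize_intervals(text)
    return summary
```

For example, `summarize_archive("DUP_query_coord.bed.tar.gz")` would return one entry per file in the duplications archive, each mapping sequence names to an (interval count, total base pairs) pair, provided the files follow the assumed layout.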
To develop high-quality genome assemblies of heterozygous cassava varieties and new tools for pangenome analyses to serve breeding programs that need this detailed genomic understanding for more efficient breeding
I confirm there is no human personally identifiable information in the files or description shared
Yes
I confirm the files and description shared may be publicly distributed under the license selected
Yes
Competing Interest Statement
S.C. was a co-founder of Oregon CBD. A.R.G. and A.T. were employees of Oregon CBD. R.C.L. is a stakeholder in Saint Vrain Research LLC, which manufactures hemp-based products. T.P.M. is a founder of the carbon sequestration company CQuesta.