Genome assembly and chromosome identification
A Plantago ovata genome reference was generated using a total of 5.98 M (7 cells, 40.21 Gb, N50 = 10.45 kb, 50 bp–121.17 kb) PacBio long reads and 636.5 million (47.74 Gb) Hi-C short reads. PacBio reads were used to assemble contigs, whereas Hi-C reads were used to achieve a chromosome-level assembly. The final assembly has 876 sequences (500.94 Mb, N50 = 128.87 Mb) (Table 1, Supplementary File 1: Table S2). The four superscaffolds account for 97.29% (487.38 Mb) of the total genome length and the unplaced scaffolds account for less than 2.71% (13.55 Mb). Based on the lengths of the scaffolds, we assigned HiC_scaffold_1 (137.73 Mb) as chromosome 1, HiC_scaffold_2 (128.87 Mb) as chromosome 2, HiC_scaffold_3 (114.44 Mb) as chromosome 3, and HiC_scaffold_4 (106.35 Mb) as chromosome 4 (Supplementary File 2).
Table 1 Summary of P. ovata genome assembly and annotation.
We are confident in labelling HiC_scaffold_1 as chromosome 1 because of the presence of the 5S rDNA cluster11,17 and HiC_scaffold_2 as chromosome 2 because it does not contain any 45S rDNA sequences. Only chromosomes 3 and 4 have 45S rDNA sequences (Fig. 2)11,17. However, the location of 45S rDNA on chromosome 3 in our assembly, close to the middle of the chromosome, is not the same position as that proposed based on ribosomal physical mapping11. Earlier researchers found 45S rDNA signals at the ends of the short arms of chromosomes 3 and 4. This difference could represent intraspecific variation or missed joins in the assembly. Higher-quality raw long reads could address the problem of misjoined contigs. In addition, optical mapping technology could be used to validate the orientation of the de novo assembly in the future18.
Figure 2
Gene density (blue), TE (Class I and II) density (purple), % GC (purple), and distribution of 5S (blue arrows) and 45S (green arrows) rRNA in the P. ovata genome. The figure was generated in R using the karyoploteR library19. The x-axis represents genome position (Mb) and the y-axis represents gene density using a sliding window of 1 megabase in size.
Based on the centromere positions, P. ovata chromosome 1 is classified as metacentric, chromosome 2 as submetacentric, and chromosomes 3 and 4 as subtelocentric11. In this assembly, however, the positions of the centromeres are not precisely fixed but rather are indicated by euchromatin and heterochromatin patterns. Euchromatin is active chromatin in the genome where more genes are transcribed, whereas heterochromatin is a less active and highly condensed region of the chromosome (Fig. 2). Dhar et al.12 reported that euchromatic regions are located at the distal ends of all chromosomes and cover one arm of chromosome 1 completely. Our results agree with this but also provide further information (Fig. 2), defining heterochromatic regions from 60 to 125 Mb on chromosome 1, 30 to 105 Mb on chromosome 2, 40 to 100 Mb on chromosome 3 and 15 to 80 Mb on chromosome 4. These heterochromatic regions contain a high density of class I and II transposable elements (TE) (Fig. 2). The statistics for repeat content (61.90%, Supplementary File 1: Table S3) and the proportion of total gene lengths (32.06%), which accounts for less than one-third of chromosome lengths (Supplementary File 1: Table S4), support the earlier finding from C banding and fluorescence in situ hybridization (FISH) methods that most of the regions in the P. ovata genome are heterochromatin containing highly repetitive DNA12.
Genome size
To predict the P. ovata genome size, corrected PacBio reads were used. The result from k-mer analysis (21-mer) shows that the estimated haploid genome size is 551.02 Mb using findGSE v0.1.020 whereas genomescope2 v2.021 predicted 415.78 Mb (Supplementary File 3). Our assembly size (500.94 Mb) (Table 1) sits inside the range of haploid genome sizes estimated using the k-mer method. The P. ovata genome size has previously been estimated using flow cytometry and reported in three different studies. Badr et al.22 reported that diploid P. ovata from Cairo has a genome of between 484.11 Mb (C value: 0.495 pg) and 523.23 Mb (C value: 0.535 pg). Pramanik and Raychaudhuri10 studied an Indian cultivar (Anand) and reported a size of 537.9 Mb (C value: 0.55 pg), whereas Dhar et al.12 estimated the P. ovata genome size at about 621 Mb (C value: 0.635 pg). The range in measurements could be due to the use of various methods12 but could also represent intraspecific variation. Schmuths et al.23 found significant differences covering a 1.1-fold range between the genome sizes of 21 Arabidopsis accessions.
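The C values and megabase sizes quoted above are related by a simple conversion. The sketch below assumes the commonly used factor of 1 pg of DNA ≈ 978 Mb, which reproduces the figures in the text; the function name and the dictionary labels are illustrative only.

```python
# Assumed conversion factor: 1 pg of DNA corresponds to about 978 Mb.
MB_PER_PG = 978.0


def c_value_to_mb(c_value_pg: float) -> float:
    """Convert a haploid C value in picograms to genome size in megabases."""
    return c_value_pg * MB_PER_PG


# C values (pg) as quoted in the text; the labels are informal.
reported_c_values = {
    "Badr et al. (lower)": 0.495,
    "Badr et al. (upper)": 0.535,
    "Pramanik and Raychaudhuri": 0.55,
    "Dhar et al.": 0.635,
}

for study, c_value in reported_c_values.items():
    print(f"{study}: {c_value} pg = {c_value_to_mb(c_value):.2f} Mb")
```

Under this assumption the flow cytometry estimates span roughly 484 to 621 Mb, which brackets the 500.94 Mb assembly.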
Genome quality
The P. ovata genome assembly presented here is of high quality as defined by several parameters. Comparison between the final assembly and the corrected PacBio reads using the KAT comp tool (kat v2.4.1)24 showed that the assembly contains mostly single-copy content from the reads (Supplementary File 4). Despite having 876 scaffolds, four scaffolds with chromosome lengths accounting for 97.29% of the total haploid genome size were detected and visualised using a Hi-C interaction heatmap (Supplementary File 2), indicating that the assembly is highly contiguous. The shortest scaffold length at 50% of the total genome length (N50) is 128.87 Mb, which is chromosome 2 (Table 1, Supplementary File 2). The scaffold N50 value is far higher than the average length of a P. ovata gene at 3,840 bp (Supplementary File 1: Table S5), indicating a much higher chance of producing complete gene models. This is supported by a BUSCO assembly completeness value (BUSCO v5.4.3) of 99.3%, where just one out of 425 genes present in a Viridiplantae cohort (viridiplantae_odb10, creation date: 2020-09-10) is missing in this assembly (Table 1). The percentage of publicly available genomic short-read Illumina data (SRR10076762) mapped to our genome assembly is 95.81%, whereas the portion of our genomic long-read PacBio data (SRR14643405) mapped back to the assembly is 92.25%. In addition, the high mapping rate of reads from RNA-seq data to this assembly (up to 96.10%) will facilitate correct data interpretation by preventing false positives in downstream analyses such as transcriptomics (Supplementary File 1: Table S6)25.
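As an illustration of the N50 definition used above, the following minimal sketch computes N50 from a list of scaffold lengths. The four superscaffold lengths come from the text; collapsing the 872 unplaced scaffolds into a single 13.55 Mb entry is a simplifying assumption, since their individual lengths are not given here, and it does not change the result.

```python
def n50(lengths):
    """Return the length of the shortest sequence among the longest
    sequences that together cover at least half of the total length."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0.0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() needs at least one sequence length")


# Lengths in Mb: chromosomes 1 to 4, plus the unplaced scaffolds lumped
# together (an assumption made only for this illustration).
scaffold_lengths_mb = [137.73, 128.87, 114.44, 106.35, 13.55]

print(n50(scaffold_lengths_mb))
```

Chromosome 1 alone covers less than half of the assembly, so the cumulative sum first passes the halfway point at chromosome 2, consistent with the reported N50.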
We used the LAI score to evaluate the continuity of our assembly. This method requires a minimum of 0.1% intact LTR-RTs and 5% total LTR-RTs as a proportion of the total genome size26. Ou et al.26 evaluated 103 genomes with contents of intact LTR-RTs ranging from 0.28% to 18.34% and total amounts of LTR-RTs from 5.49% to 69.38%. Our assembly meets these criteria with 8.38% intact and 52% total LTR-RTs. This assembly has an LAI score of 10.27 (Supplementary File 5). Based on the classification of assembled repeat sequences using the LAI score26, our assembly can be categorized as a reference (10 ≤ LAI ≤ 20). Advances in technology that provide longer reads with higher accuracy could further improve the current assembly to gold or even platinum standard.
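The criteria above can be written down as a small check. This is only a sketch: the reference range (10 to 20) and the two minimum LTR-RT proportions are taken from the text, while treating scores below 10 as "draft" and above 20 as "gold" is our reading of the LAI scheme and should be regarded as an assumption. Function names are hypothetical.

```python
def lai_applicable(intact_ltr_pct: float, total_ltr_pct: float) -> bool:
    """LAI is only meaningful above minimum LTR-RT contents (from the text)."""
    return intact_ltr_pct >= 0.1 and total_ltr_pct >= 5.0


def lai_category(lai: float) -> str:
    """Map an LAI score to a quality label.

    'reference' covers 10 <= LAI <= 20 as stated in the text; the
    'draft' and 'gold' labels outside that range are assumptions.
    """
    if lai < 10:
        return "draft"
    if lai <= 20:
        return "reference"
    return "gold"


if lai_applicable(intact_ltr_pct=8.38, total_ltr_pct=52.0):
    print(lai_category(10.27))
```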
Of note is that this assembly has a lower LAI score (10.27) than the raw LAI (15.90) (Supplementary File 5). About 25% (26/103) of the genomes studied26 show the same pattern. All of these genomes, including our assembly (96.34%), have a whole-genome LTR identity higher than 94%. It has been suggested that species with recent LTR-RT amplifications contain more intact raw LTR elements, which are thus represented by a higher LTR identity.
Mitochondrial DNA insertions
Three regions in this assembly were detected as originating from mitochondrial sequences based on contamination screening during genome submission to the NCBI database. Two regions are in HiC_scaffold_1 (chromosome 1) and one is in HiC_scaffold_2 (chromosome 2). The sizes are 250 bp, 149 bp, and 177 bp (Supplementary File 6). However, PacBio long reads span these three regions with no breaks, suggesting that they are genuinely part of the nuclear genome (Supplementary File 6). Michalovova et al.27 similarly reported insertions of nuclear mitochondrial DNA (NUMT) and nuclear plastid DNA (NUPT) in six plant species. They reported that the insertions were localised close to centromeres in rice and Arabidopsis. During manual curation of the P. ovata annotation file, genes from the chloroplast and mitochondria were found in the nuclear assembly, suggesting that these three regions are most likely NUMT. Further research is required to investigate gene transfer from organelles to the nuclear genome and to characterise NUMT and NUPT in P. ovata.
Repeat content estimation and identification
The P. ovata genome appears to contain 61.90% (310.10 Mb) repeats, with long terminal repeat (LTR) retrotransposons comprising the highest proportion (200.59 Mb, 40.04%) (Supplementary File 1: Table S3). Two out of three major groups of LTR retrotransposons were detected in the assembled genome: Ty1/Copia (98.64 Mb, 19.69%) and Gypsy (101.64 Mb, 20.29%). There are 366 sequences defined as satellites (98.9 kb, 0.02%). Less than 1% (3.49 Mb) of the repeat content is simple repeats. The simple repeat TTTAGGG, known as a typical plant telomere sequence11, was located at the end of all chromosomes, whereas AAACCCT, the canonical or reverse complement of the telomere repeat, was found at the start of the chromosomes. Other telomeric variants were also found in this assembly, such as TTTGGGG, TTTCGGG, TTCAGGG, TTTTAGGG and AACCCGG (Supplementary File 7).
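The relationship between the two telomere motifs can be checked directly. The literal reverse complement of TTTAGGG is CCCTAAA; in a tandem array this is the same repeat as AAACCCT read from a different starting position. The sketch below illustrates this; the helper names are ours.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def same_tandem_unit(a: str, b: str) -> bool:
    """True if b is a rotation of a, i.e. both describe the same
    tandem repeat read from different start positions."""
    return len(a) == len(b) and b in a + a


rc = reverse_complement("TTTAGGG")
print(rc)
print(same_tandem_unit(rc, "AAACCCT"))
```

This is why TTTAGGG arrays at one chromosome end appear as AAACCCT arrays at the other end when both are read on the same strand.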
GC content
The guanine (G) and cytosine (C) content of DNA has been reported to play an important role in gene regulation and may be related to how organisms adapt to their environment28,29. Šmarda et al.28 observed that plants with GC-rich DNA were more adaptive in extreme climates. Overall, the GC content of this genome is about 38.4% (Table 1, Supplementary File 8). Comparison between GC content, gene density and TE class I and II density in 1 Mb-wide sliding windows showed that the average GC content was 3 percentage points higher (40%) in regions with high TE density than in regions with high gene density (37%) (Fig. 2, Supplementary File 1: Table S7).
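A windowed GC calculation of the kind described above can be sketched as follows. This is not the pipeline used for the figure: it assumes non-overlapping windows (the step size is not stated in the text) and ignores ambiguous bases such as N when computing the percentage.

```python
def gc_percent(seq: str) -> float:
    """GC percentage over unambiguous bases (A, C, G, T) only."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return float("nan")  # window made only of N or other ambiguity codes
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def windowed_gc(seq: str, window: int = 1_000_000):
    """Return (start, GC%) for consecutive non-overlapping windows."""
    return [
        (start, gc_percent(seq[start:start + window]))
        for start in range(0, len(seq), window)
    ]


# Toy example with a 4 bp window instead of 1 Mb.
print(windowed_gc("GGGGAAAACGTN", window=4))
```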
Dhar et al.9 initially stated that the P. ovata genome had 55% GC content, adjusting this four years later to an AT content of 59.7%12 and dropping the GC content to 40.3%. The latest study12 was conducted using flow cytometry (FCM). Šmarda et al.28,30 compared the GC content of 11 rice species using FCM versus sequence data. They found that GC contents from sequence data are consistently lower than those from the flow cytometer. The different methods may explain why the GC content reported by Dhar et al.12 is slightly higher than our calculation of 38.4%, which is based on genomic sequences. Nevertheless, Dhar et al.12 and this study agree that the P. ovata genome is AT-rich. As AT base pairs have lower thermal stability than GC base pairs28, having low GC content may signify that the plant is potentially less adaptive to extreme climates. Wang et al.31 found that plant domestication contributed to higher A and T content in maize and soybean compared with their wild relatives. Commercial P. ovata accessions may show the same increase in AT content due to domestication, although breeding efforts have not been as intense in this species as for other crops. To test this hypothesis, we could measure the GC content of the Australian native Plantago species described in Cowley et al.32 and compare it to that of the commercial accessions of P. ovata.
The GC content of the CDS, at 44.3% (Supplementary File 9), is about 6 percentage points higher than the genomic GC content (Table 1, Supplementary File 8). Kotwal et al.33 also found that the GC content of the P. ovata transcripts in their study was higher than the genomic GC content. However, as they only extracted and sequenced one tissue type (ovaries), this may not be a fair comparison. Kotwal et al.33 also compared the GC content of P. ovata transcripts with A. thaliana, rice, tomato, and Eucalyptus. They categorized P. ovata and Eucalyptus (dicot/eudicot) in the same group as rice (monocot) with GC contents of 45–50%, whereas A. thaliana and tomato (eudicot) had a lower GC content ranging from 40 to 45%33. However, P. ovata has a unimodal distribution (one peak) (Fig. 3 in Kotwal et al.33, Supplementary File 1: Table S8). In contrast, rice has a bimodal distribution (two peaks) (Fig. 3 in Kotwal et al.33), so they should not be categorized in the same group. Singh et al.29 studied the GC content of 20 plant genomes and ranked the highest GC content as coming from grass genomes (including rice), followed by a non-grass monocot and then finally by eudicots. Their results also showed that eudicot genomes have a unimodal distribution whereas grass monocots have a bimodal distribution29. The bimodal distribution is formed by highly heterogeneous GC content among genes in the grass genomes, giving one peak with GC-rich genes and another with GC-poor genes29. In contrast, eudicots show low variability, or homogeneous GC content, among genes, resulting in just one peak29. High GC content has been found to be positively correlated with high recombination sites34, which may be important for breeding strategies.
Figure 3
Comparative genomic analyses between P. ovata (PO) and other Lamiales species (GA, Genlisea aurea; AM, Antirrhinum majus; EG, Erythranthe guttata; SA, Striga asiatica; PJ, Phtheirospermum japonicum; HI, Handroanthus impetiginosus; SI, Sesamum indicum), one Brassicales (AT, Arabidopsis thaliana) and one Solanales (SL, Solanum lycopersicum). (A) Bar charts for each species in B & C were aligned to the corresponding species in the species tree. Bootstrap values of the species tree at every node are one, except N3, which is 0.76. (B) Percentage of genes from each species assigned to orthogroups. (C) The number of species-specific orthogroups. (D) Venn diagrams of orthogroups from 7 species (GA, PO, AM, EG, SI, and AT). (E) The number of gene duplication events per internal and terminal node from the species-based phylogenetic tree. (F) Pairwise synteny comparison between P. ovata and A. majus.
Comparative genomic analysis
The P. ovata genome is estimated to contain 41,820 protein-coding genes (Table 1) based on a set of mRNA transcripts from this organism (Supplementary File 1: Table S1 & S6), protein homology sequences from related organisms under Viridiplantae, and ab initio gene prediction using MAKER v2.31.1135. However, only 56% (23,638/41,820) of protein-coding genes have an Annotation Edit Distance (AED) of less than 0.5. AED values range from 0, with perfect agreement of the annotation to aligned evidence, to 1, with no supporting evidence for the annotation. There is still much room to improve this annotation in the future. Nevertheless, BUSCO v5.4.336 indicates that the completeness of protein-coding genes is still 80.7% (Table 1).
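To make the AED scale concrete, here is a simplified base-level sketch. It assumes the usual definition AED = 1 − (SN + SP)/2, where SN is the fraction of evidence bases overlapped by the annotation and SP is the fraction of annotation bases overlapped by evidence. MAKER computes AED internally from its aligned evidence, so details may differ from this illustration; the interval representation and function names are ours.

```python
def covered_bases(intervals):
    """Expand half-open (start, end) intervals into a set of positions."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end))
    return bases


def aed(annotation, evidence):
    """Simplified Annotation Edit Distance between two interval lists.

    0 means the annotation matches the evidence exactly;
    1 means there is no supporting evidence at all.
    """
    ann = covered_bases(annotation)
    evi = covered_bases(evidence)
    if not ann or not evi:
        return 1.0
    overlap = len(ann & evi)
    sensitivity = overlap / len(evi)
    specificity = overlap / len(ann)
    return 1.0 - (sensitivity + specificity) / 2


# Toy gene model with two exons, and evidence covering only part of it.
print(aed([(0, 100), (200, 300)], [(0, 100)]))

# Share of P. ovata gene models below the AED 0.5 cutoff (counts from the text).
print(f"{23638 / 41820:.1%}")
```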
The protein sequence from the longest transcript variant of each of the 23,638 genes was then compared with nine other species using OrthoFinder v2.5.437 (Fig. 3). The species tree in Fig. 3A shows that A. thaliana (Brassicales) and Solanum lycopersicum (Solanales) were the outgroups of the species in Lamiales (Genlisea aurea, Plantago ovata, Antirrhinum majus, Erythranthe guttata, Striga asiatica, Phtheirospermum japonicum, Handroanthus impetiginosus, and Sesamum indicum). P. ovata is closely related to A. majus as they belong to the same family, Plantaginaceae.
OrthoFinder v2.5.437 assigned 255,025 genes out of 285,170 (89.4%) from 10 species to 22,916 orthogroups (Supplementary File 1: Table S9). There were 7,281 orthogroups with all ten species present, and 1,003 of these consisted entirely of single-copy genes. The mean orthogroup size is ten genes (Supplementary File 1: Table S9). The percentage of genes from each species assigned to orthogroups (Fig. 3B) ranged from 83.7% to 98.1%, with P. ovata at 90.1% (Supplementary File 1: Table S10). There were 5,824 species-specific orthogroups, ranging from 68 orthogroups belonging to S. indicum to 1,307 orthogroups of S. asiatica. P. ovata has 475 specific orthogroups (Supplementary File 1: Table S10). These numbers increased slightly when looking only at the descendant species from branch N3 (Fig. 3D). For instance, there were 9,098 core orthogroups among these seven species, with 590 P. ovata-specific orthogroups. P. ovata shared the most specific orthogroups with A. majus at 97 (Fig. 3D, Supplementary File 1: Table S11), with 41 single-copy genes from these two species. By comparison, twenty-three orthogroups contain one single P. ovata gene but more than one A. majus gene (Supplementary File 1: Table S11). The most extreme is P. ovata GeneID Pov_00010246, which has 115 orthologs in A. majus. The number of gene duplication events in A. majus is the highest among all species studied and greater than that of P. ovata (11,735/4,962 genes, Fig. 3E).
The genome sizes of P. ovata (500.94 Mb) and A. majus (510 Mb)38 are comparable. However, A. majus has eight chromosomes38, double that of P. ovata (4). Using MCscan (jcvi v1.2.11)39, 314 syntenic blocks between P. ovata and A. majus were detected. These blocks are distributed across all P. ovata chromosomes: 94 on Chr 1, 81 on Chr 2, 80 on Chr 3, and 59 on Chr 4. Nearly all of the four P. ovata chromosomes have syntenic regions on the eight A. majus chromosomes; the only exception is that no blocks on P. ovata Chr 4 are syntenic to A. majus Chr 3 (Fig. 3F). Overall, about 30% of the total P. ovata genome does not correlate to syntenic regions in A. majus (Supplementary File 10). Single P. ovata syntenic blocks that contain just one A. majus gene account for 50% of the genome, whereas 18% of the P. ovata genome has two blocks that correlate to a single A. majus gene. Conversely, a region containing a single P. ovata gene corresponds to one A. majus block across 37% of the genome, regions containing two to 43%, and three to 3% of the A. majus genome (Supplementary File 10).
Glycosyltransferase family 61 (GT61)
Upon hydration, P. ovata seeds release mucilage with physicochemical properties that are determined by polysaccharide composition and molecular structure, notably backbone substitution levels and patterning. P. ovata is rich in complex heteroxylan17,40,41, composed of a backbone of xylose residues decorated with a variety of side chains typically comprised of arabinose (Ara), xylose (Xyl), and traces of other sugars40,42. We used the current genome assembly to identify candidate genes of the glycosyltransferase family 61 (GT61), which appear to encode key enzymes involved in arabinose and xylose substitution43,44 with a large influence on final mucilage quantity and quality. Eighteen PoGT61 sequences from public data were added to the comparative genomic analysis to search for GT61 orthogroups and orthologues.
Public P. ovata GT61 (PoGT61) sequences43,45 were grouped into three orthogroups, OG0000114, OG0000433, and OG0009221 (Fig. 4, Supplementary File 1: Table S12). Clades A to C were labelled as per Anders et al.44 and Voiniciuc et al.46. These orthogroups or clades consist of sequences from 8 species out of the 10 studied, where OG0009221 (Clade C) has one gene copy for each species, including PoXYLT (Fig. 4). Fifteen of the PoGT61 sequences were grouped into Clade A whereas only two sequences (PoGT61_9 and PoGT61_12) belonged to Clade B. PoGT61_1 and PoGT61_1L both mapped to Pov_00033268, and PoGT61_4 and PoGT61_4L both mapped to Pov_00033272. In contrast, PoGT61_11 and PoGT61_11L mapped to different genes, Pov_00033285 and Pov_00033230, respectively. However, PoGT61_13 (Clade A) and PoGT61_12 (Clade B) have sequence similarities to more than one gene in our assembly (Supplementary File 1: Table S12). Thus, the previous analysis of the P. ovata contigs derived from the de novo transcriptome assembly43 was insufficient to fully resolve the single-gene origin of alternative splice variants, but this has now been possible using this reference genome.
Figure 4
A phylogenetic tree of GT61 protein sequences from selected species was visualised using FigTree v1.4.4. Clades A to C were labelled as per Anders et al.44 and Voiniciuc et al.46.
In total, 19 GT61 genes were identified. Nine were clustered on Chr4, five on Chr1, three on Chr3, and two on Chr2. The nine genes located on chromosome 4 are clustered in the phylogenetic tree. These genes were predicted to be xylan arabinosyltransferases (α-1,3-arabinosyltransferase) from the annotation file (Supplementary File 1: Table S12). Heterologous expression of these genes in other species could confirm their function. For instance, the heterologous expression of rice and wheat GT61 genes in Arabidopsis increased arabinose substitution and provided gain-of-function evidence for arabinosyltransferase activity44. The considerably higher number of Plantago GT61 gene duplications has previously been suggested to be linked to the high density/complexity of backbone substitutions on the heteroxylan of P. ovata mucilage43. Different GT61 enzymes may add specific types of heteroxylan backbone decorations, and the heterologous expression of multiple Plantago GT61 genes in tandem, in an appropriate host, may reveal such roles.
Non-coding RNA annotations
We identified 108 ribosomal RNAs (rRNAs), 1,295 transfer RNAs (tRNAs), and 411 non-coding RNAs (ncRNAs). The identified ncRNAs comprise 328 long non-coding RNAs (lncRNAs), 17 primary transcripts of microRNAs (miRNAs), 48 small nuclear RNAs (snRNAs), 12 small nucleolar RNAs (snoRNAs), 2 ribonuclease mitochondrial RNA processing (RNase MRP) RNAs, and 4 signal recognition particle (SRP) RNAs. Several types of cytoplasmic rRNA are annotated in the genome, belonging to the 5S, 18S, and 25S classes. The 5S sequences are clustered on chromosome 1 (63 sequences), with only six 5S sequences on chromosome 2, one on chromosome 4 and none on chromosome 3. Ribosomal 45S RNAs are found only on chromosomes 3 and 4 (Fig. 2).
In total, there are 328 lncRNAs in the P. ovata genome. They are distributed across the four chromosomes, with 97 transcripts on chromosome 1, 76 on chromosome 2, 86 on chromosome 3, 56 on chromosome 4, and 13 on unplaced sequences. Based on the locations of lncRNAs and the closest mRNAs, we found 320 lncRNA/mRNA pairs in the assembly (Supplementary File 1: Table S13). They can be grouped into four classes: 50 antisense genic, 85 antisense intergenic, 88 sense intergenic, and 97 sense genic.
Miscellaneous annotations
Overall, all parameters assessed indicate that we have generated a high-quality assembled and annotated genome. The genome can be utilised as a reference, but we also provide supplementary information which may benefit future research. Supplementary File 1: Table S13 contains details about lncRNA and mRNA candidates for future functional analysis to investigate how gene expression may be controlled by epigenetic mechanisms. Supplementary File 11 lists annotations for LTR Copia and Gypsy retrotransposons that can be useful for investigating Plantago domestication. Identified locations and sequences of genes linked to histone modifications and DNA methylation can be found in Supplementary File 1: Table S14, providing an additional epigenetic resource. The telomere sequences in Supplementary File 7 can be utilised for evolutionary analysis as suggested in the review by Peska and Garcia47.