SEQUENCE COMPARISON OF MUTE PROMOTERS BETWEEN TWO WHEAT CULTIVARS REVEALED BOTH CONSERVATIVE AND DIVERGED REGIONS

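Modulation of stomatal biogenesis is one of the means to consider for improving crops' water use efficiency without losing productivity. Transcription factor MUTE is among the critical regulators of stomata biogenesis, particularly of the guard mother cell division. In the current study, we investigated promoter regions of three homoeologous MUTE genes (MUTE-A1, MUTE-B1, and MUTE-D1) from two bread wheat cultivars of different geographic origins. Based on the available sequence of the Chinese Spring genome, MUTE promoters were isolated and sequenced from the Ukrainian cultivar Natalka. Promoter regions of more than 1600 bp upstream of the predicted start codons were cloned as PCR-amplified fragments and analyzed by sequencing. The sequence alignments revealed significant subgenome and cross-cultivar heterogeneity of the analyzed promoters. The analysis showed numerous single nucleotide polymorphisms and large deletions in every Natalka sequence compared with Chinese Spring: a 276-bp deletion in MUTE-A1, a 205-bp deletion in MUTE-B1, and a 1291-bp deletion in MUTE-D1. These changes affected a range of cis-regulatory elements (light- and stress-responsive elements, as well as tissue-specific elements) within the investigated DNA sequences. The study sheds new light on the regulatory variation of one of the critical factors in stomata biogenesis, widening our understanding of wheat genome plasticity with potential implications for improving the crop's performance under restricted water conditions.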


INTRODUCTION
Bread wheat (Triticum aestivum L.) is considered one of the most important cereal crops, with about 200 million hectares allocated for sowing worldwide. However, global climate change, which leads to droughts, increased average annual temperatures, and decreased rainfall, challenges crop productivity (Ortiz et al., 2008). The efficiency of water use by plants is linked with leaf transpiration, which accounts for up to 90% of total plant water loss and is controlled mainly through the number of stomata and stomatal aperture (Duursma et al., 2019). Reducing transpiration moisture loss is considered a promising means of improving water use efficiency and conserving soil water levels (Hepworth et al., 2015; Bi et al., 2018). Biotechnological modulation of stomatal density on the leaf surface is a promising approach for improving the drought tolerance of bread wheat without affecting yield (Hughes et al., 2017; Serna, 2020).
The density of stomata depends on the conversion of leaf meristemoid cells into stomatal structures. This process is controlled by differential expression of stomatal biogenesis genes, such as EPF1, EPF2, MUTE, SPCH, and others (Takata et al., 2013; Caine et al., 2016; Lau et al., 2018; Guo et al., 2021). A promising target for wheat improvement is the gene encoding the MUTE transcription factor (Le et al., 2014; Qi et al., 2017). Guo et al. showed that expression of the MUTE gene initiates in meristemoid cells and ensures the formation of guard mother cells and the transition to guard cells (Guo et al., 2021). Furthermore, the MUTE transcription factor directly controls key genes of stomatal biogenesis and cell cycle genes (Han et al., 2018; Guo et al., 2021). MUTE also plays a specific role in cereals compared to other plants (Raissig et al., 2016, 2017).
As an allohexaploid species, T. aestivum has a double set of three subgenomes (A, B, and D). Each subgenome comprises seven pairs of homologous chromosomes, and the entire wheat genome consists of 42 chromosomes (Šramková et al., 2021). On the one hand, sequencing of subgenomes and their further detailed study are significantly complicated by the complexity of the genome structure and the high level of ploidy. On the other hand, the existence of groups of homologous genes from different subgenomes can make the regulation of gene expression more flexible and manageable.
Regulatory sequences of the core promoter usually include a TATA-box and an initiator element (Inr). However, a minimal core promoter may lack a TATA-box, Inr, or other regulatory downstream promoter elements (Schmitz et al., 2022). In a TATA-less core promoter, Inr can compensate for the absence of the TATA-box and perform its functions (Andersson & Sandelin, 2020). Among other cis-regulatory sequences in the TATA-less proximal promoter, the CAAT-box is common. This regulatory sequence is sensitive to mutations, which directly affect the efficiency of transcription (Porto et al., 2014; Ricci et al., 2019). A distal promoter region contains numerous cis-regulatory elements, including enhancers and silencers, which regulate gene functions in concert with transcription factors (Vo Ngoc et al., 2019). Transcription factors interact with cis-elements, which leads to reprogramming of transcription patterns (Zou et al., 2011). The reduced frequency of single nucleotide polymorphisms (SNPs) observed within the cis-element sequences of promoters highlights the evolutionary and functional importance of preserving the integrity of these regulatory elements (Korkuć et al., 2014).
The aim of the current work was to perform a comparative characterization of the promoter regions of homoeologous genes encoding the transcription factor MUTE, which regulates stomata development, in three subgenomes of the reference wheat cultivar Chinese Spring (CS) and a Ukrainian cultivar, Natalka, in order to pave the way for future modulation of stomata biogenesis.

Plant material
The cultivar (cv) Natalka, which originated at the Institute of Plant Physiology and Genetics of the National Academy of Sciences of Ukraine, was selected to investigate the structure of MUTE promoters. Natalka is a cold-tolerant and drought-susceptible cultivar with a high average yield compared to the local standards. It is recommended to be grown in the forest, forest-steppe, and steppe zones of Ukraine. Its grain has fine bread-making quality (Morgun et al., 2014).

Isolation of promoter, sequence amplification, and sequencing
Total DNA was extracted and purified from one grain according to the standard rapid high-yield protocol based on CTAB detergent (Murray & Thompson, 1980). Specific primers for amplification of MUTE promoter sequences (sequences upstream of the predicted start codons) were designed manually based on reference sequences from cv Chinese Spring (IWGSC RefSeq v2.1, https://wheaturgi.versailles.inra.fr) (Alaux et al., 2018). The sequences of the primers and their positions are specified in Table 1. The conditions of the polymerase chain reaction (PCR) were as follows: denaturation at 94°C for 30 sec, hybridization at 60°C for 30 sec, and elongation at 68°C for 1 min. Each 20-μl PCR contained 0.5 μM each of forward and reverse primers (Metabion, Germany), GCI Reaction Buffer (Takara Biomedical Technology, China) containing 2 mM MgCl2, 0.2 μM of each deoxyribonucleoside triphosphate (Takara Biomedical Technology, China), 1 unit of LA Taq DNA polymerase (Takara Biomedical Technology, China), and 30 ng of total plant DNA. Following PCR, the amplified fragments were separated by agarose gel electrophoresis in 1×TAE running buffer (40 mM Tris-acetate, 1 mM EDTA), purified from agarose with the AxyPrep DNA Gel Extraction Kit (Axygen Scientific, USA), and ligated into the vector pMD-19 (Takara Biomedical Technology, China) according to the manufacturer's instructions. The ligation products were transformed into Escherichia coli DH5α cells following the standard calcium chloride protocol (Sambrook & Russell, 2001). Three clones for each PCR were custom-sequenced (Sangon Biotech, Shanghai, China) using a combination of standard forward and reverse sequencing primers and a range of internal primers.

In silico analysis of promoter regions
The software CLC Main Workbench 6.9.2 (Qiagen, USA) was used for multiple global alignments and pairwise comparisons of reference and experimental sequences. The pairwise sequence comparison reported values of distance and percent identity. The distance was calculated with the Jukes-Cantor correction from the proportion of identical positions among the overlapping alignment positions of the two sequences. Percent identity represented the percentage of identical residues among the overlapping alignment positions of the two sequences.
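The two pairwise metrics can be illustrated with a minimal Python sketch. It is not the CLC Main Workbench implementation: the function name `pairwise_stats` is hypothetical, and the treatment of gaps is an assumption (a column counts as overlapping when at least one of the two sequences has a residue, so a gap opposite a residue counts as a difference). The Jukes-Cantor distance is the standard formula d = -3/4 ln(1 - 4p/3), where p is the proportion of non-identical overlapping positions.

```python
import math


def pairwise_stats(aligned_a: str, aligned_b: str, gap: str = "-"):
    """Return (percent_identity, jukes_cantor_distance) for two aligned sequences.

    Assumption: a column is 'overlapping' unless both sequences have a gap there.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    overlapping = 0
    identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue  # column carries no information for this pair
        overlapping += 1
        if x == y:
            identical += 1

    if overlapping == 0:
        raise ValueError("no overlapping alignment positions")

    percent_identity = 100.0 * identical / overlapping
    p = 1.0 - identical / overlapping  # proportion of differing positions

    # The Jukes-Cantor correction is undefined once p reaches 3/4 (saturation).
    if p >= 0.75:
        distance = math.inf
    else:
        distance = -0.75 * math.log(1.0 - 4.0 * p / 3.0)
    return percent_identity, distance


if __name__ == "__main__":
    # Toy alignment (hypothetical sequences, not taken from the study).
    pid, dist = pairwise_stats("ACGTACGT", "ACGTAC-T")
    print(f"percent identity = {pid:.2f}, JC distance = {dist:.4f}")
```

Under this gap convention a large deletion lowers percent identity in proportion to its length, which is why long indels dominate cross-cultivar comparisons of this kind.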
The promoter sequences were processed with the online tool New PLACE version 30.0 to identify cis-regulatory elements among 469 database entries (Higo et al., 1999).
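A motif search of this kind can be sketched in a few lines of Python. This is an illustration only, not the New PLACE tool: the helper names (`find_motif`, `count_both_strands`) and the toy sequence are hypothetical, and the motif set is limited to consensus strings quoted in this paper. Degenerate IUPAC symbols (N, Y, R, K, and so on) are expanded into regular-expression character classes, and both strands are scanned by searching the reverse complement.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]",
    "S": "[CG]", "W": "[AT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYKMSWBDHVN", "TGCAYRMKSWVHDBN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str) -> list:
    """0-based start positions of all (possibly overlapping) hits on one strand."""
    body = "".join(IUPAC[symbol] for symbol in motif.upper())
    pattern = re.compile(f"(?=({body}))")  # lookahead keeps overlapping hits
    return [match.start() for match in pattern.finditer(seq.upper())]


def count_both_strands(seq: str, motif: str) -> tuple:
    """Number of hits on the plus strand and on the minus strand."""
    plus = len(find_motif(seq, motif))
    minus = len(find_motif(reverse_complement(seq), motif))
    return plus, minus


if __name__ == "__main__":
    # Consensus motifs as quoted in the text; the sequence is a made-up example.
    motifs = {"W-box": "TGAC", "MYC": "CANNTG", "ARR1": "NGATT", "MYB": "YAACKG"}
    toy = "ATTGACCACGTGAAGATTC"
    for name, consensus in motifs.items():
        print(name, consensus, count_both_strands(toy, consensus))
```

Scanning both strands matters for palindromic sites such as CACGTG, where a single change in the sequence removes one hit on each strand.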

Intra- and inter-genomic sequence polymorphism of MUTE promoter in wheat cultivars Chinese Spring and Natalka
Our preliminary study on the expression of MUTE genes (MUTE-A1, MUTE-B1, and MUTE-D1) revealed diversified patterns among Ukrainian cultivars (data not published). This observation inspired us to investigate the variability of the genes' promoter regions. The reference genome of the cultivar CS (IWGSC RefSeq v2.1) was taken as a template for subgenome polymorphism evaluation. As a result, we observed substantial subgenome diversity in the promoter regions upstream of the transcription start compared to the protein-coding downstream sequences. The primers were positioned to isolate large promoter regions for comparing the similarity and variability of the sequences in wheat subgenomes. Accordingly, primer pairs were designed to isolate sequences upstream of the predicted start codons based on the CS genome: (i) 3659 bp of MUTE-A1, (ii) 1647 bp of MUTE-B1, and (iii) 2585 bp of MUTE-D1. The promoter regions of the CS MUTE-A1, MUTE-B1, and MUTE-D1 genes, extracted from the database and aligned, are illustrated in Figure 1A along with the positions of the primers used for isolation. The alignment showed that the region of approximately 300 bp upstream of the start codon is relatively conserved among the three sequences (yellow color). It is followed by a variable region of about 2000 bp with a high rate of indels (shades of red color). However, the alignment showed some conservation among the subgenome sequences in its left part (the region most upstream of the predicted start codon). These conserved sections may correspond to the systems of transcription initiation (the conserved region adjacent to the start codon) and regulation (the conserved upstream section on the left of the alignment).
The sequencing data have been deposited in GenBank under accession numbers OP913457 (MUTE-A1 promoter region, cultivar Natalka), OP913458 (MUTE-B1 promoter region, cultivar Natalka), and OP913459 (MUTE-D1 promoter region, cultivar Natalka). The isolated Natalka homoeologous sequences have a similar pattern of conserved sections. However, the conserved sequence adjacent to the start codon is half as long as in the CS sequences (approximately 150 bp are well aligned, primarily yellow in Figure 1B). The summary of the pairwise comparison between the six studied sequences from CS and Natalka, including distances and percent identity, is presented in Figure 1C.
In the current case study, we detected little conservation, as indicated by a 1285-bp deletion, a 28-bp deletion, and a series of SNPs. Such differences in the level of variation between experimental and reference sequences might be related to different homoeologous chromosome groups: MUTE genes are located on chromosome group 2, while TaWRKY2 genes belong to chromosome group 1.

Survey of regulatory DNA-elements in wheat MUTE promoters
To get deeper insights into the potential regulation of MUTE genes in wheat, the six promoter sequences were subjected to a detailed analysis of cis-regulatory motifs by searching the 469 entries registered in the New PLACE database (Higo et al., 1999). The search demonstrated that each studied homoeologous MUTE promoter region had a TATA-less proximal promoter containing both the initiator element (Inr-element) and a CAAT-box. The CS and Natalka sequences had an equal number of Inr-elements (9 in MUTE-A1, and 4 each in MUTE-B1 and MUTE-D1), with one Inr-element in the section of the proximal promoter (positions -156…-166). One copy of the CAAT-box, the central regulator of transcription efficiency, was also positioned in the proximal promoter (positions -204…-214) upstream of the start codon in every studied MUTE promoter of the two cultivars. Thus, the homoeologous proximal promoters of MUTE genes had related structures due to the conservation of a section of up to 300 bp upstream of the start codon. Distal promoter regions also contained abundant CAAT-boxes: 44 and 40 in MUTE-A1 of CS and Natalka, respectively; 12 in MUTE-B1 of both cultivars; and 16 and 11 in MUTE-D1 of CS and Natalka, respectively.
The rest of the cis-regulatory motifs were divided into five groups: stress responsiveness, compound-responsive elements, light regulation, tissue-specific regulation, and specific pathways (Table 2). The entire promoter region included various cis-regulatory motifs recognized by diverse families of stress-related transcription factors. Accordingly, numerous W-boxes, MYC and MYB recognition sites, ARR1-binding elements, LTRE elements, and CBF elements were found. Four types of W-box were identified in the studied sequences: TGAC, TGACY, TTGAC, and TGACT.
Motif density (Table 2) was calculated to describe the total abundance of cis-acting elements and to evaluate the significance of the fragments lost in cv Natalka. Only in the case of MUTE-D1 did we observe a decline of motif density, from 0.168 in CS to 0.130 in cv Natalka, even though the reference sequence contained a poorly resolved section of 417 bp (-811…-1227) within the mentioned large indel, which did not contribute to the number of motifs. Conversely, the detected alterations increased this characteristic for MUTE-A1 and MUTE-B1, suggesting that the lost fragments were not crucial and did not carry a high number of elements. The motif density is higher in the MUTE-A1 promoter region than in the other subgenomes, which may indicate more flexible or even more complex regulation of expression.
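The text does not spell out how motif density was computed. The sketch below assumes the simplest reading, the total number of detected motifs divided by the sequence length in base pairs, and adds an optional correction for unresolved bases, which cannot contribute motifs. The function name and all numbers in the example are hypothetical and are not values from Table 2.

```python
def motif_density(motif_count: int, length_bp: int, unresolved_bp: int = 0) -> float:
    """Motifs per base pair (assumed definition).

    unresolved_bp: number of poorly resolved bases to exclude from the
    denominator, since no motifs can be detected there.
    """
    effective_length = length_bp - unresolved_bp
    if effective_length <= 0:
        raise ValueError("effective sequence length must be positive")
    return motif_count / effective_length


if __name__ == "__main__":
    # Toy numbers only: 50 motifs in a 400-bp promoter with 100 unresolved bases.
    print(round(motif_density(50, 400), 3))       # density over the full length
    print(round(motif_density(50, 400, 100), 3))  # density over resolved bases only
```

Excluding unresolved bases raises the density of the reference sequence, so under this assumption the reported decline for MUTE-D1 would be a conservative estimate.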

CONCLUSION
Sequence comparison of subgenomic promoter regions of homoeologous MUTE genes between two wheat cultivars revealed regions of both high similarity and significant divergence. The promoter sequence similarities between cv Chinese Spring and cv Natalka, which are of different geographic origins, likely reflect fundamental principles of the gene's transcription initiation and regulation. At the same time, the considerable deletions revealed in the promoters from cv Natalka compared to Chinese Spring demonstrate opportunities for significant transcriptional flexibility in regulating stomata biogenesis between wheat cultivars.
Gene expression is governed by a multi-level regulation system, with gene promoters playing a pivotal role at the stage of DNA transcription into messenger RNA. DNA elements around the transcription initiation start point are bound by specific transcription factors to control selective gene transcription (Das & Bansal, 2019; Andersson & Sandelin, 2020). The typical eukaryotic promoter consists of proximal and distal regions. The proximal promoter region (from -250 to +250) contains core and proximal promoter elements (Porto et al., 2014). The core promoter in plants, as well as in other eukaryotic organisms, has basic cis-elements, which are the binding sites for subunits of RNA polymerase II and other proteins associated with the transcription complex (Andersson & Sandelin, 2020).

Figure 1 Promoter sequence comparison of MUTE genes from subgenomes A (MUTE-A1), B (MUTE-B1), and D (MUTE-D1) of cultivars Chinese Spring and Natalka. A. Homoeologous sequence alignment of the reference cultivar Chinese Spring (IWGSC RefSeq v2.1). Green arrows mark the positions of the primers used for PCR amplification and sequencing of promoter regions. B. Homoeologous sequence alignment of cultivar Natalka. Conservation graphs indicate variable-conserved (from dark red to yellow) sections of these regions. C. Pairwise comparison of homoeologous promoter regions from CS and Natalka, including distance (above the diagonal) and percent identity (below the diagonal).
The tetranucleotide TGAC was the most abundant among them: 12 in MUTE-A1, 17 in MUTE-B1, and 18 in MUTE-D1 promoter regions of CS. The divergence of the Natalka sequences reduced the number of TGAC W-boxes to 11 in MUTE-A1 (Figure 2), 12 in MUTE-B1 (Figure 3), and 9 in MUTE-D1 (Figure 4). The other three W-box motifs showed a similar pattern of declining numbers in Natalka. Only the SNP at position -339 of MUTE-A1 contributed to the formation of a new cis-regulatory W-box, while eliminating a CCAAT-box.
The MYC recognition site CANNTG, which is related to abiotic stress responsiveness, was abundant within the reviewed sequences. MUTE-A1 of CS has 36 MYC motifs. A 1-bp insertion at position -716…-717 eliminated two motifs, on the plus and minus strands, in the Natalka sequence. The MUTE-B1 and MUTE-D1 sequences had 28 and 26 motifs, respectively, with equal numbers in both cultivars. Although the substitution T for C (-1795) eliminated two MYC motifs, a substitution C for T created two new ones in MUTE-D1 of cv Natalka.
Alterations also led to the elimination of response regulator ARR1-binding elements (NGATT motif) in the homoeologous sequences. CS MUTE-A1 has 23 motifs versus 22 in cv Natalka: the 1-bp insertion (-716…-717) created one element, whereas the substitutions T for C (-1446) and C for T (-2265) eliminated two ARR1-binding elements. The 205-bp deletion (-401…-605) eliminated two ARR1 motifs in MUTE-B1, and the 1285-bp deletion (-410…-1694) eliminated six of them in MUTE-D1 of cv Natalka.
The TGTCA motif of the disease resistance response element BIHD1OS was also variable between the CS and Natalka sequences. The 276-bp deletion (-1866…-2141) removed one element from MUTE-A1 (six remained); the 205-bp deletion (-401…-605) eliminated two of eight BIHD1OS elements in MUTE-B1; and in MUTE-D1 the 1285-bp deletion (-410…-1694) eliminated two motifs while the substitution C for T at -2392 created one (five in total versus six in CS).
The MYB recognition site (YAACKG motif), which is associated with broad abiotic stress responsiveness, including drought (Yang et al., 2016), did not show much variability among the sequences. Both CS and Natalka had five such elements within MUTE-A1 and two within MUTE-B1. A substitution G for T created an additional, fifth site in MUTE-D1 of cv Natalka. A low-temperature responsive element (LTRE element) with the consensus motif CCGAC was present in one copy in the MUTE-B1 sequence of both CS and Natalka. The SNP A for G at position -219 led to the formation of a single motif in the MUTE-D1 promoter of cv Natalka. The single copy of the dehydration-responsive element CBF (RYCGAC motif) was removed by the 205-bp deletion (-401…-605) in MUTE-B1.
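The deletion sizes quoted above follow from their coordinates when both end positions are counted. A short Python check, using a hypothetical helper name and only the coordinates reported in the text:

```python
def indel_length(first: int, last: int) -> int:
    """Inclusive length of an interval given in upstream (negative) coordinates."""
    return abs(last - first) + 1


# (first position, last position, reported size in bp), as quoted in the text.
reported = {
    "MUTE-A1 deletion": (-1866, -2141, 276),
    "MUTE-B1 deletion": (-401, -605, 205),
    "MUTE-D1 deletion": (-410, -1694, 1285),
}

for name, (first, last, size) in reported.items():
    computed = indel_length(first, last)
    status = "consistent" if computed == size else "MISMATCH"
    print(f"{name}: {first}…{last} -> {computed} bp ({status})")
```

All three intervals reproduce the reported sizes, which confirms that the coordinates are inclusive at both ends.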

Figure 2 MUTE-A1 promoter region alignment of the reference Chinese Spring and the isolated Natalka sequences denotes variability in sequence and stress-responsive cis-acting elements. Frames point to the diverse composition of cis-elements between the two sequences. Conservation graphs indicate variable-conserved (from dark red to yellow) sections of these regions.

Figure 3 MUTE-B1 promoter region alignment of the reference Chinese Spring and the isolated Natalka sequences denotes variability in sequence and stress-responsive cis-acting elements. Frames point to the diverse composition of cis-elements between the two sequences. Conservation graphs indicate variable-conserved (from dark red to yellow) sections of these regions.

Figure 4 MUTE-D1 promoter region alignment of the reference Chinese Spring and the isolated Natalka sequences denotes variability in sequence and stress-responsive cis-acting elements. Frames point to the diverse composition of cis-elements between the two sequences. Conservation graphs indicate variable-conserved (from dark red to yellow) sections of these regions; the black section shows the poorly resolved region in Chinese Spring.
The composition of all three MUTE proximal promoters, which contain both the initiator element (Inr-element) and a CAAT-box, corresponds with the general structure of promoters in eukaryotic organisms, in particular plants (Komarnytsky & Borisjuk, 2003; Porto et al., 2014). The homoeologous proximal promoters of MUTE genes had related structures due to the conservation of a section of up to 300 bp before the start codons. Hypothetically, expression of MUTE-A1 and MUTE-D1 might be less upregulated in Natalka than in CS owing to the reduced number of CAAT-box copies. Also, numerous TATA-based enhancers, which are AT-rich sequences located at different distances from the core promoter, can contribute to expression efficiency (Singer et al., 1990; Komarnytsky & Borisjuk, 2003).
Considering the abundance of the various types of cis-acting elements, we note that stress-responsive elements, elements for light regulation, and a few representatives of tissue-specific elements contributed most to the total number. W-boxes are binding sites of WRKY proteins, an extensive family of plant transcription factors that play a crucial role in stress responses, particularly in bread wheat (Ning et al., 2017; Baillo et al., 2020). The CANNTG motif was described as a consensus MYC recognition site in Arabidopsis dehydration- and cold-responsive genes (Higo et al., 1999). In wheat, MYC transcription factors play a role in plant development and stress response. The CANNTG motif was highly represented across the three studied sequences, which suggests a complex regulatory network. Multiple calmodulin-binding CGCG-boxes were richly represented across every studied MUTE promoter region, with a higher density in MUTE-D1. This points to the dependence of expression regulation on Ca²⁺-based secondary signals, which indirectly guide plant development and stress responses (T. Yang & Poovaiah, 2002; F. Yang et al., 2020).
The studied sequences contained a notable number of light-responsive elements, which are common in plants, especially in sequences expressed in leaves (Komarnytsky & Borisjuk, 2003; Libantova et al., 2021). Widespread GT-binding sites have been reported in a broad range of plant promoters (Zhou, 1999). The cooperation of GT-1 and the GATA-box may lead to enhanced responsiveness to a broader light spectrum (Puente et al., 1996; Chattopadhyay et al., 1998).
The tissue-specific and highly widespread tetranucleotide YACT (CACT-element) was described as a motif denoting mesophyll-specific expression in C4 plants. However, CACT was also proposed to act as a cis-regulatory element in C3 plants, although with no cell specificity in the reference C3 promoter from Flaveria pringlei (Gowik et al., 2004). The expression of wheat DNA-binding with one finger (DOF) proteins exhibits specific patterns in different organs and developmental stages and is affected by various abiotic stresses (Yanagisawa, 2004; Fang et al., 2020; Liu et al., 2020). The binding motif for DOF proteins (Yanagisawa & Schmidt, 1999) was quite noticeable, especially in the MUTE-A1 promoter region, potentially contributing to the plasticity of expression regulation.
The CCAAT-box binds transcription factors named Nuclear Factor Y. They are conserved among eukaryotes and play a fundamental biological role in plants (Qu et al., 2015; Yadav et al., 2015; Chaves-Sanjuan et al., 2021). Some of these TFs are expressed ubiquitously, and others are expressed in an environment- or tissue-specific manner in bread wheat (Stephenson et al., 2007; Khurana et al., 2013; Qu et al., 2015).

Table 1
The characteristics of the designed primers used for promoter isolation
The comparison shows that the pairs CS/Natalka MUTE-A1 and CS/Natalka MUTE-B1 had the highest percent identity values (91.45 and 87.14, respectively). Owing to the changes, the pair CS/Natalka MUTE-D1 did not have a high percent identity. Natalka MUTE-D1 demonstrated higher similarity with sequences from subgenome B of both cultivars (Natalka MUTE-…).
Every obtained Natalka sequence was pairwise aligned with the CS reference to reveal cross-cultivar polymorphism. Numerous cultivar-specific SNPs and several indels were found between the sequences originating from the reference and experimental cultivars. In particular, comparing the experimental sequences (cv Natalka) with the reference (cv CS), we observed 35 SNPs and one 276-bp deletion in the MUTE-A1 promoter region; 7 SNPs and one 205-bp deletion in the MUTE-B1 promoter; and […] deletion, a 1285-bp deletion, and a 28-bp insertion across the MUTE-D1 promoter region. When counting SNPs, we included small 1-2-bp indels. Every studied Natalka sequence had a large deletion compared with the reference; additionally, the upstream MUTE-D1 region of Natalka had a minor 28-bp insertion. Apparently, the regulation of MUTE genes in the genome of Natalka is less flexible than in Chinese Spring due to the elimination of numerous cis-regulatory elements.
Having evaluated previous studies on plant promoter region diversity, we noted that the Natalka sequences of the promoter and coding regions of the drought-related TaWRKY2-D1 gene (from subgenome D) were completely identical to those of CS (Lakhneko et al., 2021).

Table 2
The collection of the most common and most represented cis-regulatory elements in the promoters of homoeologous MUTE genes of two cultivars