Generation and analysis of an Eucalyptus globulus cDNA library constructed from seedlings subjected to low temperature conditions
Maria Cecilia Gamboa
Pablo D.T. Valenzuela
Financial support: This work was partially funded by Universidad Andrés Bello (DI Project: 04-05/1), by MIFAB (Project: P04-071-F) and by the Microsoft Joint Research Program.
Keywords: cellulose, cold-temperature, EST database, forest biotechnology, lignin.
Note: The sequences have been deposited in GenBank. Accession numbers: ES588357-ES597093
Eucalyptus globulus is the most important commercial temperate hardwood in the world because of its wood properties and its characteristics for biofuel production. However, only a very low number of expressed sequence tags (ESTs) are publicly available for this tree species. We constructed a cDNA library from E. globulus seedlings subjected to low temperature and sequenced 9,913 randomly selected clones, generating 8,737 curated ESTs. The assembly produced 1,062 contigs and 3,879 singletons, forming a Eucalyptus unigene set. Based on BLASTX analysis, 89.3% of the contigs and 88.5% of the singletons had significant similarity to known genes in the non-redundant database of GenBank. The Eucalyptus unigene set generated is a valuable public resource that provides an initial model for genes and regulatory pathways involved in cell wall biosynthesis at low temperature.
Forests cover nearly 30% of the Earth's surface, nearly 4 billion hectares, serving multiple functions including conservation of biological diversity, renewing the oxygen supply of the atmosphere, preventing soil erosion and supplying pulp and wood (FAO, 2005). Forest tree breeding aims to improve the quality of trees by selecting individuals with desirable traits that will later be used to produce trees with improved genotypes. Genetic improvement programs such as controlled cross-pollination breeding have been used since the 1950s. Nevertheless, phenotype assessment is a complex process due to the long generation times of woody species (Grattapaglia, 2004). It is within the context of reducing this time-frame that functional genomics has become a powerful tool in forestry.
In the last few years functional genomics has been used extensively for gene discovery in species whose genomes have not been completely sequenced. A cost-effective and rapid way to obtain new data from an organism is through partial sequencing of randomly selected cDNA clones (Braütigam et al. 2005). The resulting collection of expressed sequence tags (ESTs) reveals a portion of genes in an organism expressed under a particular condition. Using this approach, several traits have been analyzed in trees, such as wood formation (Allona et al. 1998; Sterky et al. 1998; Israelsson et al. 2003) or cold tolerance (Nanjo et al. 2004; Sterky et al. 2004). Unfortunately, these studies have focused on gene expression profiles having a direct effect on the particular trait studied, without expanding the range of effects that the set condition might have on other metabolic pathways. In fact, cold stress in poplar cuttings (Populus tremula x Populus tremuloides cv. Mush1) has been shown to produce variations in parameters such as sucrose concentration and lignin content, illustrating the direct effect of cold conditions on wood quality (Hausman et al. 2000).
The amount and type of lignin and cellulose are important in the timber and pulp industry as they have a direct effect on the chemical properties of the wood produced by the tree (Jung and Ni, 1998; Fukushima, 2001; Plomion et al. 2001). For the production of biofuels, cellulose needs to be separated from lignin so it can be made available for enzymatic hydrolysis. Therefore, several research groups have studied ways to modify lignin and cellulose content in the plant cell wall. As a result, various studies have shown a co-regulation of these two compounds (Hu et al. 1999; Li et al. 2003; Rastogi and Dwivedi, 2006). For instance, the down-regulation of a single lignin biosynthetic gene resulted in a decrease of lignin production by the plant, accompanied by an increase in cellulose production (Hu et al. 1999). Hence, the modification of plant cell wall composition in trees may provide a way to engineer wood for biofuel production.
E. globulus is considered the most important temperate hardwood plantation species in the world due to its combination of wood properties suitable for the pulp and paper industry (Jones et al. 2002; Grattapaglia, 2004). This tree species has fast growth rates and an ability to adapt to a broad range of geographic locations (ranging from latitude 35ºS to 42ºS), even though its growth rate diminishes under frost conditions (Jones et al. 2002; Miranda and Pereira, 2002). Most importantly, Eucalyptus has been listed as one of the candidate biomass energy crops by the U.S. Department of Energy (U.S. Department of Energy, 2007). Nevertheless, public genomic information from E. globulus is limited. In fact, a search of the GenBank EST repository (as of July 6, 2007) returned only 3,953 ESTs for E. globulus, compared to 329,469 ESTs for the most represented tree, Pinus taeda. Thus, in this study we provide and describe the first publicly available cDNA library from cold-treated E. globulus seedlings, paying particular attention to genes predicted to be involved in cell wall biosynthesis and the transcription factors suggested to be involved in their regulation.
E. globulus seeds were germinated in a soil mixture and grown in a culture cabinet with a 16 h day/8 h night photoperiod at a temperature of 21ºC. The library was constructed from 3-month-old E. globulus plants maintained at 4ºC for 30 min. After cold treatment, E. globulus leaves were collected and frozen in liquid nitrogen until use.
Total RNA was extracted according to the method described by Chang and colleagues (Chang et al. 1993). RNA integrity was confirmed by gel electrophoresis and 1 mg was quantified using an RNA standard (Invitrogen, Cat 15620-016). Poly(A) mRNA was isolated from total RNA with the Stratagene Poly(A) Quick mRNA Isolation Kit (Stratagene, La Jolla, CA, USA). cDNA was prepared and cloned into the vector pExpress I using the Not I and Eco RV restriction sites. The cDNA library was not normalized, i.e. no attempt was made to reduce the redundancy of highly expressed transcripts.
In total, 9,913 bacterial colonies were randomly picked and single-pass sequencing reactions were performed. These sequences were analyzed using the Phred base-calling software (with Q>20) (Ewing et al. 1998). All traces were subjected to a trimming process for the removal of ribosomal RNA, poly(A) tails, low-quality sequences, and vector and adapter regions. Sequences with 94% identity over 40 or more nucleotides were assembled using the CAP3 software (Huang and Madan, 1999).
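The assembly cutoff described above (94% identity over an overlap of 40 or more nucleotides) can be illustrated with a minimal sketch. It is not CAP3 itself, which computes gapped overlaps and applies further checks; here the two overlap strings are assumed to be already aligned, ungapped and of equal length, and the function names are hypothetical.

```python
def overlap_identity(a, b):
    """Fraction of identical positions between two equal-length,
    already aligned (ungapped) overlap strings."""
    if len(a) != len(b):
        raise ValueError("aligned overlap strings must have equal length")
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def passes_assembly_cutoff(a, b, min_identity=0.94, min_overlap=40):
    """True when the overlap is long enough and similar enough.

    The defaults mirror the cutoffs quoted in the text; everything else
    in this sketch is a simplifying assumption.
    """
    return len(a) >= min_overlap and overlap_identity(a, b) >= min_identity
```

For example, a 50-nt overlap with two mismatches (96% identity) would be accepted, whereas the same overlap with four mismatches (92% identity), or a perfect 30-nt overlap, would be rejected.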
The unigene set was classified and analyzed according to gene ontology (GO) terms (Ashburner et al. 2000) across functional categories. The complete unigene set was compared against the protein non-redundant database using BLASTX (Altschul et al. 1997) and analyzed with the InterProScan program (Zdobnov and Apweiler, 2001) to assign a putative function. GO terms were extracted from the best hits obtained from the BLASTX comparison against the SwissProt-TrEMBL database (Fleischmann et al. 1999) (E-value < 1E-15 and >70% alignment coverage) and compared to the InterProScan GO suggestions. All the GO assignments were curated manually (Ashburner et al. 2000). The unigene dataset was compared to other Eucalyptus cDNA libraries available in GenBank using the BLASTN program with an E-value cutoff of 1E-5.
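The hit-selection rule described above (E-value below 1E-15 and more than 70% alignment coverage, keeping the best hit per unigene) can be sketched as follows. The `Hit` record, its field names and the definition of coverage as the aligned query span divided by the query length are assumptions made for illustration; the text does not state whether coverage was measured on the query or on the subject.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLAST hit (hypothetical record; field names are illustrative)."""
    query_id: str
    subject_id: str
    evalue: float
    query_start: int   # 1-based, inclusive
    query_end: int     # 1-based, inclusive
    query_length: int


def coverage(hit):
    """Aligned query span as a fraction of query length (assumed definition)."""
    span = abs(hit.query_end - hit.query_start) + 1
    return span / hit.query_length


def best_hits(hits, max_evalue=1e-15, min_coverage=0.70):
    """Keep, per query, the lowest E-value hit that passes both cutoffs."""
    best = {}
    for hit in hits:
        if hit.evalue >= max_evalue or coverage(hit) <= min_coverage:
            continue
        current = best.get(hit.query_id)
        if current is None or hit.evalue < current.evalue:
            best[hit.query_id] = hit
    return best
```

Applying both filters before choosing the best hit means that a very significant but short local match cannot donate GO terms to a unigene.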
The analysis of the 9,913 sequence reads resulted in the generation of 1,062 contigs and 3,879 singletons (4,941 unigenes) (Figure 1). The fraction of sequences represented by more than one cDNA was 60.9%, providing an estimate of library redundancy. Based on BLASTX analysis, 89.3% of the contigs and 88.5% of the singletons had significant similarity to known genes in the non-redundant database (Altschul et al. 1997). Contig depth ranged from 2 to 118 ESTs. The deepest contigs were considered highly represented unigenes. Those contigs with more than 50 ESTs are shown in Table 1 (contigs with 20 or more ESTs are included as Supplementary data 1).
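The unigene count and the redundancy estimate can be checked with a few lines of arithmetic. The text does not give the formula used for redundancy; the sketch below assumes it is the share of sequence reads that are not singletons, which reproduces the reported 60.9%.

```python
# Counts reported in the text.
SEQUENCE_READS = 9913
CONTIGS = 1062
SINGLETONS = 3879


def unigene_count(contigs, singletons):
    """Unigenes are the contigs plus the singletons."""
    return contigs + singletons


def redundancy_percent(total_reads, singletons):
    """Assumed definition: percentage of reads that are not singletons."""
    return 100 * (total_reads - singletons) / total_reads


print(unigene_count(CONTIGS, SINGLETONS))                          # 4941
print(round(redundancy_percent(SEQUENCE_READS, SINGLETONS), 1))    # 60.9
```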
Overall, 541 unigenes were assigned to biological process, 449 to cellular component and 493 to molecular function categories. This is a low number of assignments compared to libraries generated in other studies of trees (Pinus: 5474, 5064 and 5886 respectively; Poplar: 6158, 5751 and 6622 respectively; Spruce: 1697, 1467 and 2188 respectively) (Quackenbush et al. 2000). We suggest that this is due to the low average similarity between our database and the UniProt sequence database, in addition to the low alignment coverage obtained (we used both parameters to make the assignments). We focused our analyses on physiological processes (431 unigenes), the most represented of which were related to cellular metabolism: 48 unigenes related to alcohol metabolism, 95 unigenes associated with amine and amino acid derivative metabolism, 116 unigenes involved in transport and 164 related to biosynthetic processes.
The most represented molecular functions corresponded to binding and catalytic activities. The unigenes allocated to binding activity were associated with ion binding (130) and nucleic acid binding (62). Furthermore, 114 unigenes were associated with enzymes involved in redox reactions related to lignin biosynthesis and 88 with transferase activities, including enzymes involved in lignin and cellulose biosynthesis.
The EST database was screened for sequences with significant similarity to genes involved in the biosynthesis of lignin monomers and cellulose. All of the genes known to participate in the lignin biosynthetic pathway are represented in our cDNA library. Two of the predicted gene products, p-coumarate 3-hydroxylase (C3H) and CoA:shikimate/quinate hydroxycinnamoyltransferase (HCT), had not been described previously in any Eucalyptus species. In contrast, genes encoding trans-cinnamate 4-hydroxylase (C4H), ferulate 5-hydroxylase (F5H) and 4-coumarate:CoA ligase (4CL) had been described in other Eucalyptus species but not in E. globulus (Harakava, 2004). The remainder of the genes found had been previously described for E. globulus and published in GenBank, including phenylalanine ammonia lyase (PAL), cinnamoyl CoA reductase (CCR), cinnamyl alcohol dehydrogenase (CAD), caffeic acid O-methyltransferase (COMT) and caffeoyl-CoA 3-O-methyltransferase (CCoAOMT) (Figure 2) (Supplementary data 2).
The assembly of the C3H and HCT ESTs showed that two isoforms of their gene products are represented in our cDNA library. C3H and HCT participate in the conversion of p-coumaroyl-CoA into caffeoyl-CoA, which leads to the production of coniferyl (G) and sinapyl (S) lignin units. Down-regulation of C3H in transgenic alfalfa plants and Arabidopsis mutants resulted in a significant difference in lignin composition due to an alteration in the number and nature of the monolignol monomers (Franke et al. 2002; Ralph et al. 2006). The characterization of the Arabidopsis reduced epidermal fluorescence (ref8) mutant defective in C3H suggested that the genetic modification of this gene may not be appropriate for the reduction of lignin content in forest species because the mutant plants exhibited vascular collapse, developmental abnormalities and increased susceptibility to pathogen attack (Boerjan et al. 2003; Cooke et al. 2004).
Three unigenes exhibited similarity to known cellulose synthase genes. Analysis of their predicted domains by InterProScan revealed that all of them contained the cellulose synthase domain, which is composed of three aspartic acid residues and a QXXRW motif and plays a significant role in the catalytic activity of this enzyme (Krauskopf et al. 2005). However, the zinc finger domains (IPR001841 and IPR011011) present in cellulose synthase proteins were not found in our sequences since the sequences were not full-length. The deduced E. globulus proteins were compared with the ones previously described for E. grandis (Ranik and Myburg, 2006) as no sequences were available for E. globulus (Supplementary data 3).
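As a rough illustration of the QXXRW motif mentioned above, a deduced protein sequence can be scanned with a regular expression. This is only a sketch and does not replace the InterProScan domain analysis used in the study; the function name is hypothetical and the positions of the three aspartic acid residues are not checked.

```python
import re

# Q, any two residues, then R and W (the QXXRW motif).
QXXRW = re.compile(r"Q[A-Z]{2}RW")


def find_qxxrw(protein):
    """Return (0-based start, matched text) for each QXXRW occurrence."""
    return [(m.start(), m.group()) for m in QXXRW.finditer(protein.upper())]
```

Called on a made-up toy peptide such as "MDDLAQVLRWALG", the function reports one match, QVLRW, starting at position 5.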
Of the 56 transcription factor families described in Arabidopsis and 63 in rice (Guo et al. 2005; Gao et al. 2006), 11 were represented in our library: the auxin/indole-3-acetic acid (AUX/IAA) family, B3 family, basic/helix-loop-helix (bHLH) family, basic leucine zipper (bZIP) family, GRAS family, homeodomain-leucine zipper (HD-Zip) (HB) family, heat shock factor (HSF) family, MYB family, WRKY family, zinc finger homeobox (ZF-HD) family and ZIM family. Transcription factor families such as AUX/IAA, MYB and HD-containing families (zinc finger proteins and homeodomain-leucine zipper) regulate the expression of genes that participate in xylem development and secondary wall formation (lignin and cellulose biosynthesis) (Oh et al. 2003; Cánovas et al. 2004).
Many of the genes encoding the enzymes of general phenylpropanoid metabolism, such as PAL, C4H, 4CL, COMT and CAD, contain conserved motifs within their promoters that are recognized by plant MYB transcription factors (Tamagnone et al. 1998). Twelve members of the MYB family were found in our library. Some of them had a best BLASTX hit with the GOLDEN2-like 1 gene, the LHY-CCA1-like 5 gene and the DIVARICATA gene. The coverage of the sequences with their best BLASTX hit ranged from 25% to 100%. Two E. gunnii MYB transcription factor sequences were found in GenBank [GenBank: AJ576023-AJ576024] (Goicoechea et al. 2005). Based on BLASTN analysis, these sequences were different from the ones obtained in our library. Other families less represented in our library were the ZF family and bZIP family (with seven members each), the WRKY family (five members with coverage of their best BLASTX hit between 12% and 50%) and the AUX/IAA family (one member) (Supplementary data 4).
In addition, the data gathered through these analyses was compared with the few existing Eucalyptus cDNA libraries currently found in GenBank. The comparison was made against Eucalyptus gunnii (8,538 ESTs), Eucalyptus globulus subsp. bicostata (2,685 ESTs), Eucalyptus grandis (1,574 ESTs) and Eucalyptus globulus ‘blue gum’ (1,266 ESTs). BLASTN comparisons against our E. globulus database revealed a low level of similarity between our sequenced library and the available datasets. The number of sequences with at least one match with an E-value better than 1E-5 was 1,335 ESTs for E. gunnii (15%), 464 ESTs for E. globulus subsp. bicostata (17%), 267 ESTs for E. grandis (17%) and 261 ESTs for E. globulus ‘blue gum’ (17%).
In conclusion, a unigene set of approximately 4,900 unigenes was obtained from our E. globulus cDNA library. Analysis of its content has provided valuable data for the future metabolic engineering of plant cell walls by identifying new potential targets for modification aimed at biofuel production and industrial use. In addition, our results will be useful for comparative genomic studies among hardwoods and softwoods.
We thank Dr. Danilo González who provided the computer cluster to generate the unigene set and Dr. David Holmes for critical reading of the manuscript.
ALLONA, I.; QUINN, M.; SHOOP, E.; SWOPE, K.; ST. CYR, S.; CARLIS, J.; RIEDL, J.; RETZEL, E.; CAMPBELL, M.; SEDEROFF, R. and WHETTEN, R. Analysis of xylem formation in pine by cDNA sequencing. Proceedings of the National Academy of Sciences of the United States of America, August 1998, vol. 95, no. 16, p. 9693-9698.
ALTSCHUL, S.; MADDEN, T.; SCHAFFER, A.; ZHANG, J.; ZHANG, Z.; MILLER, W. and LIPMAN, D. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, September 1997, vol. 25, no. 17, p. 3389-3402.
ASHBURNER, M.; BALL, C.; BLAKE, J.; BOTSTEIN, D.; BUTLER, H.; CHERRY, J.; DAVIS, A.; DOLINSKI, K.; DWIGHT, S.; EPPIG, J.; HARRIS, M.; HILL, D.; ISSEL-TARVER, L.; KASARSKIS, A.; LEWIS, S.; MATESE, J.; RICHARDSON, J.; RINGWALD, M.; RUBIN, G. and SHERLOCK, G. Gene Ontology: tool for the unification of biology. Nature Genetics, May 2000, vol. 25, p. 25-29. [CrossRef]
BOERJAN, W.; RALPH, J. and BAUCHER, M. Lignin biosynthesis. Annual Review of Plant Biology, June 2003, vol. 54, p. 519-546. [CrossRef]
BRAÜTIGAM, M.; LINDLÖF, A.; ZAKHRABETKOVA, S.; GHARTI-CHHETRI, G.; OLSSON, B. and OLSSON, O. Generation and analysis of 9792 EST sequences from cold acclimated oat, Avena sativa. BMC Plant Biology, September 2005, vol. 5, p. 18. [CrossRef]
CÁNOVAS, F.; DUMAS-GAUDOT, E.; RECORBET, E.; JORRIN, J.; MOCK, H.-P. and ROSSIGNOL, M. Plant proteome analysis. Proteomics, 2004, vol. 4, p. 285-298. [CrossRef]
CHANG, S.; PURYEAR, J. and CAIRNEY, J. A simple and efficient method for isolating RNA from pine trees. Plant Molecular Biology Reporter, June 1993, vol. 11, p. 113-116. [CrossRef]
EWING, B.; HILLIER, L.; WENDL, M. and GREEN, P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Research, March 1998, vol. 8, no. 3, p. 175-185. [CrossRef]
FLEISCHMANN, W.; MOELLER, S.; GATEAU, A. and APWEILER, R. A novel method for automatic functional annotation of proteins. Bioinformatics, March 1999, vol. 15, no. 3, p. 228-233. [CrossRef]
FRANKE, R.; HUMPHREYS, J.M.; HEMM, M.R.; DENAULT, J.W.; RUEGGER, M.O.; CUSUMANO, J.C. and CHAPPLE, C. Changes in secondary metabolism and deposition of an unusual lignin in the ref8 mutant of Arabidopsis. The Plant Journal, April 2002, vol. 30, no. 1, p. 33-45. [CrossRef]
FUKUSHIMA, K. Regulation of syringyl to guaiacyl ratio in lignin biosynthesis. Journal of Plant Research, February 2001, vol. 114, no. 4, p. 499-508. [CrossRef]
GAO, G.; ZHONG, Y.; GUO, A.; ZHU, Q.; TANG, W.; ZHENG, W.; GU, X.; WEI, L. and LUO, J. DRTF: a database of rice transcription factors. Bioinformatics, March 2006, vol. 22, no. 10, p. 1286-1287. [CrossRef]
GOICOECHEA, M.; LACOMBE, E.; LEGAY, S.; MIHALJEVIC, S.; RECH, P.; JAUNEAU, A.; LAPIERRE, C.; POLLET, B.; VERHAEGEN, D.; CHAUBET-GIGOT, N. and GRIMA-PETTENATI, J. EgMYB2, a new transcriptional activator from Eucalyptus xylem, regulates secondary cell wall formation and lignin biosynthesis. The Plant Journal, August 2005, vol. 43, no. 4, p. 553-567. [CrossRef]
GUO, A.; HE, K.; LIU, D.; BAI, S.; GU, X.; WEI, L. and LUO, J. DATF: a database of Arabidopsis transcription factors. Bioinformatics, February 2005, vol. 21, no. 10, p. 2568-2569. [CrossRef]
HARAKAVA, R. Genes encoding enzymes of the lignin biosynthesis pathways in Eucalyptus. Genetics and Molecular Biology, 2004, vol. 28, no. 3, p. 601-607. [CrossRef]
HAUSMAN, J.F.; EVERS, D.; THIELLEMENT, H. and JOUVE, L. Compared responses of poplar cuttings and in vitro raised shoots to short-term chilling treatments. Plant Cell Reports, October 2000, vol. 19, no. 10, p. 954-960. [CrossRef]
HU, W.J.; HARDING, S.A.; LUNG, J.; POPKO, J.L.; RALPH, J.; STOKKE, D.D.; TSAI, C.J. and CHIANG, V.L. Repression of lignin biosynthesis promotes cellulose accumulation and growth in transgenic trees. Nature Biotechnology, August 1999, vol. 17, p. 808-812. [CrossRef]
ISRAELSSON, M.; ERIKSSON, M.; HERTZBERG, M.; ASPEBORG, H.; NILSSON, H. and MORITZ, T. Changes in gene expression in the wood-forming tissue of transgenic hybrid aspen with increased secondary growth. Plant Molecular Biology, July 2003, vol. 52, no. 4, p. 893-903. [CrossRef]
JONES, R.C.; STEANE, D.A.; POTTS, B.M. and VAILLANCOURT, R.E. Microsatellite and morphological analysis of Eucalyptus globulus populations. Canadian Journal of Forest Research, January 2002, vol. 32, no. 1, p. 59-66. [CrossRef]
JUNG, H.-J. and NI, W. Lignification of plant cell walls. Impact of genetic manipulation. Proceedings of the National Academy of Sciences of the United States of America, October 1998, vol. 95, no. 22, p. 12742-12743. [CrossRef]
KRAUSKOPF, E.; HARRIS, P. and PUTTERILL, J. The cellulose synthase gene PrCESA10 is involved in cellulose biosynthesis in developing tracheids of the gymnosperm Pinus radiata. Gene, May 2005, vol. 350, no. 2, p. 107-116. [CrossRef]
LI, L.; ZHOU, Y.H.; CHENG, X.F.; SUN, J.Y.; MARITA, J.M.; RALPH, J. and CHIANG, V.L. Combinatorial modification of multiple lignin traits in trees through multigene cotransformation. Proceedings of the National Academy of Sciences of the United States of America, April 2003, vol. 100, no. 8, p. 4939-4944. [CrossRef]
MIRANDA, I. and PEREIRA, H. Variation of pulpwood quality with provenances and site in Eucalyptus globulus. Annals of Forest Science, 2002, vol. 59, p. 283-291. [CrossRef]
NANJO, T.; FUTAMURA, N.; NISHIGUCHI, M.; IGASAKI, T.; SHINOZAKI, K. and SHINOHARA, K. Characterization of full-length enriched expressed sequence tags of stress-treated poplar leaves. Plant and Cell Physiology, 2004, vol. 45, no. 12, p. 1738-1748. [CrossRef]
OH, S.; PARK, S. and HAN, K.-H. Transcriptional regulation of secondary growth in Arabidopsis thaliana. Journal of Experimental Botany, December 2003, vol. 54, no. 393, p. 2709-2722. [CrossRef]
PLOMION, C.; LEPROVOST, G. and STOKES, A. Wood formation in trees. Plant Physiology, December 2001, vol. 127, p. 1513-1523. [CrossRef]
QUACKENBUSH, J.; LIANG, F.; HOLT, I.; PERTEA, G. and UPTON, J. The TIGR Gene Indices: reconstruction and representation of expressed gene sequences. Nucleic Acids Research, 2000, vol. 28, no. 1, p. 141-145.
RALPH, J.; AKIYAMA, T.; KIM, H.; LU, F.; SCHATZ, P.F.; MARITA, J.M.; RALPH, S.A.; SRINIVASA-REDDY, M.S.; CHEN, F. and DIXON, R.A. Effects of coumarate 3-hydroxylase down-regulation on lignin structure. Journal of Biological Chemistry, March 2006, vol. 281, no. 13, p. 8843-8853. [CrossRef]
RASTOGI, S. and DWIVEDI, U.N. Down-regulation of lignin biosynthesis in transgenic Leucaena leucocephala harboring O-methyltransferase gene. Biotechnology Progress, 2006, vol. 22, p. 609-616. [CrossRef]
STERKY, F.; REGAN, S.; KARLSSON, J.; HERTZBERG, M.; ROHDE, A.; HOLMBERG, A.; AMINI, B.; BHALERAO, R.; LARSSON, M.; VILLAROEL, R.; VAN MONTAGU, M.; SANDBERG, G.; OLSSON, O.; TEERI, T.; BOERJAN, W.; GUSTAFSSON, P.; UHLÉN, M.; SUNDBERG, B. and LUNDEBERG, J. Gene discovery in the wood forming tissues of poplar: analysis of 5,692 expressed sequence tags. Proceedings of the National Academy of Sciences of the United States of America, October 1998, vol. 95, no. 22, p. 13330-13335.
STERKY, F.; BHALERAO, R.; UNNEBERG, P.; SEGERMAN, B.; NILSSON, P.; BRUNNER, A.; CHARBONNEL-CAMPAA, L.; LINDVALL, J.; TANDRE, K.; STRAUSS, S.; SUNDBERG, B.; GUSTAFSSON, P.; UHLÉN, M.; BHALERAO, R.; NILSSON, O.; SANDBERG, G.; KARLSSON, J.; LUNDEBERG, J. and JANSSON, S. A Populus EST resource for plant functional genomics. Proceedings of the National Academy of Sciences of the United States of America, September 2004, vol. 101, no. 38, p. 13951-13956. [CrossRef]
TAMAGNONE, L.; MERIDA, A.; PARR, A.; MACKAY, S.; CULIANEZ-MACIA, F.; ROBERTS, K. and MARTIN, C. The AmMYB308 and AmMYB330 transcription factors from Antirrhinum regulate phenylpropanoid and lignin biosynthesis in transgenic tobacco. Plant Cell, February 1998, vol. 10, no. 2, p. 135-154.
U.S. DEPARTMENT OF ENERGY. OIT Agriculture Plants/crop-based renewable resources 2020. [cited date July 2007], 2007. Available from Internet: http://www.energy.gov.
Note: Electronic Journal of Biotechnology is not responsible if on-line references cited on manuscripts are not available any more after the date of publication.