Special Reviews

Gene Discovery and Product Development for Grain Quality Traits

Science  16 Jul 1999:
Vol. 285, Issue 5426, pp. 372-375
DOI: 10.1126/science.285.5426.372


The composition of oils, proteins, and carbohydrates in seeds of corn, soybean, and other crops has been modified to produce grains with enhanced value. Both plant breeding and molecular technologies have been used to produce plants carrying the desired traits. Genomics-based strategies for gene discovery, coupled with high-throughput transformation processes and miniaturized, automated analytical and functionality assays, have accelerated the identification of product candidates. Molecular marker–based breeding strategies have been used to accelerate the process of moving trait genes into high-yielding germplasm for commercialization. These products are being tested for applications in food, feed, and industrial markets.

Altered seed traits that result in grain with modified oil, protein, and carbohydrate content and composition can bring enhanced value to farmers, producers, and consumers. Such traits, referred to as seed quality or output traits, alter the nutritional or functional properties of the harvested plant for use in foods, animal feeds, or industrial products. Developing plants with improved grain quality traits involves overcoming a variety of technical challenges inherent in metabolic engineering programs. Continuing improvements in molecular and genomic technologies are contributing to the acceleration of product development.

Two complementary pathways have been used to identify and develop grain quality traits for corn and soybeans (1). The first is germplasm screening coupled with high-throughput analytical tests for trait identification, followed by the use of DNA-based molecular markers to accelerate the breeding process. An example of the use of these technologies comes from a program to identify soybeans with reduced concentrations of the antinutritional oligosaccharides, stachyose, raffinose, and galactose (2). Soybean lines were identified through screens of germplasm collections and mutagenized seed populations. These lines were combined to yield varieties that had still lower concentrations of these oligosaccharides; recombinant lines could be selected that also had higher concentrations of sucrose, a benefit in food applications. DNA markers were then used to direct a breeding process that moved the desired traits into commercial high-yielding germplasm. The use of the DNA markers accelerated the breeding program by several generations. The resultant soybeans are now being tested in feed, food, and beverage applications.

Similarly, high-oil varieties of corn were developed at the University of Illinois through successive cycles of recurrent selection (3). These lines have improved energy content for animal feeding applications, but poor agronomic characteristics, including disease susceptibility and poor standability, precluded their commercial introduction on broad acreage. Quantitative trait analysis with molecular markers indicated that high oil content in the seed was controlled by more than 12 genes, making yield improvements through conventional breeding difficult (4). To circumvent this problem, the TopCross grain production system was developed and introduced (5). TopCross provides a means of capturing the genetic contribution from an inferior genotype through an elite hybrid corn variety. In this process, a male fertile high-oil corn variety is interplanted at low density with a male sterile elite hybrid variety of corn. The low planting density of the pollinator means that its contribution to grain yields is minimized. Pollen shed by the agronomically challenged male fertile line will fertilize the field of elite male sterile hybrid corn. In this case, the grain produced by the hybrid corn will have an oil content intermediate between those of the two parents, with up to a 100% increase in energy content over that delivered by the elite variety alone. This system has been used successfully on hundreds of thousands of acres, under a variety of environmental conditions over a number of growing seasons, and has been vital to the broad introduction and commercial acceptance of the high-oil trait.

The other pathway taken has been the use of molecular biology to isolate, characterize, and modify individual genes, followed by plant transformation and trait analysis. The development of high-lysine corn and soybean for use in improved animal feeds illustrates the challenges that continually interlace metabolic engineering projects. Lysine is an essential amino acid that is limiting in corn- and soy-based animal feeds. Two of the key enzymes in the lysine biosynthetic pathway are aspartokinase (AK) and dihydrodipicolinic acid synthase (DHDPS), which are both feedback inhibited by lysine (6). Falco et al. (7) isolated bacterial genes encoding lysine-insensitive forms of AK and DHDPS from Escherichia coli and Corynebacterium, respectively. A deregulated form of the plant DHDPS was created by site-specific mutagenesis (8, 9). Despite assumed knowledge of the lysine biosynthetic pathway, expression of the deregulated enzymes in different tissues and species produced inconsistent results (7). Expression of these genes in tobacco leaves produced high concentrations of free lysine, as reported by others (10), but no accumulation was observed in tobacco seed with the use of either constitutive or seed-specific promoters. It was discovered that the failure to augment lysine concentrations in the seed was due to the presence of an active catabolic pathway for lysine in tobacco seed. In soybean and canola seeds, lysine accumulated sufficiently to more than double the total seed lysine content. However, the lysine catabolic products saccharopine or α-aminoadipic acid also accumulated in soybean and canola seeds, respectively, although in strikingly different quantities in the two species.
In corn (9), expression with an endosperm-specific promoter did not lead to lysine accumulation, whereas expression with an embryo-specific promoter gave high concentrations of lysine, sufficient to raise overall lysine concentrations in seed 50 to 100% with only minor accumulation of catabolic products (see Table 1). Together these results offer several lessons. Microbial genetics provides a useful tool for isolating relevant genes, and microbes provide a rich resource for accessing diverse genes where enzymes with particular characteristics are needed. Bacterial or plant enzymes can be successfully expressed in specific plant tissues to alter regulation of a metabolic pathway. However, different tissues and different species may react to metabolic perturbations uniquely, and amounts of other metabolic products may unexpectedly change.

Table 1

Summary of results obtained from expressing a gene encoding a nonfeedback inhibited form of dapA in different plant species and tissues. The table shows whether a lysine increase was observed and the presence or absence of two major catabolites. The results demonstrate that the lysine catabolic pathway is regulated differentially in the different species and organs tested.

The development of soybeans with increased concentrations of oleic acid also provides an instructive example of seed trait modification. [See (11) for a review of fatty acid biosynthesis.] Unsaturated fatty acids are healthier than saturated fatty acids, and the monounsaturated form, oleic acid (18:1), is also more stable in frying and cooking applications than are the polyunsaturated forms, linoleic (18:2) and linolenic (18:3). Chemical hydrogenation is currently used to improve oxidative stability by increasing the concentrations of 18:1 fatty acid, but hydrogenation also raises the concentration of trans fatty acids, which have been linked to higher health risks. Biochemical methods were used to purify and sequence the relevant soluble proteins from soybean, from which probes were created to isolate the genes that affect fatty acid composition. The membrane-bound enzymes in the pathway proved recalcitrant to biochemical purification strategies and were eventually cloned by T-DNA insertional mutagenesis in Arabidopsis thaliana. The cloning of the genes encoding each of the soybean FAD2 (12) desaturases enabled the cosuppression [silencing of an endogenous gene through the introduction of a homologous transgene; see (13)] of the seed-specific Fad2 desaturase, causing seed oleic acid concentrations to increase from 25% in wild-type lines to 85% in the transgenic lines. In contrast, a previously isolated line of canola, which carried a mutation in a non–seed-specific form of Fad2, was shown to have increased concentrations of oleic acid in roots as well as seeds (14). Subsequent testing revealed that this line had an unfavorable, cold-sensitive phenotype. The cosuppressed, seed-specific, soybean high oleic acid transgenic lines were shown to have excellent agronomic properties, demonstrating the importance of precise gene regulation. 
In related experiments, transgenic soybean lines with either increases or decreases in palmitic acid (15) were created by overexpressing or cosuppressing the soybean thioesterase gene in a seed-specific fashion. This work demonstrates that precise modifications of metabolic steps, when done correctly and when limited to storage tissues to avoid pleiotropic effects, can result in the desired alterations to seed composition.

The isolation of this set of fatty acid–modifying genes required a profound investment in time. In recognition of the limitations that conventional gene cloning imposed, an expressed sequence tag (EST) program was initiated to remove this barrier in the product development process. ESTs are developed by single-pass sequencing of the 5′ end of complementary DNA clones made from selected biological samples. Since its inception, the DuPont EST program has generated a catalog of more than 1 million lanes of plant, microbe, and insect DNA sequences, thereby characterizing gene expression in a diverse set of biological states. This gene discovery process is extremely efficient, and the database now includes ESTs representing genes that code for most metabolic enzymes in the major crop species.

The program has been further extended through the adoption of an EST-based genomics paradigm in which genes are accessed and identified from a variety of plant and nonplant sources that have novel phenotypes. For example, vernolic acid, an epoxide fatty acid, and ricinoleic acid, a hydroxy fatty acid, are oleic acid derivatives used as hardeners in paints and plastics (Fig. 1). Neither vernolic acid nor ricinoleic acid is made in soybean seeds. Complementary DNA libraries were made from Vernonia and castor bean seeds, plants known to produce the desired fatty acids, and were subjected to limited EST sampling. The genes that encoded the fatty acid–modifying enzymes were identified by DNA sequence homology to desaturase genes. After modification of the regulatory sequences, the genes were transferred to soybean, where the desired industrial oils were produced in seeds (16).

Figure 1

Oleic acid can serve as the starting point for further fatty acid modifications with enzymes encoded by genes isolated from plant or nonplant sources. Some related fatty acids and their possible uses are shown.

Presently, about half of the ESTs can be identified through DNA or protein sequence homology comparisons. Functional genomics programs have been established to aid in the identification of the remaining genes; in a number of cases, genes have been identified that have entered trait development projects. These programs include gene expression analyses with microarray and bead-based complementary DNA library comparisons and sequencing procedures, which allow the expression of a gene to be followed by tissue or organ type, by temporal or developmental stage, or under biotic, abiotic, or chemical stress conditions (17). Gene-tagging approaches (18), in which genetic elements interrupt or ectopically activate genes, are being used for forward and reverse genetic strategies, to identify genes associated with unusual phenotypes and conversely to identify phenotypes associated with specific genes. New protein analysis tools, such as the isolation of differentially expressed proteins by two-dimensional gel electrophoresis followed by identification with mass spectrometry (19), are also being used for gene identification. Finally, genetically anchored physical maps of strategically important crops are being developed for use as scaffolds to localize ESTs to particular segments of a genome to facilitate map-based gene cloning and identification.

These transgenic trait development examples carry with them throughput constrictions associated with the plant transformation process. Transformation of many crop species, and particularly of elite lines of important crops, is difficult. High-throughput soybean transformation, whether achieved by biolistic or Agrobacterium-based methods (20), remains particularly challenging. Frequently coupled with the inefficiency of transformation is the inefficiency of achieving successful cosuppression events. Cosuppression typically occurs in only a limited number of transformants, for example, in soybean in only 10 to 20% of the transgenic events (21). Transgenes designed for overexpression can have their expression levels affected by the site at which they are inserted in the genome. Thus, in both cases, many transgenic lines must be identified initially so that lines that couple the desired phenotypic change in seed quality and acceptable agronomic performance can be isolated in subsequent field tests. Improvements to transformation frequencies for elite corn and soybean lines have been continual but incremental, so that it has been necessary to build high-throughput production laboratories to generate the numbers of transformation events required to meet product development needs.
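The arithmetic behind the need for many transformation events can be sketched roughly as follows. This is an illustration only: it assumes each transgenic event cosuppresses independently with the same probability, which the article does not state, and the function name and the 95% confidence target are hypothetical choices. Only the 10 to 20% cosuppression frequency comes from the text.

```python
import math


def events_needed(p_success: float, confidence: float = 0.95) -> int:
    """Smallest number of independent events n such that the chance of
    obtaining at least one success, 1 - (1 - p)^n, reaches `confidence`.

    Assumption (not from the article): events are independent and share
    the same per-event success probability.
    """
    if not 0.0 < p_success < 1.0:
        raise ValueError("p_success must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Solve (1 - p)^n <= 1 - confidence for n and round up.
    n = math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_success))
    return max(n, 1)


if __name__ == "__main__":
    # Cosuppression frequencies quoted in the text: 10% and 20% of events.
    for p in (0.10, 0.20):
        print(f"p = {p:.2f}: {events_needed(p)} events")
```

Under these assumptions, a single cosuppressed line already requires on the order of tens of events, before any further selection for agronomic performance in field tests.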

The necessity of generating large numbers of transformation events can be partially reduced by improving the control of transgene expression. EST and microarray technologies are being used to identify promoters that meet specific quantitative, temporal, tissue-, or cell-specific gene expression requirements for a particular trait. Synthetic promoters composed of multiple cis-acting elements, combined with novel transcription factors, are being designed to increase gene expression levels. SARs and MARs, scaffold and matrix attachment regions, are being used to stabilize and enhance gene expression (22) and to reduce effects ascribed to the insertion site of a transgene (23). Specific transcription factors that regulate multiple genes in a particular metabolic pathway are being tested for pathway control and have the advantage of requiring the introduction of only a single transgene. The cre-lox site-specific recombination system has been used as a switch to control expression of a gene by excision of an intervening genetic tag (24). Finally, chemically regulated promoter systems have been used to regulate gene expression (25).

Another challenge for a robust product development process is that of functionality assessment. The earlier the properties of a transgene can be measured, the better. The soybean somatic embryo model system is an example of a preferred early screen (Fig. 2). Biochemical assays can be performed on transgenic somatic embryos generated during the soybean regeneration process, before plant development and seed maturation (26). Such assays have been used to identify gene constructs that effect desired changes in oil and protein content and composition. Early tests for biochemical parameters and functional properties on whole plants are also necessary and require high-throughput screens. These assays serve as small-scale surrogate tests for more difficult biological measurements, such as oxidative stability for food applications or animal bioavailability for feed applications.

Figure 2

The use of soybean somatic embryos as a more rapid tool for transgene testing. Several gene constructs can be tested with such embryos, and only those that produce a desirable phenotype are then used in the longer experiments necessary to produce whole plants and seeds. Soybean lines have been selected that produce large numbers of somatic embryos for use in the model system. Although the resulting somatic embryos cannot be regenerated to whole plants, they are ideal for testing multiple constructs. Lines used for regeneration of plants can also be tested at the somatic embryo stage of regeneration but produce smaller numbers of embryos and so are more suitable for production experiments with previously tested constructs.

For many traits, functional attributes can only be assessed after the grain has been broken down into its component parts. Small-scale grain-processing facilities suitable for gram quantities of grain are necessary to isolate grain fractions similar to those produced in commercial-scale milling and processing plants. These small-scale facilities avoid the need for a time-consuming seed multiplication step that is necessary for large-scale assays. Tiered functional assays, in which high-throughput biochemical measurements are used to identify lines that subsequently enter low-throughput functional, nutritional, and sensory evaluations, also increase the efficiency of functionality assessments. An example of the utility of tiered compositional and functional assays comes from transgenic soybean lines with altered seed storage proteins. In soybean flour, two classes of seed storage proteins, 7S and 11S globulins, predominate, each with useful functional characteristics. Transgenic lines in which one or the other of these classes has been eliminated through cosuppression technology (27) were subjected to single seed–based protein polyacrylamide gel electrophoresis. Lines with altered 7S/11S ratios were then tested at increasing scale for emulsification, water and fat binding properties, gel-forming ability, and other parameters relevant to processing, cooking, and food manufacture. From these assessments, specific lines have been selected for seed increases for testing in milk and meat replacement products.

The availability of multiple functionality tests that measure different end-use attributes increases the likelihood that value-added traits will be identified and used in multiple products. For example, when high oleic acid soybean oil is used as a food, the oil has heat and shelf life stability attributes that can be demonstrated in active oxygen-oxidative stability or frying tests (28). When these same soybeans are processed into flour, the flour has been shown to have improved sensory and shelf life properties due to the increased stability of the residual oil. Further, this same high oleic acid oil has been shown to have value as an industrial lubricant through metal wear tests with a small-scale hydraulic pump (29). Thus, a single soybean line was shown to have three distinct product applications because functionality tests for each application were available.

Soy or corn grain with a novel composition must be kept apart, or “identity preserved,” from commodity grain throughout the production and distribution chains to retain its added value. Combining, or stacking, traits can add sufficient value to the final product to compensate for the cost of identity preservation, but the stacking process presents business and breeding challenges. Trait combinations must be chosen that address similar markets or, for ingredients, present similar acreage demands. Transgenes must be bred into the highest yielding germplasm, so that grain production does not incur yield penalties; the continuing improvement in overall production yield of elite lines makes this a challenging breeding target. The use of DNA markers has increased the efficiency of the breeding process. DNA-based molecular marker assays, including those for RFLP, RAPD, SSR, and SNP polymorphisms, have provided a method to accelerate germplasm analysis and trait mapping and a method to move traits rapidly into germplasm for different maturity or production zones (30). The use of DNA markers early in the trait introgression cycle has eliminated 1 to 2 years from breeding timelines. To accomplish this, laboratory facilities are needed where several million DNA-based diagnostic assays can be performed annually.
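One way to see why markers can shorten introgression is the textbook expectation for backcross breeding, sketched below. This is not taken from the article: it assumes unlinked loci and no selection, in which case the average fraction of recurrent (elite) parent genome after t backcrosses is 1 - (1/2)^(t+1). The function names and the 99% target are hypothetical. Marker data let breeders pick individuals above this average in each generation, which is how generations can be saved.

```python
def expected_recurrent_fraction(backcrosses: int) -> float:
    """Expected average fraction of recurrent-parent genome after the given
    number of backcrosses, with no selection (0 backcrosses = the F1).

    Assumption (not from the article): unlinked loci, no marker selection.
    """
    if backcrosses < 0:
        raise ValueError("backcrosses must be non-negative")
    return 1.0 - 0.5 ** (backcrosses + 1)


def backcrosses_to_reach(target: float) -> int:
    """Smallest number of backcrosses whose expected recurrent-parent
    fraction meets or exceeds `target`."""
    if not 0.5 <= target < 1.0:
        raise ValueError("target must be in the range [0.5, 1.0)")
    t = 0
    while expected_recurrent_fraction(t) < target:
        t += 1
    return t


if __name__ == "__main__":
    for t in range(1, 7):
        print(f"BC{t}: {expected_recurrent_fraction(t):.4f}")
    print("Backcrosses to reach 99%:", backcrosses_to_reach(0.99))
```

Without selection, about six backcross generations are expected before the average line carries 99% of the elite genome; selecting marker-identified individuals that exceed the average would reach a comparable level in fewer generations.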

Another breeding challenge, specific to hybrid corn production, comes from the need to breed recessive or semidominant traits into both inbred parent lines for each production area. Dominant traits are preferred, in that they can be bred into only a single parent or into TopCross pollinators as described above. The TopCross system provides several advantages for transgene introductions in corn. Transgenic pollinator lines do not have to undergo multiple breeding cycles for yield parity, because the pollinator will be planted at a low seeding density and will have only a minimal effect on grain yields. Further, single pollinators can often be used in multiple F1 hybrid combinations, thereby reducing the breeding requirements for transgene introductions. The TopCross system also provides a way to produce traits that have a negative effect on seed agronomic performance, such as on germination, but are desirable in the grain, as the trait can be introduced solely through the pollinator.

To date, a number of output traits have been developed at DuPont that are in various stages of functional testing or commercialization. These include corn and soybean lines for animal feed applications with increased concentrations of free lysine (7) or bound methionine (31). Soybean seeds with altered ratios of seed storage proteins (26) or reduced concentrations of trypsin inhibitors (32) have been produced, primarily for food applications. As described earlier, soybean lines with novel fatty acid compositions have been developed. High monounsaturate oils are useful in foods or food-processing applications where increased oxidative and thermal stability is desirable, and high saturate oils are useful as feedstocks for margarine manufacturing. All of these oils also have nutritional or health benefits. The soybean lines with reduced oligosaccharide concentrations described above and similar corn lines contain reduced concentrations of phytic acid and increased concentrations of free phosphorus. This combination provides nutritional value in animal feed and environmental value because of reduced bound phosphorus released in animal waste. Finally, a number of starches have been created for use in food and industrial applications, including polymers with altered branching patterns, altered branch lengths, and novel monomer compositions (33).

Plants are capable of a stunning array of biochemical conversions. As an understanding of the regulation of these pathways is combined with the ability to engineer multiple genes simultaneously, the possibilities for producing novel products in plants will increase dramatically. Beyond the traditional agricultural products, opportunities exist for producing industrial feedstocks and polymers in crops and for producing pharmaceutical and nutraceutical products. Traits that have been initially produced in corn or soybeans may enter new markets through production in cereals such as wheat and rice or in plantation crops such as forest trees. The continuing integration of structural and functional genomics with trait development technologies will accelerate these advances.

