Special Reviews

Arabidopsis thaliana: A Model Plant for Genome Analysis


Science 23 Oct 1998:
Vol. 282, Issue 5389, pp. 662-682
DOI: 10.1126/science.282.5389.662


Arabidopsis thaliana is a small plant in the mustard family that has become the model system of choice for research in plant biology. Significant advances in understanding plant growth and development have been made by focusing on the molecular genetics of this simple angiosperm. The 120-megabase genome of Arabidopsis is organized into five chromosomes and contains an estimated 20,000 genes. More than 30 megabases of annotated genomic sequence have already been deposited in GenBank by a consortium of laboratories in Europe, Japan, and the United States. The entire genome is scheduled to be sequenced by the end of the year 2000. Reaching this milestone should enhance the value of Arabidopsis as a model for plant biology and for the analysis of complex organisms in general.

Arabidopsis thaliana has recently become the organism of choice for a wide range of studies in plant sciences (1). The current visibility of Arabidopsis research reflects the growing realization among biologists that this simple angiosperm can serve as a convenient model not only for plant biology but also for addressing fundamental questions of biological structure and function common to all eukaryotes. While genome projects have documented the extent to which all eukaryotic organisms share a common genetic ancestry, research with Arabidopsis has clarified the important role that analysis of plant genomes can play in understanding basic principles of biology relevant to a variety of species, including humans. The emergence of a large, multinational research community devoted to the complete analysis of a single plant represents a dramatic paradigm shift for plant biology. Traditionally, advances in our understanding of plant structure and function were built on research with a wide range of species, particularly those relevant to agriculture. Although this approach yielded an impressive amount of information, progress in many disciplines was limited by scattered community resources, duplication of effort, and limited funding. Several plants were recognized as model genetic systems, including maize, tomato, pea, rice, barley, petunia, and snapdragon, but research biologists failed to reach a consensus on which species was most suitable for studying processes common to all plants. As a result, our understanding of fundamental aspects of plant growth and development, such as flowering, root growth, hormone action, and responses to environmental signals, remained limited.

Twenty years ago, plant biologists began to search for another model organism suitable for detailed analysis using the combined tools of genetics and molecular biology. Plants with effective protocols for regeneration in culture (such as petunia and tomato) were logical candidates, particularly for studies involving Agrobacterium-mediated cell transformation, but attention gradually shifted toward Arabidopsis, a small weed in the mustard family that was first chosen as a model genetic organism by Laibach in Europe and later studied in detail by Rédei in the United States (2). The shift toward Arabidopsis gained momentum in the early 1980s with the release of a detailed genetic map (3) and publications outlining the value of Arabidopsis for research in plant physiology, biochemistry, and development (4). Two significant advances followed: the establishment of transformation protocols (5) and the demonstration that Arabidopsis has a small genome amenable to detailed molecular analysis (6).

The modern era of Arabidopsis research began in 1987 with the opening of the Third International Arabidopsis Conference at Michigan State University and the subsequent formation of an electronic Arabidopsis newsgroup. Many researchers experienced in the analysis of other model organisms soon began to study Arabidopsis as a promising model for basic research. One important outgrowth of this enthusiasm was the drafting, in 1990, of a vision statement outlining long-term research goals for the Arabidopsis community. These goals included saturating the genome with mutations, identifying every essential gene, and sequencing the entire genome by the end of the decade. The statement also stressed the importance of applying advances with Arabidopsis to other plants and to solving practical problems in agriculture, industry, and human health. A further commitment to Arabidopsis research was made in 1996 with the establishment of the Arabidopsis Genome Initiative, dedicated to coordinating large-scale sequencing efforts. This initiative has become a model for multinational cooperation and has already resulted in the deposition of more than 30 Mb of genomic DNA sequence in public databases. The remainder of the 120-Mb genome is scheduled to be sequenced by the end of 2000. Arabidopsis has therefore progressed in 20 years from an obscure weed to a respected member of the “Security Council of Model Genetic Organisms” (7). Here we review some recent advances in Arabidopsis research and summarize the features that have made this simple angiosperm a model for research in plant biology.

Biology of Arabidopsis

Arabidopsis thaliana (Fig. 1) is a member of the mustard family (Cruciferae or Brassicaceae) with a broad natural distribution throughout Europe, Asia, and North America [see (1) for detailed reviews]. Many different ecotypes (accessions) have been collected from natural populations and are available for experimental analysis. The Columbia and Landsberg ecotypes are the accepted standards for genetic and molecular studies. The entire life cycle, including seed germination, formation of a rosette plant, bolting of the main stem, flowering, and maturation of the first seeds, is completed in 6 weeks. When it comes to size, almost everything about Arabidopsis is small. Flowers are 2 mm long, self-pollinate as the bud opens, and can be crossed by applying pollen to the stigma surface. Seeds are 0.5 mm long at maturity and are produced in slender fruits known as siliques. Seedlings develop into rosette plants that range from 2 to 10 cm in diameter, depending on growth conditions. Leaves are covered with small unicellular hairs known as trichomes, which are convenient models for studying morphogenesis and cellular differentiation.

Figure 1

Arabidopsis thaliana at an early stage of flowering. [Drawing by K. Sutliff]

Plants can be grown in petri plates or maintained in pots located either in a greenhouse or under fluorescent lights in the laboratory. Bolting starts about 3 weeks after planting, and the resulting inflorescence forms a linear progression of flowers and siliques for several weeks before the onset of senescence. Flowers are composed of an outer whorl of four green sepals and inner whorls containing four white petals, six stamens bearing pollen, and a central gynoecium that forms the silique. Mature plants reach 15 to 20 cm in height and often produce several hundred siliques with more than 5000 total seeds. The roots are simple in structure, easy to study in culture, and do not establish symbiotic relationships with nitrogen-fixing bacteria. Natural pathogens include a variety of insects, bacteria, fungi, and viruses.

Genetic Analysis

The Arabidopsis research community has developed most of the methods and resource materials expected of a model genetic organism. These include simple procedures for chemical and insertional mutagenesis, efficient methods for performing crosses and introducing DNA through plant transformation, extensive collections of mutants with diverse phenotypes, and a variety of chromosome maps of mutant genes and molecular markers (8). The absence of an efficient system for gene replacement through homologous recombination is a limitation shared with other model organisms such as Drosophila and Caenorhabditis elegans, although promising advances in this important area of Arabidopsis research have been reported (9). Mature seeds are the preferred targets for chemical mutagenesis because selfing the M1 plants grown from a single batch of mutagenized seed yields millions of M2 progeny seeds, among which recessive mutations appear in homozygous form. Insertional mutagenesis with transferred DNA (T-DNA) from Agrobacterium tumefaciens has become routine through the development of whole-plant transformation methods (10) that avoid the pitfalls associated with plant regeneration in culture. Thousands of transgenic lines carrying random T-DNA insertions throughout the genome have been deposited in public stock centers, and many additional lines are being produced at private companies interested in functional genomics. Maize transposable elements introduced through Agrobacterium-mediated transformation have also been used extensively for gene disruption (11).
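
The segregation that makes seed mutagenesis efficient can be illustrated with a short Mendelian calculation. The sketch below is a minimal illustration, not a model of any published screen: it assumes a single locus, an M1 plant that is uniformly heterozygous (Aa) for a new recessive mutation (real M1 plants grown from mutagenized seed are usually chimeric, which lowers the fraction), and equal transmission of both alleles through pollen and egg. The function name `self_cross` and the allele notation are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def self_cross(genotype: str) -> dict[str, Fraction]:
    """Return genotype frequencies among the progeny of a selfed plant.

    `genotype` is a two-letter string such as "Aa" (hypothetical notation:
    uppercase = wild-type allele, lowercase = recessive mutant allele).
    Assumes each gamete carries either allele with equal probability.
    """
    # Pair every possible pollen allele with every possible egg allele.
    zygotes = Counter(
        "".join(sorted(pair))  # sorted() writes "aA" and "Aa" both as "Aa"
        for pair in product(genotype, repeat=2)
    )
    total = sum(zygotes.values())
    return {g: Fraction(n, total) for g, n in sorted(zygotes.items())}


# An M1 plant heterozygous for a new recessive mutation, selfed to give M2 seed.
m2 = self_cross("Aa")
print(m2["aa"])  # fraction of M2 seed homozygous for the mutation → 1/4
```

Under these assumptions about one M2 seed in four from such a plant is homozygous for the mutation, so a large M2 population from one experiment contains many individuals that can reveal a recessive phenotype.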

Several thousand mutants of Arabidopsis defective in almost every aspect of plant growth and development have been identified over the past 20 years. The ability to save genetic stocks as seeds has minimized the effort required to maintain these mutants over long periods of time. Mutations that interfere with gametogenesis, seed formation, leaf and root development, flowering, senescence, metabolic and signal transduction pathways, responses to hormones, pathogens, and environmental signals, and many cellular and physiological processes have been identified (1). Because mapping and allelism tests have often lagged behind mutant identification, a number of mutants currently being studied in different laboratories are likely to be defective in the same gene. Progress has nevertheless been made toward establishing community standards for gene nomenclature and mutant analysis to minimize duplication of effort (12).

The Arabidopsis genome is organized into five chromosomes and contains an estimated 20,000 genes. The small size of meiotic chromosomes and the absence of polytene chromosomes have limited cytogenetic studies of chromosome structure, although visualization has improved in recent years with in situ hybridization methods (13). Three related maps of each chromosome (classical genetic, recombinant inbred, and physical) are presented on the wall chart included with this genome issue. The classical map shows estimated locations of mutant genes based on recombination frequencies. The original map was produced by analyzing segregating phenotypes in the F2 generation after self-pollination of F1 plants. More than 460 mutant genes are included on the current map, which is available through the Internet at http://mutant.lse.okstate.edu. The precise order of and distances between many linked genes remain to be determined because map locations are based largely on two-point recombination data. One striking feature of the classical map is the large number of cloned mutant genes it includes (more than 110 at present). These genes are noted in orange (mapped relative to phenotypic markers) and green (mapped relative to molecular markers) on the attached chart. The recombinant inbred (RI) map illustrates locations of cloned genes and molecular markers based on recombination within a defined mapping population produced through repeated selfing of progeny plants in successive generations (14). Markers on this map include restriction fragment length polymorphisms (RFLPs), simple sequence length polymorphisms (SSLPs), cleaved amplified polymorphic sequences (CAPSs), and a variety of cloned genes, expressed sequence tags (ESTs), and the ends of bacterial (BAC) and yeast (YAC) artificial chromosomes. More than 790 markers are included on the current RI map, which can be viewed at http://nasc.nott.ac.uk/new_ri_map.html.
The length of each RI chromosome has been adjusted on the chart to match that of the corresponding classical chromosome. This facilitates comparison between equivalent regions and emphasizes that genetic distances between molecular markers on the RI map will eventually become secondary to physical distances measured in base pairs. Mutant genes noted in green and purple on the classical map were first assigned a chromosome position based on recombination frequencies with molecular markers located on the RI map. Updated information on physical maps of the five Arabidopsis chromosomes can be found at http://genome-www.stanford.edu/Arabidopsis/.
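
Turning a two-point recombination frequency into a map distance requires a mapping function, because double crossovers between distant loci go undetected and the observed recombination fraction saturates at 50%. The source does not say which function was used for the Arabidopsis maps; the sketch below simply implements two standard choices, Haldane's (no crossover interference) and Kosambi's (allowing some interference), applied to a recombination fraction r that has already been estimated, for example from F2 segregation data. The function names are hypothetical and the printed values are illustrative only.

```python
import math


def haldane_cm(r: float) -> float:
    """Map distance (cM) from recombination fraction r, Haldane's function.

    d = -1/2 * ln(1 - 2r) Morgans; assumes crossovers occur independently.
    """
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1 - 2 * r)


def kosambi_cm(r: float) -> float:
    """Map distance (cM) from recombination fraction r, Kosambi's function.

    d = 1/4 * ln((1 + 2r) / (1 - 2r)) Morgans; allows positive interference.
    """
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


# Compare the two functions for a few hypothetical recombination fractions.
for r in (0.01, 0.10, 0.30):
    print(f"r={r:.2f}  Haldane={haldane_cm(r):5.1f} cM  Kosambi={kosambi_cm(r):5.1f} cM")
```

For small r both functions give roughly 100·r cM. They diverge as r grows, with Haldane's function returning the larger distance because it assumes more undetected double crossovers.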

In addition to mutagenesis and mapping efforts, genetic analysis of Arabidopsis has expanded in recent years to include specialized topics of broad interest such as epigenetics, gene silencing, tetrad analysis, centromere mapping, and reverse genetics. The history of maize genetics is filled with elegant studies of epigenetics and paramutation; research with Arabidopsis has added molecular details on some of the genes involved, within a functional genomics context (15). Tetrad analysis became possible in Arabidopsis with the isolation of the quartet mutant, in which the four pollen grains derived from a single meiotic event remain attached when released from the anther but nevertheless participate in fertilization (16). The precise number of insertional mutants available in Arabidopsis is difficult to determine because some collections are available through public stock centers whereas others are being produced in the private sector. However, plans are under way to improve community access to insertional mutants and to make it possible to obtain a knockout of virtually any gene of interest with minimal effort (17). Thus, with continued advances in mutant analysis, genome sequencing, and production of knockouts, Arabidopsis may soon become the higher eukaryote of choice for studying many fundamental concepts of modern genetics.

Research Community

The Arabidopsis community is a diverse group of scientists representing more than 30 countries. Almost every major university, research institute, and private company active in plant research has at least one individual working on Arabidopsis. This wide involvement is reflected in growing attendance at the annual Arabidopsis meetings; the summer 1998 meeting in Madison, Wisconsin, attracted more than 900 participants. Community resources include a centralized database, two stock centers, established EST projects, and several large-scale sequencing laboratories associated with the Arabidopsis Genome Initiative (AGI) (Table 1). Broad participation in the electronic Arabidopsis newsgroup facilitates rapid communication on scientific matters (18). For the past 6 years, annual progress toward the goals set forth in the Multinational Coordinated Arabidopsis thaliana Genome Research Project has been summarized in a document published by the U.S. National Science Foundation (NSF) (19).

Table 1

Community resources for Arabidopsis genome analysis.


Community decisions are coordinated by two representative groups: the multinational science steering committee and the North American steering committee. Advanced courses and workshops such as those offered by Cold Spring Harbor Laboratory and the European Molecular Biology Organization have played an important role in training an entire generation of Arabidopsis biologists. The contributions of Asian scientists to Arabidopsis research have become increasingly apparent at recent meetings, particularly the Fifth International Congress of Plant Molecular Biology held in Singapore. Funding agencies have also played a significant role in supporting and promoting Arabidopsis research. Significant investments in basic plant research have been made throughout Europe, Japan, Australia, and the United States, where the NSF continues to play a leadership role in funding sequencing efforts and a wide range of individual investigator awards.

Genome Sequencing Initiative

The AGI was established in 1996 to facilitate coordinated sequencing of the Arabidopsis genome (20). This initiative followed advances in EST sequencing projects (21), construction of standardized YAC and BAC libraries (22), establishment of physical maps for limited regions of the genome (23), and molecular analysis of many individual genes. AGI participants from Europe, Japan, and the United States agreed on a strategy that combined BAC end sequencing, fingerprinting, hybridization with anchored YACs and molecular markers, and starting points spread across the genome to begin sequencing contiguous clusters of BACs with minimal overlaps. The Japanese group proceeded with sequencing P1 artificial chromosome (PAC) clones because they had already invested in this approach. Each group was assigned a chromosomal region to begin sequencing with the understanding that assignments could be adjusted later to reflect progress and availability of funding. Updated information on sequencing efforts can be obtained from the Internet addresses listed in Table 1.

By 1 July 1998, the total amount of random BAC end sequence generated by the TIGR, SPP, and Genoscope groups was 13.6 Mb from 18,746 clones. By this same date, the entire AGI consortium had deposited in GenBank another 28 Mb of annotated genomic sequence from defined chromosomal regions. This included 4 to 5 Mb each from chromosomes 1 and 2, 9 to 10 Mb each from chromosomes 4 and 5, and less than 0.5 Mb from chromosome 3. Analysis of a contiguous 1.9-Mb region of chromosome 4 was recently published by the ESSA group (24) and several sequenced regions on chromosome 5 have been published by the Kazusa group in Japan (25). In addition, the CSHL consortium has made available extensive fingerprinting data and initial results of organizing BAC clones into a genome-wide contig map. Approximately 70 Mb of the genome was contained in 66 BAC contigs by 1 July, and plans were under way to complete the analysis of all 22,000 BAC clones by the end of the year. The combined availability of BAC end sequences and a genome-wide contig map should have an immediate impact on Arabidopsis research, particularly in the widespread use of chromosome walking to clone genes identified by mutation.
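
The progress figures above can be put in perspective with simple arithmetic. The sketch below uses only numbers stated in the text (13.6 Mb of BAC end sequence from 18,746 clones, 28 Mb of annotated sequence, about 70 Mb in 66 contigs, and the 120-Mb genome size); the variable names are hypothetical, and because the source gives rounded totals the derived averages are approximate. The per-clone figure is the total end sequence divided by the number of clones, summed over whatever ends were read for each clone.

```python
# Figures quoted in the text (rounded totals as of 1 July 1998).
GENOME_MB = 120            # estimated genome size, Mb
BAC_END_MB = 13.6          # random BAC end sequence, Mb
BAC_END_CLONES = 18_746    # clones contributing BAC end sequence
ANNOTATED_MB = 28          # annotated genomic sequence deposited in GenBank, Mb
CONTIG_MB = 70             # genome contained in BAC contigs, Mb
CONTIGS = 66               # number of BAC contigs

# Average BAC end sequence per clone, in base pairs.
end_bp_per_clone = BAC_END_MB * 1e6 / BAC_END_CLONES
# Fraction of the genome already deposited as annotated sequence.
annotated_fraction = ANNOTATED_MB / GENOME_MB
# Mean contig length (Mb) and fraction of the genome covered by contigs.
mean_contig_mb = CONTIG_MB / CONTIGS
contig_fraction = CONTIG_MB / GENOME_MB

print(f"{end_bp_per_clone:.0f} bp of end sequence per clone")
print(f"{annotated_fraction:.0%} of the genome annotated")
print(f"{mean_contig_mb:.2f} Mb mean contig length, {contig_fraction:.0%} of genome in contigs")
```

On these numbers each clone contributed roughly 725 bp of end sequence, about 23% of the genome had been deposited as annotated sequence, and the contigs averaged about 1 Mb while spanning a little over half the genome.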

Representatives from AGI sequencing laboratories met again later in July to discuss strategies for completing the genome in 2.5 years, several years ahead of the schedule established in 1996. This accelerated timetable was made possible in part by additional funding from the European Commission and from the NSF Plant Genome Research Program. Participants agreed that a chromosome would be considered complete when each arm had been sequenced as a single contig from the subtelomeric repeat to the centromeric tandem repeats, with gaps acceptable only at internal tandem repeat regions (including ribosomal DNA) of known length. It became apparent during the meeting that experience gained from completing the 100-Mb genome of C. elegans will be helpful in finishing the Arabidopsis project and that lessons learned with Arabidopsis could be applied to the analysis of more complex genomes in the future. Improved technology, such as the automated template procedures developed by the SPP consortium for use with Arabidopsis, may also find broad application in future genome projects.

The AGI and EST projects described above have provided a wealth of information on gene identity and genome organization in plants. The Arabidopsis genome is highly enriched for coding sequences, with one gene every 5 kb on average (24). About half of these genes appear to be closely related in sequence to genes found in other organisms ranging from bacteria to humans. In striking contrast to maize, where repetitive DNA rich in transposons constitutes a large percentage of the genome, Arabidopsis has a relatively small amount of interspersed repetitive DNA. Sequencing the Arabidopsis genome has therefore proven to be a cost-effective way of identifying every gene in a representative flowering plant.
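
The gene-density figure offers a quick consistency check on the genome-wide estimate of 20,000 genes. The sketch below is back-of-the-envelope arithmetic using only numbers in the text; it assumes, as a simplification, that the 5-kb spacing observed in the sequenced regions (24) holds across the whole 120-Mb genome, which may not be true in repeat-rich regions such as those near centromeres. The variable names are hypothetical.

```python
GENOME_KB = 120_000        # 120-Mb genome, expressed in kb
KB_PER_GENE = 5            # average gene spacing reported for sequenced regions
ESTIMATED_GENES = 20_000   # genome-wide estimate quoted in the text
CONSERVED_FRACTION = 0.5   # "about half" of genes resemble genes in other organisms

# Extrapolate the local gene density to the whole genome (assumes uniform spacing).
genes_from_density = GENOME_KB / KB_PER_GENE

print(f"density-based estimate: {genes_from_density:,.0f} genes")
print(f"ratio to quoted estimate: {genes_from_density / ESTIMATED_GENES:.1f}")
print(f"genes with relatives elsewhere: ~{CONSERVED_FRACTION * ESTIMATED_GENES:,.0f}")
```

Extrapolating the local density gives about 24,000 genes, the same order as the 20,000 quoted; the difference shows how sensitive such estimates are to the assumption of uniform gene spacing.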

Examples of Research Advances

Research with Arabidopsis has provided valuable insights into all aspects of modern biology. In some cases, long-standing questions in plant physiology and biochemistry were first resolved through genetic and molecular analysis of Arabidopsis mutants. For example, elucidation of ethylene signal transduction pathways in Arabidopsis provided the first unequivocal identification of a hormone receptor in plants (26). The developmental significance of another class of plant hormones, the brassinosteroids, was revealed by analyzing Arabidopsis mutants defective in brassinosteroid synthesis (27). This work had the additional benefit of providing insights into the biochemistry of related steroids important for human health. In the area of light perception, mutant analysis with Arabidopsis has led to the identification of plant receptors and signal transduction components for phototropism (28) and circadian rhythms (29), in addition to advancing our understanding of phytochrome action (30). Several genes that regulate the transition to flowering have been identified (31), and elegant models have been constructed for the genetic control of pattern formation during floral development (32). Advances in biochemistry and cell biology have covered topics ranging from ion transport and fatty acid biosynthesis to cell wall formation and chloroplast maintenance (1).

Some research with Arabidopsis has provided unexpected insights into cellular mechanisms shared with other organisms. For example, a protein complex initially identified through genetic analysis of the constitutive photomorphogenic class of Arabidopsis mutants has been found throughout eukaryotes and may provide clues to complex signal transduction networks active in humans (33). A retinal photoreceptor that may serve to entrain the circadian clock in mammals was recently identified, partly on the basis of its similarity to the CRY2 photoreceptor of Arabidopsis (34). Plant biologists have long recognized that cellular mechanisms common to eukaryotes are often characterized first in yeast or animal systems and only later extended to plants. The advent of Arabidopsis functional genomics and the availability of large numbers of Arabidopsis mutants defective in known gene products provide a unique opportunity for plant biologists to contribute to research in a variety of related disciplines. As a result, it will become increasingly important for those studying other groups of organisms to keep abreast of continuing advances in plant biology.

Many biotechnology companies are counting on Arabidopsis research to help solve practical problems related to agriculture, energy, and the environment. Significant advances have already been reported in applied research, including molecular cloning of disease resistance genes (35), engineering of plants resistant to cold temperatures (36), production of specialized hydrocarbons (37), and stimulation of premature flowering in trees and other plants with extended life cycles (38). If patent applications are any indication of the practical benefits of Arabidopsis research, then the economic value of this simple weed has already been demonstrated (39). One of the original ideas behind using Arabidopsis as a model system was to facilitate the identification of related genes of importance in crop plants, and there is every indication that this strategy is working as planned.

Vision for the Future

A new vision statement for the future of Arabidopsis research was recently articulated in the annual report for the Multinational Arabidopsis Genome Project (19). The short-term goals are to complete the genomic sequence and screens for informative mutations; obtain insertional knockouts of every major class of gene; continue detailed characterization of cellular, physiological, and developmental pathways; continue the widespread use of Arabidopsis as a model for studying basic principles of genetics; establish improved computing systems to organize information on cellular processes involved in plant growth and development; and make advances obtained through the Arabidopsis genome project available to those working on other projects. Technological innovations, such as the use of DNA chips and microarrays to study global patterns of gene expression (40), should play an important role in Arabidopsis research during this period. The longer-term goals are to determine the functions and locations of key gene products identified through large-scale sequencing efforts; uncover the mechanisms by which complex networks of gene products become established and localized; combine information on gene products with advances in plant physiology and biochemistry to build a comprehensive picture of plant structure and function; and use Arabidopsis to resolve questions concerning evolutionary relationships among eukaryotic organisms and the evolution of common cellular and developmental pathways (19).

Meeting these goals will place increasing demands on the development of databases designed to present massive amounts of information to Arabidopsis experts and to a diverse audience of biologists. Representatives of the Arabidopsis and informatics communities met in the summer of 1998 to discuss options for designing and supporting a new generation of databases for Arabidopsis in particular and plant biology in general. Although a number of problems remain to be addressed, there was agreement that developing innovative methods for providing access to information represents one of the principal long-term challenges of the Arabidopsis genome project. With continued progress in genomics, biology, and database management, it nevertheless appears likely that Arabidopsis will soon become a model not only for understanding plant structure and function, but also for addressing more universal questions concerning the nature and origin of biological complexity.

* To whom correspondence should be addressed. E-mail: cherry@genome.stanford.edu


