Phylogenomics resolves the timing and pattern of insect evolution


Science  07 Nov 2014:
Vol. 346, Issue 6210, pp. 763-767
DOI: 10.1126/science.1257570

Toward an insect evolution resolution

Insects are the most diverse group of animals, with the largest number of species. However, many of the evolutionary relationships between insect species have been controversial and difficult to resolve. Misof et al. performed a phylogenomic analysis of protein-coding genes from all major insect orders and close relatives, resolving the placement of taxa. The authors used this resolved phylogenetic tree together with fossil analysis to date the origin of insects to ~479 million years ago and to resolve long-controversial subjects in insect phylogeny.

Science, this issue p. 763


Insects are the most speciose group of animals, but the phylogenetic relationships of many major lineages remain unresolved. We inferred the phylogeny of insects from 1478 protein-coding genes. Phylogenomic analyses of nucleotide and amino acid sequences, with site-specific nucleotide or domain-specific amino acid substitution models, produced statistically robust and congruent results resolving previously controversial phylogenetic relationships. We dated the origin of insects to the Early Ordovician [~479 million years ago (Ma)], of insect flight to the Early Devonian (~406 Ma), of major extant lineages to the Mississippian (~345 Ma), and the major diversification of holometabolous insects to the Early Cretaceous. Our phylogenomic study provides a comprehensive and reliable scaffold for future comparative analyses of evolutionary innovations among insects.

Insects (1) were among the first animals to colonize and exploit terrestrial and freshwater ecosystems. They have shaped Earth’s biota, exhibiting coevolved relationships with many groups, from flowering plants to humans. They were the first to master flight and establish social societies. However, many aspects of insect evolution are still poorly understood (2). The oldest known fossil insects are from the Early Devonian [~412 million years ago (Ma)], which has led to the hypothesis that insects originated in the Late Silurian with the earliest terrestrial ecosystems (3). Molecular data, however, point to a Cambrian or at least Early Ordovician origin (4), which implies that early diversification of insects occurred in marine or coastal environments. Because of the absence of insect fossils from the Cambrian to the Silurian, these conclusions remain highly controversial. Furthermore, the phylogenetic relationships among major clades of polyneopteran insect orders, including grasshoppers and crickets (Orthoptera), cockroaches (Blattodea), and termites (Isoptera), have remained elusive, as has the phylogenetic position of the enigmatic Zoraptera. Even the closest extant relatives of Holometabola (e.g., beetles, moths and butterflies, flies, sawflies, wasps, ants, and bees) are unknown. Thus, in order to understand the origins of physiological and morphological innovations in insects (e.g., wings and metamorphosis), it is important to reliably reconstruct the tempo and mode of insect diversification. We therefore conducted a phylogenomic study on 1478 single-copy nuclear genes obtained from genomes and transcriptomes representing key taxa from all extant insect orders and other arthropods (144 taxa) and estimated divergence dates with a validated set of 37 fossils (5).

Phylogenomic analyses of transcriptome and genome sequence data (6) can be compromised by sparsely populated data matrices, gene paralogy, sequence misalignment, and deviations from the underlying assumptions of applied evolutionary models, which may result in biased statistical confidence in phylogenetic relationships and temporal inferences. We addressed these obstacles by removing confounding factors in our analysis (5) (fig. S2).

We sequenced more than 2.5 gigabases (Gb) of cDNA from each of 103 insect species, which represented all extant insect orders (5). Additionally, we included published transcript sequence data that met our standards (table S2) and official gene sets of 14 arthropods with sequenced draft genomes (5), of which 12 served as references during orthology prediction of transcripts (tables S2 and S4). Comparative analysis of the reference species' official gene sets identified 1478 single-copy nuclear genes present in all these species (tables S3 and S4). Functional annotation of these genes revealed that many serve basic cellular functions (tables S14 and S15 and figs. S4 to S6). A graph-based approach using the best reciprocal genome- and transcriptome-wide hit criterion identified, on average, 98% of these genes in the 103 de novo sequenced transcriptomes, but only 79% and 62% in the previously published transcriptomes of in- and out-group taxa, respectively (tables S12 and S13).
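The best reciprocal hit criterion mentioned above can be illustrated with a minimal sketch. This is not the authors' orthology software; the function names, transcript and gene identifiers, and similarity scores below are hypothetical and serve only to show the idea: a transcript is assigned to a reference gene only if each is the other's top-scoring match.

```python
from typing import Dict, Tuple

Scores = Dict[Tuple[str, str], float]  # (query, subject) -> similarity score


def best_hits(scores: Scores) -> Dict[str, str]:
    """Return, for each query, the subject with the highest score."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        # Strict '>' means the first-seen subject wins on ties.
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward: Scores, reverse: Scores) -> Dict[str, str]:
    """Keep query -> subject pairs that are each other's best hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return {q: s for q, s in fwd.items() if rev.get(s) == q}


# Hypothetical toy scores: transcripts searched against reference genes ...
forward = {
    ("tx1", "geneA"): 250.0,
    ("tx1", "geneB"): 90.0,
    ("tx2", "geneB"): 120.0,
    ("tx3", "geneA"): 200.0,
}
# ... and reference genes searched back against the transcripts.
reverse = {
    ("geneA", "tx1"): 245.0,
    ("geneA", "tx3"): 190.0,
    ("geneB", "tx2"): 118.0,
}

orthologs = reciprocal_best_hits(forward, reverse)
print(orthologs)
```

In this toy example, tx3 also hits geneA, but geneA's best match is tx1, so tx3 is not assigned.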

After transcripts had been assigned and aligned to the 1478 single-copy nuclear-encoded genes, we checked for highly divergent, putatively misaligned transcripts. Of the 196,027 aligned transcripts, 2033 (1%) were classified as highly divergent. Of these, 716 were satisfactorily realigned with an automated refinement. However, alignments of 1317 transcripts could not be improved, and these transcripts were excluded from our analyses (supplementary data file S5,

Nonrandom distribution of missing data among taxa can inflate statistical support for incorrect tree topologies (7). Because we detected a nonrandom distribution of missing data, we only considered data blocks if they contained information from at least one representative of each of the 39 predefined taxonomic groups of undisputed monophyly (table S6). In this representative data set, the extent of missing data was still between 5 and 97.7% in pairwise sequence comparisons, with high percentages primarily because of the data scarcity in some previously published out-group taxa (table S19 and figs. S7 to S10).
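The coverage rule described above (keep a data block only if every predefined taxonomic group is represented by at least one taxon) can be sketched as follows. The group names, taxon names, and block contents are hypothetical, and the real analysis used 39 groups; this is an illustration of the filter, not the authors' code.

```python
from typing import Dict, List, Set


def covers_all_groups(block_taxa: Set[str], groups: Dict[str, Set[str]]) -> bool:
    """True if the block contains at least one member of every group."""
    return all(block_taxa & members for members in groups.values())


def representative_blocks(
    blocks: Dict[str, Set[str]], groups: Dict[str, Set[str]]
) -> List[str]:
    """Names of the data blocks that pass the coverage rule."""
    return [name for name, taxa in blocks.items() if covers_all_groups(taxa, groups)]


# Hypothetical groups of undisputed monophyly (three instead of 39).
groups = {
    "Odonata": {"dragonfly", "damselfly"},
    "Diptera": {"fruit_fly", "mosquito"},
    "Zoraptera": {"zorapteran"},
}

# Hypothetical data blocks, each listing the taxa with sequence data.
blocks = {
    "block1": {"dragonfly", "fruit_fly", "zorapteran"},
    "block2": {"dragonfly", "damselfly", "mosquito"},  # no Zoraptera
    "block3": {"damselfly", "mosquito", "fruit_fly", "zorapteran"},
}

kept = representative_blocks(blocks, groups)
print(kept)
```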

We inferred maximum-likelihood phylogenetic trees (Fig. 1) with both nucleotide (second-codon positions only and applying a site-specific rate model) and amino acid–sequence data (applying a protein domain–based partitioning scheme to improve the biological realism of the applied evolutionary models) from the representative data set (5) (figs. S21, S22, and S23, A and D). Trees from both data sets were fully congruent. The absence of taxa that cannot be robustly placed on the tree (rogue taxa) in the amino acid–sequence data set and the presence of a few rogue taxa that did not bias tree inference in the nucleotide sequence data set (5) indicated a sufficiently representative taxonomic sampling.

Fig. 1 Dated phylogenetic tree of insect relationships.

The tree was inferred through a maximum-likelihood analysis of 413,459 amino acid sites divided into 479 metapartitions. Branch lengths were optimized and node ages estimated from 1,050,000 trees sampled from trees separately generated for 105 partitions that included all taxa (5). All nodes up to orders are labeled with numbers (gray circles). Colored circles indicate bootstrap support (5) (left key). The time line at the bottom of the tree relates the geological origin of insect clades to major geological and biological events. CONDYLO, Condylognatha; PAL, Palaeoptera.

To detect confounding signal derived from nonrandom data coverage, we randomized amino acids within taxa, while preserving the distribution of data coverage in the representative data set (5). This approach revealed no evidence of biased node support that could be attributed to nonrandom data coverage (5) (figs. S11 and S12 and table S20). Phylogenomic data may violate the assumption of time-reversible evolutionary processes, irrespective of what partition scheme one applies, which could lead to incorrect tree estimates and biased node support. Because sections in the amino acid–sequence alignments of the representative data set violating these assumptions were present, we tested whether the observed compositional heterogeneity across taxa biased node support but found no evidence for this (5) (fig. S20). We next discarded data strongly violating the assumption of time-reversible evolutionary processes (tables S21 and S22, data files S6 to S8, and figs. S13 to S19). Results from phylogenetic analysis of this filtered data set (5) were fully congruent with those obtained from analyzing the unfiltered representative data set. The nucleotide sequence data of the representative data set containing also first and third codon positions strongly violated the assumption of time-reversible evolutionary processes, but still supported largely congruent topologies (fig. S23, B to D). In summary, our phylogenetic inferences are unlikely to be biased by any of the above-mentioned confounding factors.
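One way to picture the randomization test described at the start of this passage is the sketch below. It shuffles the residues of each taxon's sequence while leaving missing-data positions where they are, so the coverage pattern is unchanged but the phylogenetic signal is destroyed. The treatment of `-` and `?` as missing data, the function names, and the toy alignment are assumptions made for illustration; the authors' actual procedure is described in their supplementary methods (5).

```python
import random
from typing import Dict

MISSING = set("-?")  # assumption: these symbols mark absent data


def shuffle_within_taxon(seq: str, rng: random.Random) -> str:
    """Permute residues of one sequence; missing positions stay in place."""
    residues = [c for c in seq if c not in MISSING]
    rng.shuffle(residues)
    shuffled = iter(residues)
    return "".join(c if c in MISSING else next(shuffled) for c in seq)


def randomize_alignment(alignment: Dict[str, str], seed: int = 0) -> Dict[str, str]:
    """Apply the within-taxon shuffle to every sequence in the alignment."""
    rng = random.Random(seed)
    return {taxon: shuffle_within_taxon(seq, rng) for taxon, seq in alignment.items()}


# Hypothetical toy alignment with uneven data coverage.
alignment = {
    "taxonA": "MK-LV??A",
    "taxonB": "--MKLVA-",
}
randomized = randomize_alignment(alignment, seed=1)
print(randomized)
```

If node support computed on such randomized data remained high, it would have to come from the coverage pattern rather than from the sequences themselves.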

Our phylogenomic study suggests an Early Ordovician origin of insects (Hexapoda) at ~479 Ma [confidence interval (CI), 509 to 452 Ma] and a radiation of ectognathous insects in the Early Silurian ~441 Ma (CI 465 to 421 Ma) (Figs. 1 and 2). These estimates imply that insects colonized land at roughly the same time as plants (8), in agreement with divergence date estimates on the basis of other molecular data (4).

Fig. 2 Sorted ordinal and interordinal node age estimates.

For each labeled node (numbers on the left and right of the figure correspond to the node labels in the tree of Fig. 1), the median (red bar) and the range of the upper and lower confidence interval (black rectangle) of age estimates are illustrated. These medians and upper and lower confidence intervals are derived from uniformly sampled trees over all 105 metapartitions (5). Additionally, we present medians of age estimates separately derived from each metapartition. Within the bean plot (gray scale), blue bars indicate the distribution of median age estimates; large blue bars indicate the inferred median of medians. All node age estimates refer to the estimated common origin of included species. Stem-lineage representatives can, of course, be older. The maximum root age of the tree was set to 580 Ma to coincide with the oldest Ediacaran fossils (5).
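The two summaries named in this caption, a median over pooled samples and a median of per-metapartition medians, can be told apart with a small worked example. The ages below are invented for one hypothetical node and three hypothetical metapartitions; simple pooling stands in for the uniform tree sampling described in (5), and no confidence interval is computed.

```python
from statistics import median

# Hypothetical node age estimates (Ma) from trees of three metapartitions.
ages_by_partition = {
    "p1": [470.0, 480.0, 490.0],
    "p2": [455.0, 460.0, 500.0],
    "p3": [475.0, 485.0],
}

# Median age within each metapartition.
per_partition_medians = {
    name: median(ages) for name, ages in ages_by_partition.items()
}

# Median of those per-partition medians.
median_of_medians = median(list(per_partition_medians.values()))

# Median over all estimates pooled together (a stand-in for sampled trees).
pooled_median = median(
    [age for ages in ages_by_partition.values() for age in ages]
)

print(per_partition_medians, median_of_medians, pooled_median)
```

The two summaries need not agree: the median of medians weights each metapartition equally, whereas the pooled median weights each sampled tree equally.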

The early diversification pattern of insects has remained unclear (2, 7, 9). We found support for the monophyly of insects, including Collembola and Protura as closest relatives (10), and Diplura as closest extant relatives of bristletails (Archaeognatha), silverfish (Zygentoma), and winged insects (Pterygota) (Fig. 1). Furthermore, our analyses corroborate Remipedia, cave-dwelling crustaceans, as the closest extant relatives of insects (11, 12).

A close phylogenetic relationship of bristletails to a clade uniting silverfish and winged insects (Dicondylia) is generally accepted. However, the monophyly of silverfish has been questioned, with the relict Tricholepidion gertschi considered more distantly related to winged insects than other silverfish (13). We find that silverfish are monophyletic, consistent with recently published morphological studies (14), and estimate that Tricholepidion diverged from other silverfish in the Late Triassic (~214 Ma) (Figs. 1 and 2). This result implies parallel and independent loss of the ligamentous head endoskeleton, abdominal styli, and coxal vesicles in winged insects and silverfish (5).

The diversification of insects is undoubtedly related to the evolution of flight. Fossil winged insects exist from the Late Mississippian (~324 Ma) (15), which implies a pre-Carboniferous origin of insect flight. The description of †Rhyniognatha (~412 Ma) from a mandible, potentially indicative of a winged insect, suggested an Early Devonian to Late Silurian origin of winged insects (3). Our results corroborate an origin of winged insect lineages during this time period (16) (Figs. 1 and 2), which implies that the ability to fly emerged after the establishment of complex terrestrial ecosystems.

Ephemeroptera and Odonata are, according to our analyses, derived from a common ancestor. However, node support is low for Palaeoptera (Ephemeroptera + Odonata) and for a sister group relationship of Palaeoptera to modern winged insects (Neoptera), which indicates that additional evidence, including extensive taxon sampling and the analysis of genomic meta-characters (17), will be necessary to corroborate these relationships.

We find strong support for the monophyly of Polyneoptera, a group that comprises earwigs, stoneflies, grasshoppers, crickets, katydids (Orthoptera), Embioptera, Phasmatodea, Mantophasmatodea, Grylloblattodea, cockroaches, mantids, termites, and Zoraptera (18–20). We estimated the origin of the polyneopteran lineages at ~302 Ma (CI 377 to 231 Ma) in the Pennsylvanian (Figs. 1 and 2), consistent with the idea that at least part of the rich Carboniferous neopteran insect fauna was of polyneopteran origin. Finally, our analyses suggest that the major diversity within living cockroaches, mantids, termites, and stick insects evolved after the Permian mass extinction.

Given that the oldest known fossil hemipterans date to the Middle Pennsylvanian (~310 Ma) (21), it had been thought that the stylet marks on liverworts from the Late Devonian (~380 Ma) (22) could not have been of hemipteran origin. Our study indicates that true bugs (Hemiptera) and their sister lineage, thrips (Thysanoptera), all of which possess piercing-sucking mouthparts, originated ~373 Ma (CI 401 to 346 Ma), which gives support to the possibility of a hemipteroid origin of Early Paleozoic stylet marks.

True bugs, thrips, bark lice (Psocoptera), and true lice (Phthiraptera) (together called Acercaria) were thought to be the closest extant relatives of Holometabola (Acercaria + Holometabola = Eumetabola) (10). However, convincing morphological features and fossil intermediates supporting a monophyly of Acercaria are lacking (13). We recovered bark and true lice (Psocodea) as likely closest extant relatives of Holometabola (5), which suggests that both groups started to diverge in the Devonian-Mississippian ~362 Ma (CI 390 to 334 Ma) (Figs. 1 and 2). However, this result did not receive support in all statistical tests and, therefore, should be further investigated in future studies that embrace additional types of characters (17).

We estimated that the radiation of parasitic lice occurred ~53 Ma (CI 67 to 46 Ma), which implies that they diversified well after the emergence of their avian and mammalian hosts in the Late Cretaceous–Early Eocene and contradicts the hypothesis that parasitic lice originated on feathered theropod dinosaurs ~130 Ma (23).

Within Holometabola, our study recovered phylogenetic relationships fully congruent with those suggested in recent studies (2, 24, 25). Although we estimated the origin of stem lineages of many holometabolous insect orders in the Late Carboniferous, we dated the spectacular diversifications within Hymenoptera, Diptera, and Lepidoptera to the Early Cretaceous, contemporary with the radiation of flowering plants (21, 26). The almost linear increase in interordinal insect diversity suggests that the process of diversification of extant insects may not have been severely affected by the Permian and Cretaceous biodiversity crises (Fig. 2).

With this study, we have provided a robust phylogenetic backbone tree and reliable time estimates of insect evolution. These data and analyses establish a framework for future comparative analyses on insects, their genomes, and their morphology.

Supplementary Materials

Materials and Methods

Supplementary Text

Figs. S1 to S27

Tables S1 to S26

Data File Captions S1 to S14

References (27–187)

  • * Major contributors.

References and Notes

  1. The term “insects” is used here in a broad sense and synonymous to Hexapoda (including the ancestrally wingless Protura, Collembola, and Diplura).
  2. Materials and methods are available as supplementary material on Science Online.
  3. Transcriptome refers to the sequencing of all of the mRNAs of an individual or many individuals present at the time of preservation.
  4. With the notable exception of Cephalocarida, for which RNA-sequencing data were unavailable to us.
  5. These estimates are robust whether or not †Rhyniognatha (Pragian stage, ~ 412 Ma) is used as a calibration point.
  6. These 2673 data blocks had been assigned to 220 clans (8.2% of the data blocks), to 695 Pfam-A domains (26.0%) not belonging to a clan, and to 439 Pfam-B domains (16.4%). Of the total, 1318 data blocks (49.3%) remained without annotation (voids), and data blocks smaller than 21 amino acid sites were merged into one data block.
  7. Acknowledgments: The data reported in this paper are tabulated in the supplementary materials and archived at National Center for Biotechnology Information, NIH, under the Umbrella BioProject ID PRJNA183205 (“The 1KITE project: evolution of insects”). Supplementary files are archived at the Dryad Digital Repository. Funding support: China National GeneBank and BGI-Shenzhen, China; German Research Foundation (NI 1387/1-1; MI 649/6, MI 649/10, RE 345/1-2, BE1789/8-1, BE 1789/10-1, STA 860/4, Heisenberg grant WA 1496/8-1); Austria Science Fund FWF; NSF (DEB 0816865); Ministry of Education, Culture, Sports, Science and Technology of Japan Grant-in-Aid for Young Scientists (B 22770090); Japan Society for the Promotion of Science (P14071); Deutsches Elektronen-Synchrotron (I-20120065); Paul Scherrer Institute (20110069); Schlinger Endowment to CSIRO Ecosystem Sciences; Heidelberg Institute for Theoretical Studies; University of Memphis-FedEx Institute of Technology; and Rutgers University. The authors declare no conflicts of interest.