Research Article

Recurrent evolution of vertebrate transcription factors by transposase capture


Science  19 Feb 2021:
Vol. 371, Issue 6531, eabc6405
DOI: 10.1126/science.abc6405


A recipe for new genes

Most lineages contain evolutionarily novel genes, but their origin is not always clear. Cosby et al. investigated the origin of families of lineage-specific vertebrate genes (see the Perspective by Wacholder and Carvunis). Fusion between transposable elements (TEs) and host gene exons, once incorporated into the host genome, can generate new functional genes. Examination of KRABINER, a bat gene that arose through this process, shows how retaining part of the TE within the gene allows the encoded protein to bind throughout the genome and act as a transcriptional regulator. Thus, TEs interacting with their host genome provide the raw material for new combinations of functional domains that can be acted on by selection and incorporated into the hierarchical cellular network.

Science, this issue p. eabc6405; see also p. 779

Structured Abstract


How novel protein architectures evolve remains poorly understood. The rearrangement of domains with preexisting functions into new composite architectures through exon shuffling is a powerful path to form genes encoding proteins with novel functionalities. Although exon shuffling is thought to account for the evolution of many protein structures, the source of new exons and splice sites as well as the mechanisms by which they become assimilated have been scarcely characterized. In this work, we investigate the contribution of DNA transposons to the formation of novel protein-coding genes through exon shuffling during vertebrate evolution.


DNA transposons are widespread mobile elements encoding transposase proteins that promote their selfish replication in host genomes. Transposases typically contain DNA binding and catalytic nuclease domains, which may be repurposed for cellular functions. By inserting functional domains into new genomic contexts, transposase sequences can generate host-transposase fusion (HTF) genes through alternative splicing. Several genes with critical developmental functions, such as the Pax transcription factors, are thought to have been born through this process. However, the mechanism by which transposase domains are captured to generate HTFs, how common this process is, and the functions of most known HTF genes remain unclear.
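As a purely illustrative toy model of the splicing route described above (not the authors' model or pipeline), the sketch below assumes a transposon has landed in a host intron and contributes a splice acceptor followed by transposase coding sequence. Alternative splicing then yields two isoforms: the canonical host protein, and a fusion that keeps the host exons upstream of the insertion and reads into the transposase domain. The exon labels, coordinates, and the `isoforms` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    label: str   # hypothetical name of the domain this exon encodes
    start: int   # toy genomic coordinate of the exon


def isoforms(host_exons, insertion_pos, tpase_domains):
    """Return the canonical host isoform and a toy host-transposase fusion isoform.

    Assumes the transposon sits inside a host intron at `insertion_pos` and
    supplies a splice acceptor followed by the transposase coding sequence,
    so the fusion isoform keeps every host exon upstream of the insertion and
    then reads into the transposase domains.
    """
    ordered = sorted(host_exons, key=lambda e: e.start)
    canonical = [e.label for e in ordered]
    upstream = [e.label for e in ordered if e.start < insertion_pos]
    fusion = upstream + list(tpase_domains)
    return {"canonical": canonical, "fusion": fusion}


# Hypothetical host gene with two KRAB exons and a downstream zinc-finger exon.
host = [Exon("KRAB-A", 100), Exon("KRAB-B", 400), Exon("ZnF", 2000)]
result = isoforms(host, insertion_pos=1200, tpase_domains=["tpase_DBD"])
print(result["canonical"])  # ['KRAB-A', 'KRAB-B', 'ZnF']
print(result["fusion"])     # ['KRAB-A', 'KRAB-B', 'tpase_DBD']
```

Because the fusion arises by alternative splicing, the toy model keeps the canonical isoform alongside it rather than replacing it.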


We used comparative genomics to survey all 596 tetrapod genomes with available gene models for putative HTFs. We identified 106 distinct HTFs derived from 94 independent fusion events over the course of ~350 million years of evolution. We found that most HTFs evolved through the alternative splicing of host domains to transposase proteins using splice sites provided by the transposon. The transposase domains of all 81 HTFs analyzed are evolving under purifying selection, which suggests that they have been maintained for organismal function. The domain composition of HTF proteins indicates that most consist of transposase DNA binding domains fused to host domains predicted to function in transcriptional and/or chromatin regulation, especially the repressive Krüppel-associated box (KRAB) domain (involved in ~30% of all HTFs); this suggests that many HTFs function as transcriptional regulators. Supporting this hypothesis, we show that four independently evolved KRAB-transposase fusion proteins repress gene expression in a sequence-specific manner in reporter assays. Furthermore, loss-of-function, rescue, and regulatory genomics experiments in bat cells revealed that the bat-specific KRABINER fusion protein binds hundreds of cognate transposons genome-wide and controls a large network of genes and cis-regulatory elements.
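The two computational criteria in this paragraph, a protein that combines host and transposase domains, and a transposase region evolving under purifying selection, can be sketched as simple filters. This is a minimal illustration under stated assumptions, not the authors' pipeline: the domain labels, gene names, and dN/dS values are hypothetical, and the selection check uses a bare point estimate (dN/dS < 1), whereas a real analysis would typically use a statistical test on codon models. The abstract does not say which test was used.

```python
from dataclasses import dataclass

# Hypothetical labels for transposase-derived domains; a real screen would use
# curated domain annotations rather than this toy set.
TRANSPOSASE_DOMAINS = {"tpase_DBD", "tpase_catalytic"}


@dataclass
class GeneModel:
    name: str
    domains: list   # domain labels, N- to C-terminus
    dn: float       # nonsynonymous substitutions per nonsynonymous site (transposase region)
    ds: float       # synonymous substitutions per synonymous site (transposase region)


def is_candidate_htf(gene):
    """True if the protein combines at least one transposase domain with at least one host domain."""
    has_tpase = any(d in TRANSPOSASE_DOMAINS for d in gene.domains)
    has_host = any(d not in TRANSPOSASE_DOMAINS for d in gene.domains)
    return has_tpase and has_host


def omega(gene):
    """dN/dS ratio for the transposase region."""
    if gene.ds <= 0:
        raise ValueError(f"{gene.name}: dS must be positive to compute dN/dS")
    return gene.dn / gene.ds


def under_purifying_selection(gene):
    """Point-estimate rule of thumb: omega < 1 means amino-acid changes are being removed."""
    return omega(gene) < 1.0


genes = [
    GeneModel("hypothetical_KRAB_fusion", ["KRAB", "tpase_DBD"], dn=0.05, ds=0.40),
    GeneModel("hypothetical_host_only", ["KRAB", "ZnF"], dn=0.10, ds=0.30),
    GeneModel("hypothetical_tpase_only", ["tpase_DBD", "tpase_catalytic"], dn=0.30, ds=0.30),
]
candidates = [g for g in genes if is_candidate_htf(g)]
print([g.name for g in candidates])                        # ['hypothetical_KRAB_fusion']
print([under_purifying_selection(g) for g in candidates])  # [True]
```

Only the mixed host-plus-transposase architecture passes the first filter; a host-only protein or an intact transposase does not, which is the distinction that defines an HTF.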


Our findings confirm that exon shuffling is a major evolutionary force generating genetic novelty. We provide evidence that DNA transposons promote exon shuffling by inserting transposase domains in new genomic contexts. This process provides a plausible path for the emergence of several ancient transcription factors with important developmental functions. By illustrating how a transcription factor and its dispersed binding sites can emerge simultaneously from a single transposon family, our results bolster the view that transposons are key players in the evolution of gene regulatory networks.

Transposase capture contributes to the evolution of transcription factors by combining DNA transposase and host domains.

(A) Model for how transposase capture occurs. (B) Abundance and characteristics of identified HTFs. (C) Summary of KRABINER’s role as a transcription factor (TF) in bat cells. TE, transposable element; tpase, transposase; DBDs, DNA binding domains; KO, knockout; ChIP-seq, chromatin immunoprecipitation sequencing; PRO-seq, precision run-on sequencing; TRE, transcribed regulatory element.


Genes with novel cellular functions may evolve through exon shuffling, which can assemble novel protein architectures. Here, we show that DNA transposons provide a recurrent supply of materials to assemble protein-coding genes through exon shuffling. We find that transposase domains have been captured, primarily via alternative splicing, to form fusion proteins at least 94 times independently over the course of ~350 million years of tetrapod evolution. We find an excess of transposase DNA binding domains fused to host regulatory domains, especially the Krüppel-associated box (KRAB) domain, and identify four independently evolved KRAB-transposase fusion proteins that repress gene expression in a sequence-specific fashion. The bat-specific KRABINER fusion protein binds its cognate transposons genome-wide and controls a network of genes and cis-regulatory elements. These results illustrate how a transcription factor and its binding sites can emerge from a single transposon family.

