De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds

Science  23 Mar 2017:
DOI: 10.1126/science.aal3327
  • Fig. 1 Starting with a draft assembly, we use Hi-C data to correct misjoins, scaffold, and merge overlaps, thereby generating an assembly of the Aedes aegypti mosquito genome with chromosome-length scaffolds.

    Here, we show contact matrices generated by aligning a Hi-C dataset to both the AaegL2 assembly (17) that we used as input (left) and the final AaegL4 assembly generated by our algorithm (right). Pixel intensity in the contact matrix indicates how often a pair of loci co-locate in the nucleus. The loci corresponding to each row and column are illustrated using chromograms. On the left, the chromogram depicts the 3 linkage groups (Lnk1, Lnk2, Lnk3, or Unassigned) reported in AaegL2; on the right, it depicts the 3 chromosome-length scaffolds in AaegL4 (chr1, chr2, chr3). To create the chromogram, each AaegL4 arm is assigned a linear color gradient, thereby specifying a color for each AaegL4 locus. The same colors are then used for the corresponding loci in AaegL2 (left) and in the illustration of our procedure (center, though with increased contrast). Chromogram discontinuities indicate differences with AaegL4. In the center, we illustrate our assembly algorithm using an input scaffold from Lnk1 of AaegL2 (‘supercontig 1.12’, see bracket). First, the scaffold is examined for misjoins and split such that the resulting segments each exhibit a continuous Hi-C signal (top row). Next, the segments are used as input for iterative scaffolding. Ultimately, only one of the segments is assigned to chromosome 1 of AaegL4. The rest of supercontig 1.12 is assigned to 2q, in the vicinity of several scaffolds that were not anchored in AaegL2 (middle row). Finally, segments exhibiting a similar 3D signal are examined for evidence of overlapping sequence (green rectangle) and merged (bottom row). The final contact map is consistent with the Rabl configuration, i.e., the spatial clustering of centromeres and telomeres.
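    The relationship between contacts and pixel intensity described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function name `contact_matrix`, the 0-based coordinates, the fixed bin size, and the toy contact list are all assumptions made for the example.

```python
def contact_matrix(contacts, genome_length, bin_size):
    """Count Hi-C contacts per pair of fixed-size bins.

    contacts: iterable of (pos_a, pos_b) pairs, 0-based positions
              (hypothetical input format for this sketch).
    Returns a symmetric list-of-lists; entry [i][j] plays the role of the
    pixel intensity for bins i and j.
    """
    n_bins = -(-genome_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in contacts:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # keep the matrix symmetric
    return matrix


# Toy data: a 100-bp "genome" in 25-bp bins with four made-up contacts.
toy_contacts = [(5, 12), (7, 95), (40, 45), (90, 99)]
for row in contact_matrix(toy_contacts, genome_length=100, bin_size=25):
    print(row)
```

    In a correctly ordered assembly, most of the counts concentrate near the diagonal, which is why misjoins show up as abrupt breaks in the signal.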

  • Fig. 2 Comparison of AaegL4 and CpipJ3 with genetic maps.

    (A) We compared AaegL4 with a genetic map of Ae. aegypti (19). Our assembly agreed with the genetic map on 1822 out of 1826 markers. The exceptions are due to misjoins in AaegL2 that were not corrected in AaegL4. (B) Similarly, CpipJ3 is in agreement with a genetic map of Cx. quinquefasciatus (21).

  • Fig. 3 The content of chromosome arms is strongly conserved across mosquitos.

    Here, each 100kb locus in Ae. aegypti is assigned a color. For the other species, each 100kb locus is assigned a combination of the colors of the corresponding DNA sequences in Ae. aegypti, weighted by length.
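    The length-weighted color assignment can be illustrated as follows. This sketch assumes colors are RGB triples averaged channel by channel; the caption does not state the color space or the rounding used, and `blend_colors` and the example values are hypothetical.

```python
def blend_colors(segments):
    """Blend RGB colors weighted by aligned length.

    segments: list of ((r, g, b), length) pairs, one per stretch of
              Ae. aegypti sequence that a locus corresponds to.
    """
    total = sum(length for _, length in segments)
    if total == 0:
        raise ValueError("segments must cover a non-zero length")
    return tuple(
        round(sum(color[ch] * length for color, length in segments) / total)
        for ch in range(3)
    )


# A 100-kb locus with 75 kb matching a red region and 25 kb matching a blue one.
print(blend_colors([((255, 0, 0), 75_000), ((0, 0, 255), 25_000)]))  # (191, 0, 64)
```

    A locus whose sequence comes from a single Ae. aegypti region keeps that region's color unchanged, so conserved arms appear as uniform blocks.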

  • Table 1 Assembly statistics for the Hs2-HiC, AaegL4, and CpipJ3 assemblies.

    We did not attempt to further assemble tiny scaffolds contained in each draft assembly. The other scaffolds in each draft were assembled using Hi-C to create huge, chromosome-length scaffolds, and additional small scaffolds.

                                        Hs2-HiC         AaegL4         CpipJ3
    Draft scaffolds
      Base pairs                  2,819,306,710  1,310,076,332    539,974,961
      Number of contigs                  80,223         36,204         48,672
      Contig N50                        102,922         82,618         28,546
      Number of scaffolds                73,770          4,756          3,172
      Scaffold N50                      125,775      1,547,048        486,756
    Chromosome-length scaffolds
      Base pairs                  2,654,127,695  1,157,961,392    492,400,177
      Number of contigs                  36,616         25,585         41,051
      Contig N50                        108,937         93,132         30,599
      Number of scaffolds                    23              3              3
      Scaffold N50*                 141,244,516    404,248,146    190,989,159
    Small scaffolds
      Base pairs                     13,416,754     82,464,476     31,168,201
      Number of contigs                     850          9,416          5,609
      Contig N50                         27,968         14,202         10,570
      Number of scaffolds                   811          3,981          1,224
      Scaffold N50                       30,467         65,348         45,079
    Tiny scaffolds
      Base pairs                    151,762,261     14,122,292        112,343
      Number of contigs                  43,259          2,223             61
      Contig N50                          6,129          6,574          2,110
      Number of scaffolds                43,231          2,222             25
      Scaffold N50                        6,144          6,577          9,403

    *The scaffold N50 for the output assemblies is not a particularly meaningful assembly statistic: It is determined almost entirely by the chromosome-length scaffolds, which reflect the length distribution of the chromosomes rather than the quality of the genome assembly. The particular value shown is the length of chromosome X (Hs2-HiC) and chromosome 3 (for AaegL4 and CpipJ3).
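    The point made in the footnote can be checked with a short N50 calculation. This is a sketch using the standard N50 definition (the length of the scaffold at which the running total of lengths, sorted from longest to shortest, first reaches half of the assembly size); the scaffold lengths below are made up for illustration and are not taken from the table.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical assembly: three chromosome-length scaffolds plus 100 tiny ones.
chromosomes = [500, 400, 300]
tiny = [1] * 100
print(n50(chromosomes))         # 400
print(n50(chromosomes + tiny))  # 400
```

    Adding the tiny scaffolds leaves the N50 unchanged: the value is simply the length of one chromosome, which is why it says little about assembly quality once scaffolds reach chromosome length.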

    Supplementary Materials

    • De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds

      Olga Dudchenko, Sanjit S. Batra, Arina D. Omer, Sarah K. Nyquist, Marie Hoeger, Neva C. Durand, Muhammad S. Shamim, Ido Machol, Eric S. Lander, Aviva Presser Aiden, Erez Lieberman Aiden

      Materials/Methods, Supplementary Text, Tables, Figures, and/or References

      • Materials and Methods
      • Figs. S1 to S21
      • Tables S1 to S14
      • Captions for tables S15 to S20
      • References
      Table S15
      AaegL4 alignments of markers from Juneja et al. Ae. aegypti genetic mapping study (19).
      Table S16
      AaegL4 alignments of markers from Timoshevskiy et al. Ae. aegypti physical mapping study (36).
      Table S17
      CpipJ3 alignments of microsatellite loci markers from Hickner et al. Cx. quinquefasciatus genetic mapping study (21).
      Table S18
      CpipJ3 alignments of RFLP markers from Arensburger et al. Cx. quinquefasciatus genetic mapping study (20).
      Table S19
      CpipJ3 alignments of markers from Unger et al. Cx. quinquefasciatus physical mapping study (37).
      Table S20
      CpipJ3 alignments of markers from Naumenko et al. Cx. quinquefasciatus physical mapping study (38).
