The sequence map of the rice dispensable genome constructed by a metagenome-like assembly strategy

We report genome sequences that are absent from the rice reference genome, Oryza sativa L. ssp. japonica cv. Nipponbare, obtained through a metagenome-like de novo assembly of low-coverage population sequencing data from 1483 cultivated rice varieties. Reads that did not map to the reference genome were used for the assembly. See for the source of the data used in this study. Two datasets, the indica dispensable genome and the japonica dispensable genome, were obtained by assembling the sequencing data of the indica and japonica rice varieties, respectively. The indica dispensable genome contains 52976 contigs, while the japonica dispensable genome comprises 30349 contigs.
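The read-selection step described above (keeping only reads that did not map to the reference) can be sketched as follows. This is a minimal illustration, not the pipeline used in this study: it assumes the alignments are available as SAM-format text, and the function name unmapped_reads is hypothetical. The only fixed fact it relies on is that bit 0x4 of the SAM FLAG field marks an unmapped read.

```python
# Sketch only: assumes SAM-format text input; the aligner and tools actually
# used to build the dispensable genome are not stated in this description.

UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_reads(sam_lines):
    """Yield (read_name, sequence) for every unmapped read in SAM text lines.

    Header lines (starting with '@') and blank lines are skipped.
    """
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])        # column 2: bitwise FLAG
        if flag & UNMAPPED_FLAG:
            yield fields[0], fields[9]   # column 1: QNAME, column 10: SEQ


if __name__ == "__main__":
    # Tiny made-up example: one mapped read and one unmapped read.
    example = [
        "@SQ\tSN:chr01\tLN:1000",
        "read1\t0\tchr01\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII",
    ]
    for name, seq in unmapped_reads(example):
        print(name, seq)
```

In practice the selected reads from all varieties of one subspecies would be pooled before assembly, which is what makes the strategy metagenome-like.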

For each contig, the following information is provided:

  1. The sequence of this contig.
  2. The annotation of this contig, i.e., the structure of the genes and transposons it contains. Protein sequences are also provided if the contig is predicted to contain protein-coding genes.
  3. The predicted genomic location of this contig relative to the Nipponbare reference genome, inferred by an integrated approach based on alignment and linkage disequilibrium.
  4. The sequences of the different haplotypes of this contig, built by a local de novo reassembly strategy.
  5. The rice varieties that may harbor this contig.
  6. The alignment of this contig to five sequenced genomes of the Oryza genus.

Users can query this database by contig ID, or search it with a DNA or protein sequence using BLAST.

