ABSTRACT
Rhodobacter capsulatus SB 1003 belongs to the group of purple nonsulfur bacteria. Its genome consists of a 3.7-Mb chromosome and a 133-kb plasmid. The genome carries genes for photosynthesis, nitrogen fixation, utilization of xenobiotic organic substrates, and synthesis of polyhydroxyalkanoates. These features have made it a favorite research tool for studying these processes. Here we report its complete genome sequence.
The genome of Rhodobacter capsulatus SB 1003 consists of a single chromosome containing 3,738,958 bp and a circular plasmid of 132,962 bp. Both the chromosome and the plasmid have a relatively high GC content (66.6%), which made DNA sequencing difficult. The project was started by constructing a library of overlapping cosmids covering the whole genome (4, 5). Sequencing was performed by using standard Sanger approaches, and the average sequencing depth was sevenfold. The complete sequence was analyzed by using Critica (1) and Glimmer (3) for the protein-coding genes, tRNAscan (11) and Aragorn (10) for the tRNA and tmRNA genes, and RNAmmer (8) for the rRNA genes. The functions of the predicted protein-coding genes were annotated by comparison with the UniRef90 (14), NCBI-NR (2), COG (15), and KEGG (7) databases. The annotation results were verified using Artemis (13) with dicodon usage plots (12).
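The GC content quoted above is a simple base-composition ratio. The following is a minimal sketch of how such a value can be computed from a sequence string; the function name `gc_content` and the choice to ignore ambiguous bases are assumptions made for illustration and do not describe the software used in this project.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C among unambiguous bases (A, C, G, T).

    Ambiguity codes such as N are ignored; this is an illustrative choice.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / total


if __name__ == "__main__":
    # Toy sequence, not taken from the R. capsulatus genome.
    toy = "GGCCGATCGGCC"
    print(f"GC content: {gc_content(toy):.1%}")
```

Applied to the deposited chromosome and plasmid sequences, a function of this kind would be expected to return a value near the reported 66.6%.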
We found 3,531 open reading frames (ORFs) in the chromosome and 154 ORFs in the plasmid. There are 4 rRNA operons, 53 tRNA genes covering all 20 amino acids, and one tmRNA gene. The coding density of the R. capsulatus genome is 91%. Functions were assigned to 3,100 ORFs (84.1%). Six hundred ten ORFs (16.6%) represent genes with some similarity to hypothetical genes in databases. The remaining ORFs had no homologues in the databases (e-value < 1 × 10⁻¹⁰). There are 42 putative intact or mutated transposase genes and 237 phage-related genes. An important and specific feature of R. capsulatus is the defective phage called gene transfer agent (GTA), which provides a useful tool for genetic analysis (9). The distribution of genes among the COG metabolic functional classes and detailed descriptions of the metabolic pathways can be found at http://rhodo.img.cas.cz. Based on the GC skew analysis and the orientation of transcription, the origin of replication was localized around 160 kb.
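The text does not state which form of GC skew analysis was used. As an illustration only, the sketch below assumes the cumulative skew, i.e., the running sum of (G − C) along the sequence, whose minimum is commonly taken as a candidate origin of replication and whose maximum as a candidate terminus. The function names are hypothetical.

```python
def cumulative_gc_skew(seq: str) -> list[int]:
    """Running sum of +1 for each G and -1 for each C along the sequence."""
    total = 0
    skew = []
    for base in seq.upper():
        if base == "G":
            total += 1
        elif base == "C":
            total -= 1
        skew.append(total)
    return skew


def skew_extremes(seq: str) -> tuple[int, int]:
    """Return 1-based positions of the cumulative-skew minimum and maximum.

    Under the assumption stated above, the minimum marks a candidate
    origin and the maximum a candidate terminus.
    """
    skew = cumulative_gc_skew(seq)
    if not skew:
        raise ValueError("empty sequence")
    min_idx = min(range(len(skew)), key=skew.__getitem__)
    max_idx = max(range(len(skew)), key=skew.__getitem__)
    return min_idx + 1, max_idx + 1


if __name__ == "__main__":
    # Toy sequence: a C-rich stretch followed by a G-rich stretch.
    origin, terminus = skew_extremes("CCCCGGGG")
    print(origin, terminus)
```

Because skew alone can be noisy, a candidate position of this kind is usually cross-checked against other evidence; here the authors also used the orientation of transcription.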
Five complete restriction-modification (RM) systems were identified in the genome. Three of them are type I (RcaSBIP, RcaSBIIIP, and RcaSBIV), one is type III (RcaSBIIP), and one is a type IV system (RcaSBMcrBCP). These RM systems are located mainly in regions of exogenous origin. Several genes encoding these restriction and modification enzymes were isolated and characterized. RcaSBIV recognizes the sequence AGAN7RTAG (H. Strnad and V. Paces, unpublished). Eight CRISPR/Cas systems (6) were also found. Two of them comprise a considerable number of repeats (41 and 9); the others comprise three to five repeats.
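The recognition sequence AGAN7RTAG is written in IUPAC notation, where N stands for any nucleotide, N7 for seven of them, and R for a purine (A or G). The sketch below shows one way such a motif could be expanded into a regular expression and searched on both strands; the helper names `motif_to_regex`, `reverse_complement`, and `find_sites` are hypothetical and are not part of any tool cited here.

```python
import re

# Standard IUPAC nucleotide codes mapped to regular-expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif: str) -> str:
    """Expand a motif such as AGAN7RTAG, where a number repeats the preceding code."""
    motif = motif.upper()
    if not re.fullmatch(r"(?:[A-Z]\d*)+", motif):
        raise ValueError(f"malformed motif: {motif!r}")
    parts = []
    for symbol, count in re.findall(r"([A-Z])(\d*)", motif):
        if symbol not in IUPAC:
            raise ValueError(f"unknown IUPAC code: {symbol!r}")
        parts.append(IUPAC[symbol] + (f"{{{count}}}" if count else ""))
    return "".join(parts)


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq: str, motif: str) -> list[tuple[int, str]]:
    """Return (1-based leftmost forward-strand position, strand) for each site."""
    seq = seq.upper()
    # A lookahead is used so that overlapping sites are also reported.
    pattern = re.compile(f"(?=({motif_to_regex(motif)}))")
    sites = [(m.start() + 1, "+") for m in pattern.finditer(seq)]
    for m in pattern.finditer(reverse_complement(seq)):
        site_len = len(m.group(1))
        sites.append((len(seq) - (m.start() + site_len) + 1, "-"))
    return sorted(sites)


if __name__ == "__main__":
    # Toy sequence with one embedded site, not taken from the genome.
    toy = "TTAGACCCCCCCGTAGTT"
    print(motif_to_regex("AGAN7RTAG"))
    print(find_sites(toy, "AGAN7RTAG"))
```

Both strands are searched because the motif is not palindromic, so a site on the reverse strand does not appear as the same string on the forward strand.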
Nucleotide sequence accession numbers.
The nucleotide sequences were deposited in GenBank with accession numbers CP001312 (chromosome) and CP001313 (plasmid pRCB133). Complete analysis of the genome is accessible at http://rhodo.img.cas.cz.
ACKNOWLEDGMENTS
Work in Prague was supported by Czech grants 1M6837805002 and AV0Z50520514. Work in Chicago was supported by The University of Chicago, Division of Biological Sciences, and Integrated Genomics, Inc.
FOOTNOTES
- Received 1 April 2010.
- Accepted 13 April 2010.
- Published ahead of print on 23 April 2010.
- American Society for Microbiology