- The discovery in 1997 of a second coelacanth species in Indonesia, Latimeria menadoensis, was surprising, as it had been assumed that living coelacanths were confined to small populations off the East African coast.
- Fascination with these fish is partly due to their prehistoric appearance—remarkably, their morphology is similar to that of fossils that date back at least 300 Myr, leading to the supposition that, among vertebrates, this lineage is markedly slow to evolve.
- The L. chalumnae genome has been reported previously to have a karyotype of 48 chromosomes.
- The Ensembl gene annotation pipeline created gene models using protein alignments from the Universal Protein Resource (Uniprot) database, limited coelacanth complementary DNA data, RNA-seq data generated from L. chalumnae muscle (18 Gb of paired-end reads were assembled using Trinity software, Supplementary Fig. 2) as well as orthology with other vertebrates.
- We found that the coelacanth genome contains a wide variety of transposable-element superfamilies and has a relatively high transposable-element content (25%); this number is probably an underestimate as this is a draft assembly (Supplementary Note 5 and Supplementary Tables 7–10).
- Analyses of chromosomal breakpoints in the coelacanth genome and tetrapod genomes reveal extensive conservation of synteny and indicate that large-scale rearrangements have occurred at a generally low rate in the coelacanth lineage. Analyses of these rearrangement classes detected several fission events published previously that are known to have occurred in tetrapod lineages, and at least 31 interchromosomal rearrangements that occurred in the coelacanth lineage or the early tetrapod lineage (0.063 fusions per 1 Myr), compared to 20 events (0.054 fusions per 1 Myr) in the salamander lineage and 21 events (0.057 fusions per 1 Myr) in the Xenopus lineage (Supplementary Note 7 and Supplementary Fig. 6). These analyses indicate that karyotypic evolution in the coelacanth lineage has occurred at a relatively slow rate, similar to that of non-mammalian tetrapods.
- In a separate analysis we examined the evolutionary divergence between the two species of coelacanth, L. chalumnae and L. menadoensis, found in African and Indonesian waters, respectively.
- When we compared the liver and testis transcriptomes of L. menadoensis to the L. chalumnae genome, we found an identity of 99.73% (Supplementary Note 8 and Supplementary Fig. 7), whereas alignments between 20 sequenced L. menadoensis bacterial artificial chromosomes (BACs) and the L. chalumnae genome showed an identity of 98.7% (Supplementary Table 11 and Supplementary Fig. 8).
- Over the 400 Myr that vertebrates have lived on land, some genes that are unnecessary for existence in their new environment have been eliminated. To understand this aspect of the water-to-land transition, we surveyed the Latimeria genome annotations to identify genes that were present in the last common ancestor of all bony fish but that are missing from tetrapod genomes.
- Our analysis identified 44,200 ancestral tetrapod conserved non-coding elements (CNEs) that originated after the divergence of the coelacanth lineage.
- We have identified a region of the coelacanth HOX-A cluster that may have been involved in the evolution of extra-embryonic structures in tetrapods, including the eutherian placenta.
- We have confirmed that the protein-coding genes of L. chalumnae show a decreased substitution rate compared to those of other sequenced vertebrates, even though its genome as a whole does not show evidence of low genome plasticity.
-
The discovery of a living coelacanth specimen in 1938 was remarkable, as this lineage of lobe-finned fish was thought to have become extinct 70 million years ago. The modern coelacanth looks remarkably similar to many of its ancient relatives, and its evolutionary proximity to our own fish ancestors provides a glimpse of the fish that first walked on land. Here we report the genome sequence of the African coelacanth, Latimeria chalumnae. Through a phylogenomic analysis, we conclude that the lungfish, and not the coelacanth, is the closest living relative of tetrapods. Coelacanth protein-coding genes are significantly more slowly evolving than those of tetrapods, unlike other genomic features. Analyses of changes in genes and regulatory elements during the vertebrate adaptation to land highlight genes involved in immunity, nitrogen excretion and the development of fins, tail, ear, eye, brain and olfaction. Functional assays of enhancers involved in the fin-to-limb transition and in the emergence of extra-embryonic tissues show the importance of the coelacanth genome as a blueprint for understanding tetrapod evolution.
-
In 1938 Marjorie Courtenay-Latimer, the curator of a small natural history museum in East London, South Africa, discovered a large, unusual-looking fish among the many specimens delivered to her by a local fish trawler. Latimeria chalumnae, named after its discoverer, was over 1 m long, bluish in colour and had conspicuously fleshy fins that resembled the limbs of terrestrial vertebrates. This discovery is considered to be one of the most notable zoological finds of the twentieth century. Latimeria is the only living member of an ancient group of lobe-finned fishes that was known previously only from fossils and believed to have been extinct since the Late Cretaceous period, approximately 70 million years ago (Myr ago). It was almost 15 years before a second specimen of this elusive species was discovered in the Comoros Islands in the Indian Ocean, and only 309 individuals have been recorded in the past 75 years (R. Nulens, personal communication). The discovery in 1997 of a second coelacanth species in Indonesia, Latimeria menadoensis, was equally surprising, as it had been assumed that living coelacanths were confined to small populations off the East African coast. Fascination with these fish is partly due to their prehistoric appearance: remarkably, their morphology is similar to that of fossils that date back at least 300 Myr, leading to the supposition that, among vertebrates, this lineage is markedly slow to evolve1,5. Latimeria has also been of particular interest to evolutionary biologists, owing to its hotly debated relationship to our last fish ancestor, the fish that first crawled onto land. In the past 15 years, targeted sequencing efforts have produced the sequences of the coelacanth mitochondrial genomes, HOX clusters and a few gene families. Nevertheless, coelacanth research has felt the lack of large-scale sequencing data. Here we describe the sequencing and comparative analysis of the genome of L. chalumnae, the African coelacanth.
-
The African coelacanth genome was sequenced and assembled using DNA from a Comoros Islands Latimeria chalumnae specimen (Supplementary Fig. 1). It was sequenced by Illumina sequencing technology and assembled using the short-read genome assembler ALLPATHS-LG11. The L. chalumnae genome has been reported previously to have a karyotype of 48 chromosomes12. The draft assembly is 2.86 gigabases (Gb) in size and is composed of 2.18 Gb of sequence plus gaps between contigs. The coelacanth genome assembly has a contig N50 size (the contig size above which 50% of the total length of the sequence assembly can be found) of 12.7 kilobases (kb) and a scaffold N50 size of 924 kb, and quality metrics comparable to other Illumina genomes (Supplementary Note 1, and Supplementary Tables 1 and 2).
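The N50 definition given above can be illustrated with a minimal sketch. The function name `n50` and the toy contig lengths are assumptions made for illustration only; they are not taken from the assembly or from ALLPATHS-LG.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    The N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled sequence.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (arbitrary units), not real assembly data.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))
```

For the toy input the total is 54, and the three longest contigs (10, 9 and 8) together reach 27, so the sketch reports 8. Applied to real contig or scaffold lengths, the same calculation would give values of the kind quoted above.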
-
The genome assembly was annotated separately by both the Ensembl gene annotation pipeline (Ensembl release 66, February 2012) and by MAKER13. The Ensembl gene annotation pipeline created gene models using protein alignments from the Universal Protein Resource (Uniprot) database, limited coelacanth complementary DNA data, RNA-seq data generated from L. chalumnae muscle (18 Gb of paired-end reads were assembled using Trinity software14, Supplementary Fig. 2) as well as orthology with other vertebrates. This pipeline produced 19,033 protein-coding genes containing 21,817 transcripts. The MAKER pipeline used the L. chalumnae Ensembl gene set, Uniprot protein alignments, and L. chalumnae (muscle) and L. menadoensis (liver and testis)15 RNA-seq data to create gene models, and this produced 29,237 protein-coding gene annotations. In addition, 2,894 short non-coding RNAs, 1,214 long non-coding RNAs, and more than 24,000 conserved RNA secondary structures were identified (Supplementary Note 2, Supplementary Tables 3 and 4, Supplementary Data 1–3 and Supplementary Fig. 3). It was inferred that 336 genes underwent specific duplications in the coelacanth lineage (Supplementary Note
-
The question of which living fish is the closest relative to ‘the fish that first crawled on to land’ has long captured our imagination: among scientists the odds have been placed on either the lungfish or the coelacanth16. Analyses of small to moderate amounts of sequence data for this important phylogenetic question (ranging from 1 to 43 genes) have tended to favour the lungfishes as the extant sister group to the land vertebrates17. However, the alternative hypothesis that the lungfish and the coelacanth are equally closely related to the tetrapods could not be rejected with previous data sets.
-
To seek a comprehensive answer we generated RNA-seq data from three samples (brain, gonad and kidney, and gut and liver) from the West African lungfish, Protopterus annectens, and compared it to gene sets from 21 strategically chosen jawed vertebrate species. To perform a reliable analysis we selected 251 genes in which a 1:1 orthology ratio was clear and used CAT-GTR, a complex site-heterogeneous model of sequence evolution that is known to reduce tree-reconstruction artefacts19 (see Supplementary Methods). The resulting phylogeny, based on 100,583 concatenated amino acid positions (Fig. 1, posterior probability = 1.0 for the lungfish–tetrapod node) is maximally supported except for the relative positions of the armadillo and the elephant. It corroborates known vertebrate phylogenetic relationships and strongly supports the conclusion that tetrapods are more closely related to lungfish than to the coelacanth (Supplementary Note 4 and Supplementary Fig. 4).
-
The morphological resemblance of the modern coelacanth to its fossil ancestors has resulted in it being nicknamed ‘the living fossil’1. This invites the question of whether the genome of the coelacanth is as slowly evolving as its outward appearance suggests. Earlier work showed that a few gene families, such as Hox and protocadherins, have comparatively slower protein-coding evolution in coelacanth than in other vertebrate lineages8,10. To address the question, we compared several features of the coelacanth genome to those of other vertebrate genomes.
-
Protein-coding gene evolution was examined using the phylogenomics data set described above (251 concatenated proteins) (Fig. 1). Pair-wise distances between taxa were calculated from the branch lengths of the tree using the two-cluster test proposed previously20 to test for equality of average substitution rates. Then, for each of the following species and species clusters (coelacanth, lungfish, chicken and mammals), we ascertained their respective mean distance to an outgroup consisting of three cartilaginous fishes (elephant shark, little skate and spotted catshark). Finally, we tested whether there was any significant difference in the distance to the outgroup of cartilaginous fish for every pair of species and species clusters, using a Z statistic. When these distances to the outgroup of cartilaginous fish were compared, we found that the coelacanth proteins that were tested were significantly more slowly evolving (0.890 substitutions per site) than the lungfish (1.05 substitutions per site), chicken (1.09 substitutions per site) and mammalian (1.21 substitutions per site) orthologues (P < 10⁻⁶ in all cases) (Supplementary Data 5). In addition, as can be seen in Fig. 1, the substitution rate in coelacanth is approximately half that in tetrapods since the two lineages diverged. A Tajima’s relative rate test21 confirmed the coelacanth’s significantly slower rate of protein evolution (P < 10⁻²⁰) (Supplementary Data 6).
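As an illustration of the relative rate test mentioned above, the following minimal sketch implements the standard single-degree-of-freedom form of Tajima's test for two aligned ingroup sequences and one outgroup. The function name, the gap handling and the toy sequences are assumptions for illustration; this is not the pipeline used for Supplementary Data 6.

```python
import math


def tajima_relative_rate(seq1, seq2, outgroup):
    """Tajima's relative rate test (1 degree of freedom).

    m1 counts sites where seq1 alone differs (seq2 matches the outgroup);
    m2 counts sites where seq2 alone differs (seq1 matches the outgroup).
    Under equal rates, (m1 - m2)^2 / (m1 + m2) is approximately
    chi-square distributed with one degree of freedom.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if "-" in (a, b, o):  # skip alignment gaps (an assumption of this sketch)
            continue
        if a != b and b == o:
            m1 += 1  # change unique to lineage 1
        elif a != b and a == o:
            m2 += 1  # change unique to lineage 2
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return m1, m2, chi2, p_value


# Hypothetical toy alignment, not real coelacanth data.
slow = "AAAAAAAAAAAC"
fast = "GGGGGAAAAAAA"
out = "AAAAAAAAAAAA"
m1, m2, chi2, p = tajima_relative_rate(slow, fast, out)
print(m1, m2, round(chi2, 3))
```

In the toy alignment the first sequence carries one unique change and the second carries five, which is the kind of asymmetry the test is designed to detect. With only 12 sites the difference is not significant; the very small P value reported above reflects the much larger number of aligned positions.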
-
We next examined the abundance of transposable elements in the coelacanth genome. Theoretically, transposable elements may make their greatest contribution to the evolution of a species by generating templates for exaptation to form novel regulatory elements and exons, and by acting as substrates for genomic rearrangement22. We found that the coelacanth genome contains a wide variety of transposable-element superfamilies and has a relatively high transposable-element content (25%); this number is probably an underestimate as this is a draft assembly (Supplementary Note 5 and Supplementary Tables 7–10). Analysis of RNA-seq data and of the divergence of individual transposable-element copies from consensus sequences shows that 14 coelacanth transposable-element superfamilies are currently active (Supplementary Note 6, Supplementary Table 10 and Supplementary Fig. 5). We conclude that the current coelacanth genome shows both an abundance and activity of transposable elements similar to many other genomes. This contrasts with the slow protein evolution observed.
-
Analyses of chromosomal breakpoints in the coelacanth genome and tetrapod genomes reveal extensive conservation of synteny and indicate that large-scale rearrangements have occurred at a generally low rate in the coelacanth lineage. Analyses of these rearrangement classes detected several fission events published previously23 that are known to have occurred in tetrapod lineages, and at least 31 interchromosomal rearrangements that occurred in the coelacanth lineage or the early tetrapod lineage (0.063 fusions per 1 Myr), compared to 20 events (0.054 fusions per 1 Myr) in the salamander lineage and 21 events (0.057 fusions per 1 Myr) in the Xenopus lineage23 (Supplementary Note 7 and Supplementary Fig. 6). Overall, these analyses indicate that karyotypic evolution in the coelacanth lineage has occurred at a relatively slow rate, similar to that of non-mammalian tetrapods24.
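The rates quoted above are event counts divided by the time over which the events accumulated. The time spans themselves are not given in this section, so the sketch below only back-calculates the span implied by each pair of numbers; the function name is hypothetical and the results are approximate because the published rates are rounded.

```python
def implied_span_myr(events, rate_per_myr):
    """Time span (Myr) implied by an event count and a rate per Myr."""
    return events / rate_per_myr


# (event count, fusions per Myr) as quoted in the text above.
lineages = {
    "coelacanth or early tetrapod": (31, 0.063),
    "salamander": (20, 0.054),
    "Xenopus": (21, 0.057),
}

for name, (events, rate) in lineages.items():
    # Back-calculated span; an inference from rounded figures, not a reported value.
    print(f"{name}: ~{implied_span_myr(events, rate):.0f} Myr")
```

The larger event count for the coelacanth or early tetrapod lineage goes with a longer implied span, which is why the per-Myr rates of the three lineages remain close.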
-
In a separate analysis we also examined the evolutionary divergence between the two species of coelacanth, L. chalumnae and L. menadoensis, found in African and Indonesian waters, respectively. Previous analysis of mitochondrial DNA showed a sequence identity of 96%, but estimated divergence times range widely from 6 to 40 Myr25,26. When we compared the liver and testis transcriptomes of L. menadoensis27 to the L. chalumnae genome, we found an identity of 99.73% (Supplementary Note 8 and Supplementary Fig. 7), whereas alignments between 20 sequenced L. menadoensis bacterial artificial chromosomes (BACs) and the L. chalumnae genome showed an identity of 98.7% (Supplementary Table 11 and Supplementary Fig. 8). Both the genic and genomic divergence rates are similar to those seen between the human and chimpanzee genomes (99.5% and 98.8%, respectively; divergence time of 6 to 8 Myr ago)28, whereas the rates of molecular evolution in Latimeria are probably affected by several factors, including the slower substitution rate seen in coelacanth. This suggests a slightly longer divergence time for the two coelacanth species.
-
A full description of methods, including information on sample collection, sequencing, assembly, annotation, all sequence analysis and functional validation, can be found in the Supplementary Information.