Table Of ContentWorm1:1,42–50;January/February/March2012;G2012LandesBioscience
Toward 959 nematode genomes
Sujai Kumar, Georgios Koutsovoulos, Gaganjot Kaur and Mark Blaxter*
InstituteofEvolutionaryBiology;UniversityofEdinburgh;Edinburgh,UK
Keywords: nematode, genome, next-generation sequencing, second-generation sequencing, wiki
Raw sequencingcosts have droppedfive orders ofmagnitude13
The sequencing of the complete genome of the nematode
inthepasttenyears,whichmeansthatitisnowaviableresearch
Caenorhabditis elegans was a landmark achievement and
usheredinaneweraofwhole-organism,systemsanalysesof goal to obtain genome sequences for all nematodes of interest,
the biology of this powerful model organism. The success of rather than just a few model organisms. Inspired by large-scale
theC.elegansgenomesequencingprojectalsoinspiredcom- genome initiatives for other major taxa14-17 we have initiated a
munities working on other organisms to approach genome push to sequence, in the first instance, 959 nematode genomes.
sequencingoftheirspecies.ThephylumNematodaisrichand Why (only) 959 genomes? The adult hermaphrodite C. elegans
diverseandofinteresttoawiderangeofresearchfieldsfrom has 959 somatic cells, and one of the first major projects that
basic biology through ecology and parasitic disease. For all
turned C. elegans from a local curiosity into a key global research
thesecommunities,itisnowclearthataccesstogenomescale
organismwasthedecipheringofthenear-invariantdevelopmental
datawillbekeytoadvancingunderstanding,andinthecase
©ofpara site2s,dev0eloping1neww2aysto cLontrolaorcunredisedases. ecesll line agBe that igiveos risesto thcese aidulet cellsn, starticng freom th.e
fertilised zygote. In an analogous way we hope that a nematode
The advent of second-generation sequencing technologies,
phylogeny (the evolutionary lineage of the extant species) with
improvements in computing algorithms and infrastructure
andgrowthinbioinformaticsandgenomicsliteracyismaking 959ormorespecieswillbesimilarlycatalyticindrivingnematode
the addition of genome sequencing to the research goals of research programs across the spectrum of basic and applied
any nematode research program a less daunting prospect. science. Obviously, as sequencing technologies improve and
To inspire, promoteDand cooordin atengenoomic stequ endcing isbecomte mroreiaccbessibleu, we wtillemove.beyond this initial goal of
across the diversity of the phylum, we have launched a 959, especially with over 23,000 described species and an
community wiki and the 959 Nematode Genomes initiative estimated one to two million undescribed species in the phylum.
(www.nematodegenomes.org/).Justasthedecipheringofthe
The goal of this article is to describe the current status of
developmental lineage of the 959 cells of the adult
nematode genome research, to encourage everyone to sequence
hermaphroditeC.eleganswasthegatewaytobroadadvances
their favorite nematode and to share genome sequencing
in biomedical science, we hope that a nematode phylogeny
experiences and data. We show how inexpensive it has become
with (at least) 959 sequenced species will underpin further
to obtain high quality draft genomes and introduce the 959
advances in understanding the origins of parasitism, the
dynamics of genomic change and the adaptations that have Nematode Genomes wiki18 as a way to collate and track
madeNematodaone of themost successfulanimal phyla. sequencing projects worldwide.
The Genomes We Have
The worm community has been at the forefront of animal
Introduction genome sequencing since 1998, when Caenorhabditis elegans was
the first metazoan to be fully sequenced.2 The C. elegans genome
The phylum Nematoda is fascinating because it is the most and its extensive annotation is accessible through the WormBase
ubiquitous, numerous and diverse of all animal phyla, present in portal.19 WormBase was one of the first databases to integrate
justaboutevery ecological niche onoursmallplanet.Nematodes genomic, genetic and phenotypic data, and its curators aim to
have been indispensable for research programs in developmental catalog and link all C. elegans literature and research, including
biology,1 genome biology,2,3 evolutionary genomics,4 neurobiol- large scale analyses such as modENCODE.20
ogy,5 aging,6 health7 and parasitology.8 SincethereleaseoftheC.elegansgenome,nineothernematode
In the last two decades, DNA sequencing technology has genomes have been published, including six species parasitic in
evolved dramatically and allowed us to create genome resources plantsandanimals(Table1).OnlyC.elegansandC.briggsaehave
for many of these nematodes, which have transformed our been sequenced to “finished” status2 with all sequence data
understanding of the biology of not just this phylum, but of all organizedintochromosome-sizedpieces.Theremainingeightare
organisms.9-12 high-quality draft genomes and all ten can be accessed at
WormBase19 through graphical genome browsers and via bulk-
data downloads.
*Correspondenceto:MarkBlaxter;Email:[email protected]
On the 959 Nematode Genomes wiki, 26 additional genome
Submitted:11/23/11;Accepted:11/24/11
http://dx.doi.org/10.4161/worm.19046 sequencing projects are listed with publicly available draft
42 Worm Volume1Issue1
REVIEW
Table1.Publishednematodegenomes
Species Systematic Year Technology Genome Numberof ScaffoldN50 ATcontent Numberof
position Published Size(Mbp)† chromosomesor (kbp)†,4 (%)† genes/
(BlaxterClade, scaffoldsin proteins†
HelderClade*) assembly†
Caenorhabditis V,9E 19982 Sanger 100 6chromosomes 17,494 64.6 20,461/25,244
elegans
Caenorhabditis V,9E 20033 Sanger 108 6chromosomes+5 17,485 62.6 NA/21,986
briggsae fragments
Brugiamalayi III,8 200752 Sanger 96 27,210 38 69.4 18,348/21,332
Meloidogynehapla IV,11 200824 Sanger 53 3,452 38 72.6 NA/13,072
Meloidogyneincognita IV,11 200827 Sanger 82 9,538 13 68.6 NA/21,232
Pristionchuspacificus V,9B 200853 Sanger 172 18,083 1,245 57.2 NA/24,217
Caenorhabditis V,9E 201033 Illumina 80 33,559 9 63.7 22,662/26,265
angaria
Trichinellaspiralis I,2A 201136 Sanger 64 6,863 6,373 66.1 16,380/16,380
Bursaphelenchus IV,10D 201137 Roche454, 75 5,527 950 59.6 18,074/18,074
xylophilus Illumina
Ascarissuum III,8 201125 Illumina 273 29,831 408 62 18,542/18,542
© 2012 Landes Bioscience.
*NematodasystematiccladesasdefinedbyBlaxteretal.21andHoltermanetal.29†Nucleargenomeonly,notincludingmitochondriaorendosymbionts,
computedfromWormBasereleaseWS227whereavailableorfromdataURLsinTable2.4ScaffoldN50:Halftheassemblyisinscaffoldsofthissizeorlarger
inthenucleargenome.
assemblies and, in some cases, annotations (Table 2). Seven of Meloidogyne incognita genome27 and the presence of obligate,
Do not distribute.
these are hosted at WormBase and the rest are available either vertically-transmitted symbiont alphaproteobacterial Wolbachia
through the 959 Nematode Genomes website or at sequencing and their genomes inside the cells of many filarial nematodes.28
centerwebsites.Thesedraftgenomesareexpectedtohaveatleast Apart from understanding genome organization and origins,
95% of the genes present in multi-gene sized contigs, but the richer sampling of sequenced genomes would allow a better
exactorderingandchromosomallocationofthecontigsisusually understanding of the phylogeny of Nematoda and the evolution-
notknown.Despitetheseshortcomings,draftdataareveryuseful arydynamicsofimportanttraits—suchasparasitismofplantsand
for comparative and evolutionary genomics or simply for animals—and developmental modes. The most comprehensive
identifying single genes of interest. Early access to these data molecular phylogenies of Nematoda have been based on a single
not only allows researchers to test hypotheses, but, equally gene, the ~1600 bp nuclear small subunit rRNA locus,21,29,30 but
importantly, to identify potential problems early in the assembly this single locus is insufficient for robust resolution of the deep
process. Researchers wishing to publish analyses using pre- divergences in the phylum. Methods for generating large-scale
publication draft data should contact the sequencing center or multi-genephylogeniesnowexistandcanbeappliedeventodraft
lead researchers for permissions (and also to see if better versions genomes.
of these data are or will soon be available). Whenlargescaleexpressedsequencetag(EST)sequencingwas
first performed,31 new insights into nematode gene evolution
Why We Need More: One Nematode Genome Does became possible from the partial catalogs of expressed genes.22
Not a Phylum Make More nematode genomes, even draft-quality ones, take those
insights several steps further, as they allow analysis of complete
C.elegansisanexcellentmodelnematodeanditsgenome,withits genecatalogs.Additionally,whole-genomeresourcesincludenon-
wealth of annotation, is an excellent model genome. However genic regions, such as the regulatory regions upstream of genes,
C. elegans cannot be taken to represent all nematode genomes which are often even more conserved than coding regions and
(Fig.1). We know that C. elegans is quite derived within may function in developmental regulation.32,33
Nematoda21 and that it lacks many genes shared between other
nematodes and other Metazoa.22 Nematode genomes have been How to Make More
sized from 20 Mb to 500 Mb (i.e., one fifth to five times that
of C. elegans).23 Sequenced nematode genomes range from C. elegans was sequenced over a decade ago using Sanger
Meloidogyne hapla24 at 54 Mb to Ascaris suum25 at 273 Mb. sequencing. At that time, sequencing the genome to ten-fold
Interesting genomic features in other species include chromatin depth took a decade and cost roughly $10 M. Once the
diminutioninAscarissuumandotherascaridids(i.e.,thegermline sequencing was completed, similar resources were required to
has a larger genome than the soma26), aneuploid triploidy in the finish the genome. Sanger sequencing is still considered the gold
www.landesbioscience.com Worm 43
Table2.Nematodespeciesforwhichpublishedordraftgenomedataarepubliclyavailable
Species(Strain) Status GenomedataandbrowserURLs
Ascarissuum(Davis) ongoing www.ncbi.nlm.nih.gov/nuccore/320321071
Ascarissuum(Victoria/Ghent) published ftp://ftp.wormbase.org/pub/wormbase/species/a_suum/
Ascarissuum(WTSI) ongoing www.sanger.ac.uk/resources/downloads/helminths/ascaris-suum.html
Brugiamalayi(TRS) published www.wormbase.org/db/gb2/gbrowse/b_malayi
ftp://ftp.wormbase.org/pub/wormbase/species/b_malayi/
Bursaphelenchusxylophilus(Ka4C1) published www.genedb.org/Homepage/Bxylophilus
Caenorhabditisangaria(PS1010) published www.wormbase.org/db/gb2/gbrowse/c_angaria/
ftp://ftp.wormbase.org/pub/wormbase/species/c_angaria/
Caenorhabditisbrenneri(PB2801) complete www.wormbase.org/db/gb2/gbrowse/c_brenneri/
ftp://ftp.wormbase.org/pub/wormbase/species/c_brenneri/
Caenorhabditisbriggsae(AF16) published www.wormbase.org/db/gb2/gbrowse/c_briggsae/
ftp://ftp.wormbase.org/pub/wormbase/species/c_briggsae/
Caenorhabditiselegans(N2) published www.wormbase.org/db/gb2/gbrowse/c_elegans/
ftp://ftp.wormbase.org/pub/wormbase/species/c_elegans/
Caenorhabditisjaponica(DF5081) complete www.wormbase.org/db/gb2/gbrowse/c_japonica/
ftp://ftp.wormbase.org/pub/wormbase/species/c_japonica/
Caenorhabditisremanei(PB4641) complete www.wormbase.org/db/gb2/gbrowse/c_remanei/
ftp://ftp.wormbase.org/pub/wormbase/species/c_remanei/
© Cae2norhabd0itissp111(JU13723) Loangoingndgeenome.swustl.e du/Bpub/orgainismo/Invertesbrates/cCaenoirhabeditis_spn11_JU13c73/ e.
Caenorhabditissp5DRD-2008(JU800) ongoing nematodes.org/downloads/959nematodegenomes/blast
Caenorhabditissp7(JU1286) ongoing ftp://ftp.wormbase.org/pub/wormbase/species/c_sp7/
Caenorhabditissp9(AC-2009JU1422) inannotation ftp://ftp.wormbase.org/pub/wormbase/species/c_sp9/
Dictyocaulusviviparus(Notspecified) ongoing www.nematode.net/
Dirofilariaimmitis(EdinDburgh/TRSo/Basel) ninaonnotatiotn disnetmatordes.oirg/dbownloadus/959netmateodegen.omes/blast
Globoderapallida(Notspecified) ongoing www.sanger.ac.uk/sequencing/Globodera/pallida/
Hemonchuscontortus(Moredun) ongoing www.sanger.ac.uk/Projects/H_contortus/
ftp://ftp.wormbase.org/pub/wormbase/species/h_contortus/
Heterorhabditisbacteriophora(M31e) inannotation genome.wustl.edu/genome.cgi?GENOME=Heterorhabditis%20%20bacteriophora
Howardulaaoronymphium(Jaenike) ongoing nematodes.org/downloads/959nematodegenomes/blast
Litomosoidessigmodontis(labstrainestablished ongoing nematodes.org/downloads/959nematodegenomes/blast
fromCameroonbyOdileBain)
Loaloa(Nutman/Broad) inannotation www.broadinstitute.org/annotation/genome/filarial_worms/MultiHome.html
Meloidogynehapla(VW9) published www.hapla.org/
www.wormbase.org/db/gb2/gbrowse/m_hapla/
ftp://ftp.wormbase.org/pub/wormbase/species/m_hapla/
Meloidogyneincognita(Morelos) published www.inra.fr/meloidogyne_incognita
www.wormbase.org/db/gb2/gbrowse/m_incognita/
ftp://ftp.wormbase.org/pub/wormbase/species/m_incognita/
Nippostrongylusbrasiliensis(labstrain) ongoing www.sanger.ac.uk/sequencing/Nippostrongylus/brasiliensis/
Onchocercaochengi(Cameroon/wild) inannotation nematodes.org/downloads/959nematodegenomes/blast
Onchocercavolvulus(Nutman/Broad) ongoing www.broadinstitute.org/annotation/genome/filarial_worms/MultiHome.html
Onchocercavolvulus(WTSI/wildLiberia) ongoing www.sanger.ac.uk/resources/downloads/helminths/onchocerca-volvulus.html
Oscheiustipulae(CEW1) ongoing nematodes.org/downloads/959nematodegenomes/blast
Pristionchuspacificus(California) published www.pristionchus.org/
www.wormbase.org/db/gb2/gbrowse/p_pacificus/
ftp://ftp.wormbase.org/pub/wormbase/species/p_pacificus/
Strongyloidesratti(ED321) inannotation www.sanger.ac.uk/resources/downloads/helminths/strongyloides-ratti.html
Teladorsagiacircumcincta(Notspecified) ongoing www.sanger.ac.uk/resources/downloads/helminths/teladorsagia-circumcincta.html
Trichinellaspiralis(Notspecified) published www.nematode.net/
www.wormbase.org/db/gb2/gbrowse/t_spiralis/
ftp://ftp.wormbase.org/pub/wormbase/species/t_spiralis/
Trichurismuris(Eisolate) ongoing www.sanger.ac.uk/Projects/T_muris/
Wuchereriabancrofti(Nutman/Broad) inannotation www.broadinstitute.org/annotation/genome/filarial_worms/MultiHome.html
Note:Seewww.nematodes.org/nematodegenomes/index.php/Strains_with_Dataforanup-to-datelist.
44 Worm Volume1Issue1
© 2012 Landes Bioscience.
Do not distribute.
Figure1.SystematictreeofNematodaindicatingcurrentsequenced,inprogressorproposedgenomesequencingprojects.Thesystematicarrangement
ofNematodaisbasedonDeLeyandBlaxter;51thecladesdefinedbyBlaxteretal.21andvanMengenetal.54areindicated.Foreachmajorgroupwe
summarizethetrophicecology(microbivore,predator,fungivore,plantparasite,non-vertebrateparasiteorassociate,vertebrateparasite)andthe
numberofspeciesforwhichgenomeprojectsarereportedinthe959NematodeGenomeswiki.FiguredevelopedfromBlaxter.55
standardintermsofquality,butbecauseofthehighcostandtime time-consuming step and uses only whole-genome shotgun
investment, it is unlikely that there will be any more Sanger- sequencing. As a result, current genome projects typically result
sequenced nematode genomes. in draft genomes with multi-gene sized contigs rather than
Sequencing the C. elegans genome was based on an array of chromosome-sized sequences. The substantial additional effort
mapped and ordered large-insert genomic clones, which greatly required to finish a genome is necessary if the goal is to study
facilitated assembly. Most genome sequencing today avoids this chromosome organization or long-range regulation. However,
www.landesbioscience.com Worm 45
manyquestionsaboutphylogenetics,geneevolutionandsharedor information,33 linking genome sequence contigs that contain
novel gene functions can be approached using high-quality draft exons for a gene that cannot be joined by genome sequence data
genomes generated at a tiny fraction of the time and cost of a because of repeats.
finished genome. Inthelastyear,genomesequenceshavebeenpublishedforfour
Second-generation sequencing platforms have dramatically nematode species. Each project used different sequencing
reduced costs and increased throughput, with the trade-off of strategies. The genome of Trichinella spiralis was determined
reducedreadlengthcomparedwithSangerdideoxyreads(Table3). usingtraditionalSangerdideoxysequencing,36witha33-foldbase
Shorter reads mean that most genomic repeats are longer than a coverage in the final assembly. Bacterial artificial chromosome
read, and the only way to attempt to resolve them in a genome clonesandmultiple-sizeinsertclonelibrarieswereusedtoscaffold
assembly is to use pairs of reads sequenced from opposite ends of the64Mbgenome.TheBursaphelenchusxylophilus37genomewas
fragmentsthatarelongerthan therepeats.Sophisticatedassembly sequencedusingIlluminaPEandRoche454single-endreads for
programs that use high sequencing depth and multiple insert basic contig generation and Roche 454 MP for scaffolding
libraries to get around the problems of sequencing errors and contigs. For Caenorhabditis angaria,33 Illumina PE (from libraries
repeats have been developed specifically for second-generation with multiple insert sizes from 200–450 bp) totalling 170-fold
data.34 coverage were used, and then deep transcriptome data (Illumina
Each platformhas different read lengthsand error profiles that RNA-Seq)wereused toimprovethisassembly.Thiswasthefirst
affect their suitability for de novo genome sequencing projects. genome project to use RNA-Seq reads to scaffold genomic
TheIlluminaplatformsgeneratereadsupto150basesandarethe contigs.TwoversionsoftheA.suumgenomehavebeenreleased.
workhorsesofsequencingprojects.Illuminasequencingerrorsare Wang et al.38 generated an assembly using Roche 454 and
usually miscalled bases and higher read-depths are recommended Illumina data from short insert libraries and mate-pair data from
© 2012 Landes Bioscience.
to consensus-correct such errors. Roche 454 reads can extend to 5.5 kb libraries sequenced using Sanger dideoxy technology as
750 bases but are more expensive than the shorter-read part of an extensive transcriptome sequencing project. Jex et al.25
technologies. In Roche 454 data, sequencing depths higher than used a mix of Illumina PE 170 bp and 500 bp PE reads,
30-fold are not recommended35 because homopolymer errors scaffolded with Illumina MP data from 800 bp, 2 kb, 5 kb and
accumulate and confound assembly algorithms. Life Tech’s 10 kb libraries. Interestingly, these long-insert MP libraries were
SOLiD technology generates short (~75 base) reads but is not generated from DNA that was whole-genome amplified using
Do not distribute.
suitable for de novo genome sequencing because each base is strand-displacing isothermal amplification, a technology that
represented by two “colors” (readings) and sequencing errors are holds great promise for additional nematode genome projects
difficult to identify in the absence of a reference sequence. where starting materials may be limiting.
Different combinations of technologies, insert lengths and So which strategy should you use? If you are on a bargain-
depths of coverage can be employed to exploit the best basementbudgetandwantthemostvalueformoney,asinglelaneof
characteristics of each and minimise known classes of errors. In IlluminaHiSeq2000PE(100basesplus100bases)sequencingwith
particular,paired-endsequencingfromamixoflibraryinsertsizes multiplexed 300 bp and 600 bp PE libraries can provide a highly
appears to be optimal for de novo assembly, using short-insert usabledraftgenome.Forexample,inourlaboratory,Caenorhabditis
(200–700 bp) paired end (PE) libraries complemented by long- species5wasrecentlysequencedusingthisstrategyandresultedina
insert (1–20 kb) mate pair (MP) libraries. While PE data derive draftassemblyspanning131Mbinonly16,384scaffolds,withmore
fromdirectlycapturedgenomefragmentsandarethuslargelyfree thanhalftheassemblyinscaffoldslargerthan31kb(S.Kumar,A.
of chimaeras, construction of MP libraries involves additional Cutter,M-A.Felix,M.Blaxter,unpublished;seewww.nematodes.
manipulations, including circularisation of long DNAs, that can org/nematodegenomes/index.php/Caenorhabditis_sp._5_DRD-
result in high proportions of chimaeric or aberrantly short virtual 2008_JU800).Roche454dataaremoreexpensivebaseforbase
inserts. MP data are typically used for scaffolding contigs than Illumina, but usually assemble into longer contigs at the
generated from PE data, which are generated in higher coverage. same effective coverage. Mate pair data serve to scaffold the
Deep sequencing of the transcriptome can also yield scaffolding primary contigs generated from single-end Roche 454 or PE
Table3.Currentsequencingcosts,throughputandreadlengths
Technology Readlength Errormodel Recommended Costper Costper100 Throughput Timeper100Mb
(bases) sequencing base Mbgenome (bases/day/ genomeper
depth (£/J/$) (£/J/$) instrument) instrument(days)
Sanger 1000–1500 Goldstandard,accuratebase 10X 1023 106 106 103
dideoxy quality,typicalerrorprobability
0.0001
Roche454 400–1000 Homopolymererrors 20–30X 1025 2!104 5!108 5
FLX/FLX+
Illumina 100–150 Typicalerrorprobability0.01, 50–100X 1027 103 1010 1
HiSeq2000 Lowerqualitytowardendof
read
46 Worm Volume1Issue1
Illumina sequencing and significantly improve the assembled content closer to 50%. Ultra-long reads could span repetitive
fragment lengths. Construction of MP libraries for Illumina or regionsandthuseaseassembly.PacBioSMRT47isthefirstsingle-
Roche 454 sequencing requires much more and higher quality moleculetechnologytobereleasedcommerciallyandcanproduce
starting DNA than do PE libraries and MP libraries are more readsover2kb,buthasrelativelylowthroughputandanaccuracy
costly to produce. In addition to genomic sequencing, a single far lower than second-generation sequencers. Another single-
lane of Illumina HiSeq2000 RNA-Seq data (100 base PE reads moleculetechnologyisfromOxfordNanopore,48whichpromises
from300bplibrariesmadefromRNApooledfrommanystages) high-quality, high-throughput reads with no theoretical length
is highly recommended for aiding assembly and annotation. limit. However, the company has not yet released any data or
metricsonerrorrates,readlengths,throughputorcosts,soallwe
The Costly As: Assembly, Annotation and Analysis cansayisthatthetechnologywillchangegenomesequencingifit
works.
The generation of the raw sequence data are rapidly becoming a
marginal cost in a genome sequencing program: the relative cost Keeping Track Using the 959 Nematode
and time taken for assembly, annotation and analysis post- Genomes Wiki
sequencingismuchgreater.Rawreadsneedtobequalitychecked,
checkedforcontaminantsandassembled.Theassembliesneedto We set up the 959 Nematode Genomes wiki (959NG wiki) at
be verified and possibly repeated in turn and then annotated to www.nematodegenomes.org to keep track of genomes being
identify genes and other genomic features of interest such as sequenced and published.18 As second-generation sequencing
regulatory regions, repeats and transposons. The intricacies of becomes more accessible, we anticipate that several hundred
assembly algorithms, assembly strategies and annotation options nematodes will be sequenced in the next few years. Although the
© 2012 Landes Bioscience.
arebeyondthescopeofthisarticle,butawidevarietyofexcellent INSDCdatabases(GenBank/ENA/DDBJ)arethefirstsourcesthat
methods and tools have been published.35,39-46 The most most of us turn to when looking for sequences from or related to
comprehensive recent analysis of assembly strategies for complex ourorganismofinterest,genomesareoftendepositedthereonlyat
eukaryotic genomes was the Assemblathon.34 If all goes well, a thetime of publication, and this canyield theimpression that no
nematode genome can, in theory, be shepherded from DNA projectisunderway.Wehopethe959NGwikiwillenablegenomic
extraction to an annotated assembly and be ready for further resourcestobesharedpre-publication,avoidduplicationofeffort,
Do not distribute.
analyses in as little as a month. allow new genomes to be proposed and forge collaborations
Both bioinformatics and sequencing technologies are changing betweenresearchersinterestedinthesamespeciesorclade.
so rapidly that recommendations on strategies may quickly Web-based databases for tracking genome sequencing projects
become obsolete. For bioinformatics solutions, the most up-to- are not a new idea and we know of at least four (diArk,49
date tips and recommendations will probably come from low- Genomes OnLine Database (GOLD),50 The International
latency sources such as conference presentations, blog posts, Sequencing Consortium (www.intlgenome.org) and Genome
forums,crowdsourcedQ&Asitesandcollaborativewikissuchas News Network (www.genomenewsnetwork.org). However, only
959 Nematode Genomes (as described below). For sequencing, thefirsttwoarecurrentlymaintainedandallfourrelyoncentrally
the two emerging wet laboratory technologies that could updating the database whenever a new genome is proposed or
dramatically change how we sequence nematodes are whole released.As959NGisawikiwhere anyone cansign intoaddor
genome amplification and single-molecule sequencing. edit information, we anticipate that the site will stay up to date,
Whole-genomeamplification(WGA)hasbeenusedtogenerate and because it is specific to nematodes, it is more likely to be of
sufficient quantities of DNA from tissues of single A. suum for use to the nematode community.
MP libraries.25 This opens the prospect of using WGA on single Thehomepageofthewikihaslinkstoalltheimportantpartsof
nematode specimens, though the mass of DNA input from A. the site (Fig.2). The 959NG wiki is organized taxonomically
suum used by the BGI team (200 ng) is much more than is using the systematics proposed by De Ley and Blaxter51 (Fig.1).
present in most individual nematodes (one C. elegans adult We also use the clades defined by Blaxter et al.21 and Holterman
contains ~200 pg). Proof that amplification does not overly bias et al.29 and derive other systematic information from the NCBI
sequencing coverage or generate chimaeras that mislead assembly taxonomy. The tree is editable, so if new evidence is found for
algorithms would be a major advance. Sequencing from single resolutionofanyparaphyleticnodesorrearrangements,additional
nematodes will reduce the assembly issues arising from extremes nodes can be added or taxa reassigned simply by changing the
of heterozygosity observed in wild populations and will allow parent taxon for that set of taxa. For any taxon (class, order,
researchers to select specimens directly from environmental family, genus, etc.), the wiki lists all the species and strains that
samples. have active or proposed genome projects. For published projects
Thepromiseofsingle-moleculesequencingisthegenerationof weencourageadditionofPubMedIDsforpublicationsandlinks
ultra-long reads (several kilobases) from templates that have to genome browsers and data repositories. For species where a
undergone a minimum of in vitro manipulation. It is well genomeprojectis“ongoing”weassociatetheprojectwithastrain
recognized in second-generation sequencing that the several PCR ofthespecies,topermitmorethanoneindependentprojecttobe
steps involved can exclude some regions of a genome from registered. Again, genome project leaders are encouraged to add
sequencingandpositivelybiassequencingtoregionsthathaveGC links to project web pages and data access portals.
www.landesbioscience.com Worm 47
© 2012 Landes Bioscience.
Figure2.The959NematodeGenomeswikihomepage.
Do not distribute.
One goal of the 959NG wiki is to reduce the “activation Join the 959 Nematode Genomes Initiative
energy” for starting a new genome project. Embarking on a
genome scale endeavor can be daunting, but we hope that the The 959 Nematode Genomes initiative (and the959NG wiki)is
959NG wiki will promote collaboration on genomes of interest. open to all and we encourage all interested to join. Anyone can
Individual researchers can “propose” a species (and strain) for view the wiki (and free registration gives editing rights). The
genomesequencing andregisterinterestinspeciesthathavebeen 959NG wiki will only be as good as (and as up to date as) the
proposed. By making interests known, it is more likely that information we, collectively, enter. In particular, we would
fruitful collaborations will ensue. We know of two multi-center encourage registration of interest in ongoing and proposed
projects, both now mature, where the proponents first met on genomesandtheactiveproposalofadditionalnematodegenomes
the 959NG wiki. for sequencing. As the community of researchers producing and
Finding current genome data can be frustrating. We therefore consuming new nematode genomes grows, the synergy of
provide alistofavailabledataportalsforgenomes sequenced and combining skills and discoveries in data generation, assembly
in sequencing. These include genome browsers (such as those and annotation will become more evident and will facilitate the
provided by WormBase) as well as data download sources. The generation of new genomes. The availability of large numbers of
9595NG wiki also includes a standard BLAST search portal that phylogenetically diverse genomes will also—we hope—inspire a
allowsresearcherswithspecific(gene-centered)interestsinoneor new breed of nematode genomics researchers not wedded to any
many genomes to query available published and pre-publication onespeciesbuthungryfordataacrossthephylumandthuseager
draft genomes. to collaborate in the analyses of new genomes.
The959NGwikiisbuiltontheMediaWikiplatform(thesame The 959NG wiki will evolve as the community evolves. The
tools that run Wikipedia) and uses the Semantic MediaWiki snapshotpresentedhere(Tables1and2)willsoonbeoutofdate.
(SMW) extension. Each page about a strain, species, taxon, The open architecture of the SMW system will allow us to add
researcher or sequencing center has semantic properties associated additional concepts and linking data between genomes and thus
with it that can be queried in new ways to extract inferred the wiki should also be able to nucleate and serve special interest
relationshipsandnewpropertiescanbeaddedtoanypagewithout groups where the core themes are not simply systematic, but
changing any database schemas. For example, we plan to add ratherothersharedphenotypes(reproductivemode,parasitism)or
lifecycle strategies (as shown in Fig.1) to taxon pages to enable specificgenesetsorsystems.Byidentifyingcolleagueswithshared
queriessuchas“Listallongoinggenomeprojectsforplantparasitic interests,jointfundingtogeneratenematodegenomedatawillbe
nematodes with genomes smaller than 100 Mb.” Other query more easily sourced. The collective experience embodied in the
examplesanddetailsofhowSMWisanappropriatetechnologyfor 959NG wiki will also mean that the costs (in both consumables
suchsitescanbefoundinKumar,SchifferandBlaxter.18 andhumaneffort)ofdenovosequencingagenomewillcontinue
48 Worm Volume1Issue1
to drop and multi-genome projects will become even more advice on bioinformatics analyses. We would also like to thank
attractive to funding agencies and more rewarding for the Dan Lawson at the European Bioinformatics Institute for
nematode genomes community. inspiring us by setting up arthropodgenomes.org using the
SMW platform. The SMW community was very helpful in
Acknowledgments answering questions about customising the platform. S. K. is
We acknowledge the input of Philipp Schiffer in the initiation funded by a School of Biological Sciences PhD studentship and
of the 959NG wiki. We thank all our colleagues around the the Overseas Research Student Awards Scheme at the University
world who are sequencing nematode genomes and adding of Edinburgh; G. Kuar is funded by MRC funding to the
information to the wiki. At the University of Edinburgh, we GenePool, Edinburgh (G00900740) and the EU Project
especially want to thank Karim Gharbi and colleagues at the “Enhanced protective Immunity Against Filariasis” focused
GenePool Genomics and Bioinformatics Facility for their research project (SICA; contract number 242131);
expertise and advice on de novo genome sequencing and the G. Koutsovoulos is funded by a BBSRC School of Biological
members of the Blaxter Lab for their constant support and Sciences University of Edinburgh PhD studentship.
References 13. National Human Genome Research Institute. DNA 27. Abad P, Gouzy J, Aury J-M, Castagnone-Sereno P,
SequencingCosts.genome.gov,2011. DanchinE,DeleuryE,etal.Genomesequenceofthe
1. SulstonJE,SchierenbergE,WhiteJG,ThomsonJN.
The embryonic cell lineage of the nematode 14. The 1000 Genomes Project Consortium.. A map of metazoan plant-parasitic nematode Meloidogyne incog-
Caenorhabditis elegans. Dev Biol 1983; 100:64-119; humangenomevariationfrompopulation-scalesequen- nita.NatBiotechnol2008;26:909-15;PMID:18660804;
PMID:6684600; http://dx.doi.org/10.1016/0012- cing. Nature 2010; 467:1061-73; PMID:20981092; http://dx.doi.org/10.1038/nbt.1482
1606(83)90201-4 http://dx.doi.org/10.1038/nature09534 28. Fenn K. Are filarial nematode Wolbachia obligate
2. The C elegans Genome Sequencing Consortium. 15. Genome10KCommunityofScientists.Genome10K: mutualistsymbionts?TrendsEcolEvol2004;19:163-
©Gpleantfoomr me sf2eoqrueinncvees0toigfattihneg n1beimolaotgoyd.e2SCci.enecleeg a1n9s:9L8a; aA10Pn0r0o0poVsaelrtdetboraOtebStapieencieWs.hJolHse-eGreedno2m 00e9B;Se1q0u0e:n65ce9i-7fo4r;os62;00P4M.0I1cD.0:012670i124e8; http:/n/dx.doi.ocrg/10.1e016/j.tre.e.
282:2012-8; PMID:9851916; http://dx.doi.org/10. PMID:19892720;http://dx.doi.org/10.1093/jhered/esp086 29. Holterman M, van der Wurff A, van den Elsen S,
1126/science.282.5396.2012 16. Robinson G, Hackett K, Purcell-Miramontes M, van Megen H, Bongers T, Holovachov O, et al.
3. SteinLD,BaoZ,BlasiarD,BlumenthalT,BrentMR, BrownS,EvansJ,GoldsmithM,etalCreatingabuzz Phylum-Wide Analysis of SSU rDNA Reveals Deep
ChenN,etal.ThegenomesequenceofCaenorhabditis aboutinsectgenomes.Science2011;331. Phylogenetic Relationships among Nematodes and
briggsae: a platform for comparative genomics. PLoS 17. CaoJ,SchneebergerK,OssowskiS,GuntherT,Bender Accelerated Evolution toward Crown Clades. Mol
Biol2003;1:e45;PMID:14624247;http://dx.doi.org/ S,FitzJ,etal.Whole-genomesequencingofmultiple Biol Evol 2006; 23:1792-800; PMID:16790472;
10.1371/journal.pbio.0000045 Arabidopsis thaliana populations. Nat Genet 2011; http://dx.doi.org/10.1093/molbev/msl044
4. Cutter AD, Dey A, MurDray R. Eovolution ofnthe o43:9t56-63 ; dPMID:2i1874s002; thttp:/r/dx.doii.orgb/10. u30. vantMegeenH,va.ndenElsenS,HoltermanM,Karssen
CaenorhabditiselegansGenome.MolBiolEvol2009; 1038/ng.911 G,MooymanP,BongersT,etal.Aphylogenetictreeof
26:1199-234; PMID:19289596; http://dx.doi.org/10. 18. Kumar S, Schiffer P, Blaxter M. 959 Nematode nematodesbasedonabout1200full-lengthsmallsubunit
1093/molbev/msp048 Genomes:asemanticwikiforcoordinatingsequencing ribosomalDNAsequences.Nematology2009;11:927-
5. WhiteJG,SouthgateE,ThomsonJN,BrennerS.The projects.NucleicAcidsResearch2011. 50;http://dx.doi.org/10.1163/156854109X456862
Structure of the Nervous System of the Nematode 19. HarrisTW,AntoshechkinI,BieriT,BlasiarD,ChanJ, 31. Parkinson J, Mitreva M, Whitton C, Thomson M,
Caenorhabditis elegans. Philos Trans R Soc Lond B ChenW,etal.WormBase:acomprehensiveresource DaubJ,MartinJ,etal.Atranscriptomicanalysisofthe
Biol Sci 1986; 314:1-340; http://dx.doi.org/10.1098/ for nematode research. Nucleic Acids Res 2010; 38: phylum Nematoda. Nat Genet 2004; 36:1259-67;
rstb.1986.0056 D463-7;PMID:19910365;http://dx.doi.org/10.1093/ PMID:15543149;http://dx.doi.org/10.1038/ng1472
6. Crittenden SL, Eckmann C, Wang L, Bernstein D, nar/gkp952 32. Vavouri T, Walter K, Gilks W, Lehner B, Elgar G.
Wickens M, Kimble J. Regulation of the mitosis/ 20. Gerstein MB, Lu Z, Van Nostrand E, Cheng C, Parallel evolution of conserved non-coding elements
meiosis decision in the Caenorhabditis elegans germ- Arshinoff B, Liu T, et al. Integrative Analysis of the thattargetacommonsetofdevelopmentalregulatory
line. Philos Trans R Soc Lond B Biol Sci 2003; CaenorhabditiselegansGenomebythemodENCODE genesfromwormstohumans.GenomeBiol2007;8:
358:1359-62; PMID:14511482; http://dx.doi.org/10. Project.Science2010;330:1775-87;PMID:21177976; R15; PMID:17274809; http://dx.doi.org/10.1186/gb-
1098/rstb.2003.1333 http://dx.doi.org/10.1126/science.1196914 2007-8-2-r15
7. WangMC,O’RourkeE,RuvkunG.FatMetabolism 21. BlaxterML,DeLeyP,GareyJ,LiuL,ScheldemanP, 33. Mortazavi A, Schwarz E, Williams B, Schaeffer L,
Links Germline Stem Cells and Longevity in C. Vierstraete A, et al. A molecular evolutionary frame- Antoshechkin I, Wold B, et al. Scaffolding a
elegans.Science2008;322:957-60;PMID:18988854; workforthephylumNematoda.Nature1998;392:71- Caenorhabditis nematode genome with RNA-seq.
http://dx.doi.org/10.1126/science.1162011 5;PMID:9510248;http://dx.doi.org/10.1038/32160 Genome Res 2010; 20:1740-7; PMID:20980554;
http://dx.doi.org/10.1101/gr.111021.110
8. Brooker S. Estimating the global distribution and 22. WasmuthJ,SchmidR,HedleyA,BlaxterM.Onthe
diseaseburdenofintestinalnematodeinfections:adding extent and origins of genic novelty in the phylum 34. EarlD,BradnamK,StJohnJ,DarlingA,LinD,Faas
up the numbers–a review. Int J Parasitol 2010; Nematoda. PLoS Negl Trop Dis 2008; 2:e258; J,etalAssemblathon1:Acompetitiveassessmentofde
40:1137-44; PMID:20430032; http://dx.doi.org/10. PMID:18596977; http://dx.doi.org/10.1371/journal. novoshortreadassemblymethods.GenomeResearch
1016/j.ijpara.2010.04.004 pntd.0000258 2011.
9. FireA,XuS,MontgomeryMK,KostasSA,DriverSE, 23. GregoryTR,NicolJ,TammH,KullmanB,KullmanK, 35. FinotelloF,LavezzoE,FontanaP,PeruzzoD,Albiero
MelloCC.Potentandspecificgeneticinterferenceby LeitchI,etal.Eukaryoticgenomesizedatabases.Nucleic A,BarzonL,etalComparativeanalysisofalgorithms
double-stranded RNA in Caenorhabditis elegans.. AcidsRes2007;35:D332-8;PMID:17090588;http:// for whole-genome assembly of pyrosequencing data.
Nature1998;391:806-11;PMID:9486653;http://dx. dx.doi.org/10.1093/nar/gkl828 BriefingsinBioinformatics2011.
doi.org/10.1038/35888 24. Opperman CH, Bird D, Williamson V, Rokhsar D, 36. Mitreva M, Jasmer D, Zarlenga D, Wang Z,
10. Hengartner MO, Ellis R, Horvitz R. Caenorhabditis BurkeM,CohnJ,etal.Sequenceandgeneticmapof Abubucker S, Martin J, et al. The draft genome of
elegansgeneced-9protectscellsfromprogrammedcell Meloidogynehapla:Acompactnematodegenomefor theparasiticnematodeTrichinellaspiralis.NatGenet
death. Nature 1992; 356:494-9; PMID:1560823; plant parasitism. Proc Natl Acad Sci USA 2008; 2011;43:228-35;PMID:21336279;http://dx.doi.org/
http://dx.doi.org/10.1038/356494a0 105:14802-7; PMID:18809916; http://dx.doi.org/10. 10.1038/ng.769
11. HorvitzHR,SternbergP.Multipleintercellularsignalling 1073/pnas.0805946105 37. KikuchiT,CottonJ,DalzellJ,HasegawaK,Kanzaki
systems controlthe developmentof theCaenorhabditis 25. JexA,LiuS,LiB,YoungN,HallR,LiY,etalAscaris N, McVeigh P, et al. Genomic Insights into the
elegansvulva.Nature1991;351:535-41;PMID:1646401; suumdraftgenome.Nature2011. OriginofParasitismintheEmergingPlantPathogen
Bursaphelenchus xylophilus. PLoS Pathog 2011; 7:
http://dx.doi.org/10.1038/351535a0 26. MüllerF,BernardV,ToblerH.Chromatindiminutionin
e1002219; PMID:21909270; http://dx.doi.org/10.
12. BlaxterM.Nematoda:Genes,GenomesandtheEvolu- nematodes.Bioessays1996;18:133-8;PMID:8851046;
1371/journal.ppat.1002219
tion of Parasitism. Advances in Parasitology 2003; http://dx.doi.org/10.1002/bies.950180209
54:101-95.
www.landesbioscience.com Worm 49
38. WangJ,CzechB,CrunkA,Wallace A,Mitreva M, 45. LinY,LiJ,ShenH,ZhangL,PapasianC,DengHW. 51. De Ley P, Blaxter M. Systematic position and
HannonG,etal.DeepsmallRNAsequencingfrom Comparativestudiesofdenovoassemblytoolsfornext- phylogeny.In:LeeDL,ed.Thebiologyofnematodes:
the nematode Ascaris reveals conservation, functional generation sequencing technologies. Bioinformatics Taylor&Francis,2002:1-30.
diversification, and novel developmental profiles. 2011;27:2031-7;PMID:21636596;http://dx.doi.org/ 52. Ghedin E, Wang S, Spiro D, Caler E, Zhao Q,
Genome Res 2011; 21:1462-77; PMID:21685128; 10.1093/bioinformatics/btr319 CrabtreeJ,etal.Draftgenomeofthefilarialnematode
http://dx.doi.org/10.1101/gr.121426.111 46. Martin JA, Wang Z. Next-generation transcriptome parasite Brugia malayi. Science 2007; 317:1756-60;
39. ChaissonMJ,BrinzaD,PevznerP.Denovofragment assembly. Nat Rev Genet 2011; 12:671-82; PMID: PMID:17885136; http://dx.doi.org/10.1126/science.
assemblywithshortmate-pairedreads:Doestheread 21897427;http://dx.doi.org/10.1038/nrg3068 1145406
lengthmatter?GenomeRes2009;19:336-46;PMID: 47. Korlach J, Bjornson K, Chaudhuri B, Cicero R, 53. Dieterich C, Clifton S, Schuster L, Chinwalla A,
19056694;http://dx.doi.org/10.1101/gr.079053.108 FlusbergB,GrayJ,etal.Real-timeDNAsequencing DelehauntyK,DinkelackerI,etal.ThePristionchus
40. FlicekP,BirneyE.Sensefromsequencereads:methods from single polymerase molecules. Methods Enzymol pacificus genome provides a unique perspective on
for alignment and assembly. Nat Methods 2009; 6: 2010; 472:431-55; PMID:20580975; http://dx.doi. nematode lifestyle and parasitism. Nat Genet 2008;
S6-12; PMID:19844229; http://dx.doi.org/10.1038/ org/10.1016/S0076-6879(10)72001-2 40:1193-8; PMID:18806794; http://dx.doi.org/10.
nmeth.1376 48. MagliaG,HeronA,StoddartD,JaprungD,BayleyH. 1038/ng.227
41. Pop M. Genome assembly reborn: recent computa- Analysisofsinglenucleicacidmoleculeswithprotein 54. van Megen H, van den Elsen S, Holterman M,
tional challenges. Brief Bioinform 2009; 10:354-66; nanopores. Methods Enzymol 2010; 475:591-623; Karssen G, Mooyman P, Bongers T, et al. A
PMID:19482960;http://dx.doi.org/10.1093/bib/bbp026 PMID:20627172; http://dx.doi.org/10.1016/S0076- phylogenetictreeofnematodesbasedonabout1200
42. AlkanC,SajjadianS,EichlerE.Limitationsofnext- 6879(10)75022-9 full-length small subunit ribosomal DNA sequences.
generation genome sequence assembly. Nat Methods 49. HammesfahrB,OdronitzF,HellkampM,KollmarM. Nematology 2009; 11:927-50; http://dx.doi.org/10.
2011;8:61-5;PMID:21102452;http://dx.doi.org/10. diArk 2.0 provides detailed analyses of the ever 1163/156854109X456862
1038/nmeth.1527 increasing eukaryotic genome sequencing data. BMC 55. Blaxter M. Nematodes: the worm and its relatives.
43. Picardi E, Pesole G. Computational methods for ab ResearchNotes2011;4:338;PMID:21906294;http:// PLoS Biol 2011; 9:e1001050; PMID:21526226;
initioandcomparativegenefinding.MethodsMolBiol dx.doi.org/10.1186/1756-0500-4-338 http://dx.doi.org/10.1371/journal.pbio.1001050
2010; 609:269-84; PMID:20221925; http://dx.doi. 50. Liolios K, Chen IM, Mavromatis K, Tavernarakis N,
org/10.1007/978-1-60327-241-4_16 HugenholtzP,MarkowitzV,etal.TheGenomesOnLine
44. Miller JR, Koren S, Sutton G. Assembly algorithms Database (GOLD) in 2009: status of genomic and
fornext-generationsequencingdata.Genomics2010; metagenomic projects and their associated metadata.
©95:315 -27;2PMID0:202112412; http:2//dx.doi.o rg/1L0. aNuncleicAciddsRes201e0;38:D3s46-54; PMIBD:199149i34;oscience.
1016/j.ygeno.2010.03.001 http://dx.doi.org/10.1093/nar/gkp848
Do not distribute.
50 Worm Volume1Issue1