Table Of ContentBookID <BID>_ChapID <CID>_Proof# 1 - 29/08/2009
Bioinformatics
David Edwards Jason Stajich David Hansen
Editors
Bioinformatics
Tools and Applications
Editors
David Edwards
Australian Centre for Plant Functional Genomics
Institute for Molecular Biosciences
and School of Land, Crop and Food Sciences
University of Queensland
Brisbane, QLD 4072
Australia
David Hansen
Australian E-Health Research Centre
CSIRO
QLD 4027, Brisbane, Australia
Jason Stajich
Department of Plant Pathology
and Microbiology
University of California
Berkeley, CA
USA
ISBN 978-0-387-92737-4 e-ISBN 978-0-387-92738-1
DOI 10.1007/978-0-387-92738-1
Springer New York Dordrecht Heidelberg London
Library of Congress Control Number: 2009927717
© Springer Science+Business Media, LLC 2009
All rights reserved. This work may not be translated or copied in whole or in part without the written
permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY
10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection
with any form of information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are
not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject
to proprietary rights.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Preface
Biology has progressed tremendously in the last decade due in part to the
increased automation in the generation of data from sequences to genotypes to
phenotypes. Biology is now very much an information science, and bioinformatics
provides the means to connect biological data to hypotheses. Within this volume,
we have collated chapters describing various areas of applied bioinformatics,
from the analysis of sequence, literature, and functional data to the function and
evolution of organisms. The ability to process and interpret large volumes of
data is essential now that new high-throughput DNA sequencers are
producing an overload of sequence data. Initial chapters provide an introduction
to the analysis of DNA and protein sequences, from motif detection to gene
prediction and annotation, with specific chapters on DNA and protein databases as
well as data visualization. Additional chapters focus on gene expression analysis
from the perspective of traditional microarrays and more recent sequence-based
approaches, followed by an introduction to the evolving field of phenomics, with
specific chapters detailing advances in plant and microbial phenome analysis and
a chapter dealing with the important issue of standards for functional genomics.
Further chapters present the area of literature databases and associated mining
tools, which are becoming increasingly essential for interpreting the vast volume of
published biological information, while the final chapters present bioinformatics
purely from a developer’s point of view, describing the various data and databases
as well as common programming languages used for bioinformatics applications.
These chapters provide an introduction and motivation to further avenues for
implementation. Taken together, this volume aims to provide a resource for biology
students wanting a greater understanding of the expanding area of bioinformatics,
as well as computer scientists who are interested in learning more about the field of
applied bioinformatics.
Brisbane, QLD David Edwards
Berkeley, CA Jason E. Stajich
Brisbane, QLD David Hansen
Contents
1 DNA Sequence Databases
David Edwards, David Hansen, and Jason E. Stajich
2 Sequence Comparison Tools
Michael Imelfort
3 Genome Browsers
Sheldon McKay and Scott Cain
4 Predicting Non-coding RNA Transcripts
Laura A. Kavanaugh and Uwe Ohler
5 Gene Prediction Methods
William H. Majoros, Ian Korf, and Uwe Ohler
6 Gene Annotation Methods
Laurens Wilming and Jennifer Harrow
7 Regulatory Motif Analysis
Alan Moses and Saurabh Sinha
8 Molecular Marker Discovery and Genetic Map Visualisation
Chris Duran, David Edwards, and Jacqueline Batley
9 Sequence Based Gene Expression Analysis
Lakshmi K. Matukumalli and Steven G. Schroeder
10 Protein Sequence Databases
Terry Clark
11 Protein Structure Prediction
Sitao Wu and Yang Zhang
12 Classification of Information About Proteins
Amandeep S. Sidhu, Matthew I. Bellgard, and Tharam S. Dillon
13 High-Throughput Plant Phenotyping – Data Acquisition,
Transformation, and Analysis
Matthias Eberius and José Lima-Guerra
14 Phenome Analysis of Microorganisms
Christopher M. Gowen and Stephen S. Fong
15 Standards for Functional Genomics
Stephen A. Chervitz, Helen Parkinson, Jennifer M. Fostel,
Helen C. Causton, Susanna-Assunta Sansone, Eric W. Deutsch,
Dawn Field, Chris F. Taylor, Philippe Rocca-Serra, Joe White,
and Christian J. Stoeckert
16 Literature Databases
J. Lynn Fink
17 Advanced Literature-Mining Tools
Pierre Zweigenbaum and Dina Demner-Fushman
18 Data and Databases
Daniel Damian
19 Programming Languages
John Boyle
Index
Contributors
Jacqueline Batley Australian Centre for Plant Functional Genomics,
Centre of Excellence for Integrative Legume Research, School of Land,
Crop and Food Sciences, University of Queensland, Brisbane,
QLD 4072, Australia
[email protected]
John Boyle The Institute for Systems Biology, 1441 North 34th Street,
Seattle, WA 98105, USA
[email protected]
Matthew I. Bellgard Centre for Comparative Genomics, Murdoch University,
Perth, WA, Australia
[email protected]
Scott Cain Ontario Institute for Cancer Research, 101 College Street,
Suite 800, Toronto, ON, Canada M5G0A3
[email protected]
Helen C. Causton MRC Clinical Sciences Centre, Imperial College London,
Hammersmith Hospital Campus, Du Cane Road, London W12 0NN, UK
[email protected]
Stephen A. Chervitz Affymetrix Inc., Santa Clara, CA 95051, USA
[email protected]
Terry Clark Australian Centre for Plant Functional Genomics,
Institute for Molecular Biosciences and School of Land,
Crop and Food Sciences, University of Queensland, Brisbane,
QLD 4072, Australia
[email protected]
Daniel Damian Biowisdom Ltd., CB 22 7GG, Cambridge, UK
[email protected]
Dina Demner-Fushman Communications Engineering Branch,
Lister Hill National Center for Biomedical Communications,
US National Library of Medicine, Bethesda, MD, USA
[email protected]
Eric W. Deutsch The Institute for Systems Biology, Seattle, WA 98105, USA
[email protected]
Tharam S. Dillon Digital Ecosystems and Business Intelligence Institute,
Curtin University of Technology, Perth, WA, Australia
[email protected]
Chris Duran Australian Centre for Plant Functional Genomics,
School of Land, Crop and Food Sciences,
University of Queensland, Brisbane, QLD 4072, Australia
[email protected]
Matthias Eberius LemnaTec GmbH, Schumanstr. 1a,
52146 Wuerselen, Germany
[email protected]
David Edwards Australian Centre for Plant Functional Genomics,
Institute for Molecular Biosciences and School of Land, Crop and Food Sciences,
University of Queensland, Brisbane, QLD 4072, Australia
[email protected]
Dawn Field Natural Environment Research Council,
Centre for Ecology and Hydrology, Oxford, OX1 3SR, UK
[email protected]
J. Lynn Fink Skaggs School of Pharmacy and Pharmaceutical Sciences,
University of California, San Diego, CA, USA
[email protected]
Stephen S. Fong Department of Chemical and Life Science Engineering,
Virginia Commonwealth University, P.O. Box 843028, Richmond, VA 23284, USA
[email protected]
Jennifer M. Fostel Division of Intramural Research,
National Institute of Environmental Health Sciences,
Research Triangle Park, NC 27709, USA
[email protected]
Christopher M. Gowen Department of Chemical and Life Science Engineering,
Virginia Commonwealth University, P.O. Box 843028, Richmond,
VA 23284, USA
[email protected]
David Hansen Australian E-Health Research Centre,
CSIRO QLD 4027, Brisbane, Australia
[email protected]
Jennifer Harrow Wellcome Trust Sanger Institute, Morgan Building,
Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1HH, UK
[email protected]
Michael Imelfort Australian Centre for Plant Functional Genomics,
Institute for Molecular Biosciences and School of Land,
Crop and Food Sciences, University of Queensland,
Brisbane, QLD 4072, Australia
[email protected]
Laura A. Kavanaugh Department of Molecular Genetics and Microbiology,
Duke University, Durham, NC 27710, USA
[email protected]
Ian Korf UC Davis Genome Center, University of California, Davis,
451 Health Sciences Drive, Davis, CA 95616, USA
[email protected]
José Lima-Guerra Keygene N.V., Agrobusiness Park 90,
6708 PW Wageningen, The Netherlands
[email protected]
William H. Majoros Institute for Genome Sciences & Policy,
Duke University, Durham, NC 27708, USA
[email protected]
Lakshmi K. Matukumalli Department of Bioinformatics and Computational
Biology, George Mason University, Manassas, VA 20110, USA
[email protected]
Sheldon McKay Cold Spring Harbor Laboratory, 1 Bungtown Road,
Cold Spring Harbor, NY 11724, USA
[email protected]
Alan Moses Department of Cell & Systems Biology, University of Toronto,
25 Willcocks Street, Toronto, ON, Canada M5S 3B2
[email protected]
Uwe Ohler Department of Biostatistics & Bioinformatics, Institute for Genome
Sciences & Policy, Duke University, Durham, NC 27708, USA
[email protected]
Helen Parkinson European Bioinformatics Institute, Wellcome Trust
Genome Campus, Hinxton, Cambridge, UK
[email protected]
Philippe Rocca-Serra European Bioinformatics Institute, Wellcome Trust
Genome Campus, Hinxton, Cambridge, UK
[email protected]
Susanna-Assunta Sansone European Bioinformatics Institute,
Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
[email protected]