Palgrave Advances in Language and Linguistics
Series Editor: Christopher N. Candlin, Macquarie University, Australia
Palgrave Advances in Language and Linguistics is an international book series which
focuses on subjects that are of current critical importance within Linguistics. Titles in
this series map the territory and bring readers’ attention to some of the most salient
and rewarding work on the topic from active and forward-looking researchers. This
series is designed for postgraduate students, upper-level undergraduates considering
taking further studies and experienced researchers and practitioners keen to explore
topics with which they may not be so familiar.
Titles include:
Charles Antaki (editor)
APPLIED CONVERSATION ANALYSIS
Paul Baker and Tony McEnery (editors)
CORPORA AND DISCOURSE STUDIES
Integrating Discourse and Corpora
Mike Baynham and Mastin Prinsloo (editors)
THE FUTURE OF LITERACY STUDIES
Noel Burton-Roberts (editor)
PRAGMATICS
Susan Foster-Cohen (editor)
LANGUAGE ACQUISITION
Monica Heller (editor)
BILINGUALISM: A SOCIAL APPROACH
Juliane House (editor)
TRANSLATION: A MULTIDISCIPLINARY APPROACH
Barry O’Sullivan (editor)
LANGUAGE TESTING: THEORIES AND PRACTICES
Martha E. Pennington (editor)
PHONOLOGY IN CONTEXT
Mastin Prinsloo and Christopher Stroud (editors)
EDUCATING FOR LANGUAGE AND LITERACY DIVERSITY
Steven Ross and Gabriele Kasper (editors)
ASSESSING SECOND LANGUAGE PRAGMATICS
Julia Snell, Sara Shaw and Fiona Copland (editors)
LINGUISTIC ETHNOGRAPHY
Ann Weatherall, Bernadette M. Watson and Cindy Gallois (editors)
LANGUAGE, DISCOURSE AND SOCIAL PSYCHOLOGY
Palgrave Advances in Language and Linguistics
Series Standing Order ISBN 978–1–137–02986–7 hardcover
978–1–137–02987–4 paperback
(outside North America only)
You can receive future titles in this series as they are published by placing a standing
order. Please contact your bookseller or, in case of difficulty, write to us at the address
below with your name and address, the title of the series and the ISBN quoted above.
Customer Services Department, Macmillan Distribution Ltd, Houndmills, Basingstoke,
Hampshire RG21 6XS, England
Also by Paul Baker
USING CORPORA TO ANALYSE GENDER
DISCOURSE ANALYSIS AND MEDIA ATTITUDES
The Representation of Islam in the British Press (co-authored)
KEY TERMS IN DISCOURSE ANALYSIS (co-authored)
SOCIOLINGUISTICS AND CORPUS LINGUISTICS
CONTEMPORARY CORPUS LINGUISTICS (edited)
SEXED TEXTS
Language, Gender and Sexuality
USING CORPORA IN DISCOURSE ANALYSIS
A GLOSSARY OF CORPUS LINGUISTICS (co-authored)
PUBLIC DISCOURSES OF GAY MEN
HELLO SAILOR! SEAFARING LIFE FOR GAY MEN: 1945–1990 (co-authored)
FANTABULOSA: A DICTIONARY OF POLARI AND GAY SLANG
POLARI: THE LOST LANGUAGE OF GAY MEN
Also by Tony McEnery
DISCOURSE ANALYSIS AND MEDIA ATTITUDES
The Representation of Islam in the British Press
A GLOSSARY OF CORPUS LINGUISTICS (co-authored)
CORPUS LINGUISTICS
Method, Theory and Practice (co-authored)
CORPUS BASED LANGUAGE STUDIES OF ENGLISH AND CHINESE (co-authored)
CORPUS-BASED LANGUAGE STUDIES
An Advanced Resource Book (co-authored)
ASPECT IN CHINESE (co-authored)
SWEARING IN ENGLISH
Bad Language, Purity and Power from 1586 to the Present
A FREQUENCY DICTIONARY OF POLISH (co-authored)
CORPUS LINGUISTICS (2e, co-authored)
CORPUS LINGUISTICS (co-authored)
COMPUTATIONAL LINGUISTICS
A Natural Language Processing Toolbox and Guide
Corpora and Discourse
Studies
Integrating Discourse and Corpora
Edited by
Paul Baker and Tony McEnery
Lancaster University, UK
Selection, introduction and editorial content © Paul Baker and Tony McEnery 2015
Individual chapters © Respective authors 2015
Softcover reprint of the hardcover 1st edition 2015 978-1-137-43172-1
All rights reserved. No reproduction, copy or transmission of this publication may be
made without written permission.
No portion of this publication may be reproduced, copied or transmitted
save with written permission or in accordance with the provisions of the
Copyright, Designs and Patents Act 1988, or under the terms of any licence
permitting limited copying issued by the Copyright Licensing Agency,
Saffron House, 6–10 Kirby Street, London EC1N 8TS.
Any person who does any unauthorized act in relation to this publication
may be liable to criminal prosecution and civil claims for damages.
The authors have asserted their rights to be identified as the authors of this work in
accordance with the Copyright, Designs and Patents Act 1988.
First published 2015 by
PALGRAVE MACMILLAN
Palgrave Macmillan in the UK is an imprint of Macmillan Publishers Limited,
registered in England, company number 785998, of Houndmills, Basingstoke,
Hampshire RG21 6XS.
Palgrave Macmillan in the US is a division of St Martin’s Press LLC,
175 Fifth Avenue, New York, NY 10010.
Palgrave Macmillan is the global academic imprint of the above companies and has
companies and representatives throughout the world.
Palgrave® and Macmillan® are registered trademarks in the United States,
the United Kingdom, Europe and other countries.
ISBN 978-1-349-55729-5 ISBN 978-1-137-43173-8 (eBook)
DOI 10.1057/9781137431738
This book is printed on paper suitable for recycling and made from fully
managed and sustained forest sources. Logging, pulping and manufacturing
processes are expected to conform to the environmental regulations of the
country of origin.
A catalogue record for this book is available from the British Library.
Library of Congress Cataloging-in-Publication Data
Corpora and discourse studies : integrating discourse and corpora / edited by Paul Baker,
Lancaster University, UK and Tony McEnery, University of Lancaster, UK.
pages cm
Summary: “The growing availability of large collections of language texts has expanded
our horizons for language analysis, enabling the swift analysis of millions of words of
data, aided by computational methods. This edited collection contains examples of such
contemporary research which uses corpus linguistics to carry out discourse analysis. The
book takes an inclusive view of the meaning of discourse, covering different text-types
or modes of language, including discourse as both social practice and as ideology or
representation. Authors examine a range of spoken, written, multimodal and electronic
corpora covering themes which include health, academic writing, social class, ethnicity,
gender, television narrative, news, Early Modern English and political speech. The chapters
showcase the variety of qualitative and quantitative tools and methods that this new
generation of discourse analysts are combining together, offering a set of compelling
models for future corpus-based research in discourse”— Provided by publisher.
1. Discourse analysis. 2. Corpora (Linguistics) I. Baker, Paul, 1972- editor.
II. McEnery, Tony, 1964- editor.
P302.C66 2015
401'.41—dc23 2015012348
Typeset by MPS Limited, Chennai, India.
Contents
List of Figures and Tables vii
Series Editor’s Preface xi
Notes on Contributors xii
1 Introduction 1
Paul Baker and Tony McEnery
2 e-Language: Communication in the Digital Age 20
Dawn Knight
3 Beyond Modal Spoken Corpora: A Dynamic Approach to
Tracking Language in Context 41
Svenja Adolphs, Dawn Knight and Ronald Carter
4 Corpus-Assisted Multimodal Discourse Analysis of Television
and Film Narratives 63
Monika Bednarek
5 Analysing Discourse Markers in Spoken Corpora: Actually as a
Case Study 88
Karin Aijmer
6 Discursive Constructions of the Environment in American
Presidential Speeches 1960–2013: A Diachronic Corpus-Assisted
Study 110
Cinzia Bevitori
7 Health Communication and Corpus Linguistics: Using
Corpus Tools to Analyse Eating Disorder Discourse Online 134
Daniel Hunt and Kevin Harvey
8 Multi-Dimensional Analysis of Academic Discourse 155
Jack A. Hardy
9 Thinking about the News: Thought Presentation in Early
Modern English News Writing 175
Brian Walker and Dan McIntyre
10 The Use of Corpus Analysis in a Multi-Perspectival Study
of Creative Practice 192
Darryl Hocking
11 Corpus-Assisted Comparative Case Studies of Representations
of the Arab World 220
Alan Partington
12 Who Benefits When Discourse Gets Democratised? Analysing
a Twitter Corpus around the British Benefits Street Debate 244
Paul Baker and Tony McEnery
13 Representations of Gender and Agency in the Harry
Potter Series 266
Sally Hunt
14 Filtering the Flood: Semantic Tagging as a Method of
Identifying Salient Discourse Topics in a Large Corpus
of Hurricane Katrina Reportage 285
Amanda Potts
Index 305
List of Figures and Tables
Figures
2.1 Log-likelihood comparisons of core modal verb forms across
the different data-types in CANELC and the BNC 29
2.2 Relative frequencies of core modal verb use in the spoken
and written BNC 31
2.3 Relative frequencies of core modal verb use in CANELC 32
2.4 Sample concordance output illustrating the use of can in
the SMS sub-corpus 35
2.5 Sample concordance output illustrating the use of shall
in the SMS sub-corpus 35
3.1 Art galleries involved in the British Art Show 7 46
3.2 The Fieldwork Tracker application 49
3.3 Uploading the Fieldwork Tracker logs into DRS 52
3.4 Filtering data by location 53
3.5 Sample concordance output of like in the ‘inside’ sub-corpus 59
4.1 Subtitles for Enlightened, season 1, episode 1 70
4.2 Script from Mad Men, season 1, episode 1 71
4.3 Positive and negative keywords in NJ-D 74
4.4 Concordances for joined 75
4.5 Concordances for his 75
4.6 Fan transcript (Nurse Jackie) 76
6.1 Relative frequency (per 1,000 tokens) of environment*
over time and across administrations (1960–2013) 115
6.2 Proportion of instances related to ‘environment’
(lighter colour) vs. ‘other’ (darker colour) (r.f. 1,000 tokens)
of protect*, preserv* and conserv* in the PS corpus 118
6.3 Relative frequency (per 1,000 tokens) of environment* and
energy over time and across administrations (1960–2013) 121
6.4 Relative frequency of clean* (per 1,000 tokens) across
presidents (1960–2013) 124
8.1 Comparison of dimension scores for student levels in
Dimension 1: (+) involved, academic narrative vs. (–)
descriptive, informational discourse 163
8.2 Comparison of dimension scores for student levels in
Dimension 2: (+) expression of opinions and mental processes 166
8.3 Comparison of dimension scores for student levels
in Dimension 3: (+) situation-dependent, non-procedural
evaluation vs. (–) procedural discourse 169
8.4 Comparison of dimension scores for student levels in
Dimension 4: (+) production of possibility 170
9.1 A comparison of the percentage composition of DP in PDE
and EModE news 185
9.2 Comparison between the percentages of individual thought
presentation categories in PDE and EModE news 185
10.1 An NVivo-generated model (NVivo qualitative data analysis
software; QSR International Pty Ltd. Version 9, 2010) of the
thematic coding of the words idea and ideas found in the
student brief corpus and ethnographic data (participant
interviews and interactions) 200
12.1 Collocational network of dee 252
12.2 Collocational network of fags 252
12.3 Collocational network of bankers 258
Tables
2.1 Common modal forms in English (based on the CEC –
Cambridge English Corpus) 23
2.2 Topics covered in CANELC 27
2.3 The frequency of core modal verb usage in CANELC and
the BNC 28
2.4 LL comparisons of modal verbs in the email and SMS data
compared to the other data-types in CANELC 33
2.5 LL comparisons of forms of modal verb use in the Twitter
and blog data compared to the other data-types in CANELC 33
3.1 Participants recorded for the BAS study 48
3.2 Some transcription conventions used in the BAS data 51
3.3 Word counts for the ‘inside’ and ‘outside’ 54
3.4 Raw and relative frequencies of deictic markers in the
BAS corpora 57
3.5 The most common words used in the ‘inside’ vs. ‘outside’
sub-corpus and the ‘outside’ vs. ‘inside’ sub-corpus 58
3.6 The most common words used in the BAS corpus compared
to a spoken component of the BNC 59
4.1 Multimodality in films and TV series 66
4.2 Multimodal transcript 77
5.1 The frequency of actually in four ICE-corpora 93
5.2 The distribution of actually in different positions in four
ICE-corpora 94
5.3 The ranking of actually in four ICE-corpora according to the
frequency of their position in the utterance 94
5.4 The function of actually in the right periphery in three
varieties of English 102
5.5 The function of actually in the left periphery in four
ICE-varieties 105
6.1 Breakdown of the lemma protect* in the PS corpus 117
6.2 Breakdown of top 12 collocates of energy (5L-5R word span)
across presidents 122
6.3 Breakdown of the presidential speeches corpus (1960–2013) 127
7.1 Top 20 keywords in Teenage Health Freak corpus relating
to the theme of weight and eating 140
7.2 Top 20 keywords in anorexia.net corpus 141
7.3 Grammatical and lexical collocates of anorexic in order of
p-value (log-likelihood) 142
8.1 Distribution of papers across academic divisions and
disciplines 159
8.2 Composition of the features of Dimensions 1–4 160
8.3 Dimension 1 loadings (means) according to student level
and discipline 164
8.4 Dimension 2 loadings (means) according to student level
and discipline 167
8.5 Selection of Dimension 3 loadings according to student level
and discipline 169
9.1 Speech, writing and thought presentation model based on the
description in Short (2007) 177
9.2 Constituents of the fields of the cat attribute 181
10.1 The data collected and methodological focus for each of the
perspectives 196