Named Entities for Computational Linguistics
FOCUS SERIES
Patrick Paroubek
Named Entities for
Computational Linguistics
Damien Nouvel
Maud Ehrmann
Sophie Rosset
First published 2016 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as
permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced,
stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers,
or in the case of reprographic reproduction in accordance with the terms and licenses issued by the
CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the
undermentioned address:
ISTE Ltd
27-37 St George’s Road
London SW19 4EU
UK
www.iste.co.uk

John Wiley & Sons, Inc.
111 River Street
Hoboken, NJ 07030
USA
www.wiley.com
© ISTE Ltd 2016
The rights of Damien Nouvel, Maud Ehrmann and Sophie Rosset to be identified as the authors of this
work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.
Library of Congress Control Number: 2015959094
British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library
ISSN 2051-2481 (Print)
ISSN 2051-249X (Online)
ISBN 978-1-84821-838-3
Contents
Introduction
Chapter 1. Named Entities for Accessing Information
1.1. Research program history
1.1.1. Understanding documents: an ambitious task
1.1.2. Detecting basic elements: named entities
1.1.3. Trend: a return to slot filling
1.2. Task using named entities as a basic representation
1.3. Conclusion
Chapter 2. Named Entities, Referential Units
2.1. Issues with the named entity concept
2.1.1. A heterogeneous set
2.1.2. Existing defining formulas
2.1.3. An NLP object
2.2. The notions of meaning and reference
2.2.1. What is the reference?
2.2.2. What is meaning?
2.3. Proper names
2.3.1. The traditional criteria for defining a proper name
2.3.2. Meaning and referential function of proper names
2.3.3. The “referential load” of proper names
2.4. Definite descriptions
2.4.1. What is a definite description?
2.4.2. The meaning of definite descriptions
2.4.3. Complete and incomplete definite descriptions
2.5. The meaning and referential functioning of named entities
2.5.1. Reference to a particular
2.5.2. Referential autonomy
2.5.3. A “natural” heterogeneity
2.6. Conclusion
Chapter 3. Resources Associated with Named Entities
3.1. Typologies: general and specialist domains
3.1.1. The notion of category
3.1.2. Typology development
3.1.3. Typologies beyond evaluation campaigns
3.1.4. Other uses of typologies
3.1.5. Illustrated comparison
3.1.6. Issues to consider regarding entities
3.2. Corpora
3.2.1. Introduction
3.2.2. Corpora and named entities
3.2.3. Conclusion
3.3. Lexicons and knowledge databases
3.3.1. Lexical databases
3.3.2. Knowledge databases
3.4. Conclusion
Chapter 4. Recognizing Named Entities
4.1. Detection and classification of named entities
4.2. Indicators for named entity recognition
4.2.1. Describing word morphology
4.2.2. Using lexical databases
4.2.3. Contextual clues
4.2.4. Conclusion
4.3. Rule-based techniques
4.4. Data-driven and machine-learning systems
4.4.1. Majority class models
4.4.2. Contextual models (HMM)
4.4.3. Multiple feature models (Softmax and MaxEnt)
4.4.4. Conditional Random Fields (CRFs)
4.5. Unsupervised enrichment of supervised methods
4.6. Conclusion
Chapter 5. Linking Named Entities to References
5.1. Knowledge bases
5.2. Formalizing polysemy in named entity mentions
5.3. Stages in the named entity linking process
5.3.1. Detecting mentions of named entities
5.3.2. Selecting candidates for each mention
5.3.3. Entity disambiguation
5.3.4. Entity linking
5.4. System performance
5.4.1. Practical application: DBpedia Spotlight
5.4.2. Future prospects
Chapter 6. Evaluating Named Entity Recognition
6.1. Classic measurements: precision, recall and F-measures
6.2. Measures using error counts
6.3. Evaluating associated tasks
6.3.1. Detecting entities and mentions
6.3.2. Entity detection and linking
6.4. Evaluating preprocessing technologies
6.5. Conclusion
Conclusion
Appendices
Appendix 1. Glossary
Appendix 2. Named Entities: Research Programs
Appendix 3. Summary of Available Corpora
Appendix 4. Annotation Formats
Appendix 5. Named Entities: Current Definitions
Bibliography
Index