The Nuovo soggettario as a service for the linked data world
Giovanni Bergamin, Anna Lucarelli
Introduction
The Nuovo Soggettario (hereinafter, NS), edited by the National Central Library of Florence (BNCF), is the main Italian subject indexing tool for various kinds of resources. It has been developed in collaboration with the Italian National Bibliography (BNI), which holds a leading role in the building and development of subject indexing tools in compliance with the International Federation of Library Associations and Institutions (IFLA) recommendations (the use of the NS by the Italian National Bibliography is also described in Jahns, Guidelines for subject access in National bibliographies) and other international standards. This tool is used by general and specialized Italian libraries (indexers, researchers, users), in particular those participating in the Servizio Bibliotecario Nazionale (SBN), and can also be employed in archives, multimedia libraries and documentation centres. The NS belongs to the tradition of analytico-synthetic languages; the system consists of a semantic and a syntactical apparatus and, in compliance with the uniform and specific heading principles, it is conceived as a system to be applied in both pre-coordinated (the terms are combined in subject strings) and post-coordinated indexing environments (the terms are extracted from a controlled
JLIS.it. Vol. 4, n. 1 (Gennaio/January 2013).
DOI: 10.4403/jlis.it-5474
G. Bergamin, The Nuovo soggettario as a service for the linked data world
vocabulary and used as keywords). The main component of the NS is a universal thesaurus built in compliance with international standards and available online since 2007.¹ It is a tool under continuous development, currently accessible on the BNCF website. At the moment the Thesaurus consists of 46,000 terms derived from the 1956 Soggettario and its updates (which are being checked and standardized), from new terms introduced for the semantic relationship network, and from new terms proposed by the BNI indexers and other partners (Lucarelli et al., “The Nuovo soggettario Thesaurus: structural features and web application projects”). The terms are organized in a structure based on four main categories and on semantic relationships determined by standards (ISO 2788:1986 – Documentation, guidelines for the establishment and development of monolingual thesauri. Documentation, principes directeurs pour l’établissement et le développement de thesaurus monolingue; ISO 25964-1:2011 – Thesauri and interoperability with other vocabularies. Part 1: Thesauri for information retrieval). They are equipped with a rich apparatus of notes, connections with formerly preferred terms (historical variants), an indication of the corresponding Dewey Decimal Classification numbers, as well as Sources, which are constantly updated and used to check morphology and meaning.² The Thesaurus is integrated with the BNCF OPAC and with the OPACs of the other libraries that adopt it. Users can navigate from the controlled vocabulary to the bibliographic records. Regarding linked data, the Thesaurus is linked with other thesauri, with some encyclopedias (such as Wikipedia and the prestigious Italian Treccani encyclopedia³), and with the digital resources of other cultural institutions.
The NS thesaurus promotes the Italian language and multilingual information retrieval through its data management software, and it also complies with the relevant standards (Guidelines for Multilingual Thesauri). A large number of terms have a cross-language equivalence relationship with Library of Congress Subject Headings (LCSH) preferred terms, displayed and linked through the “Equiv. LCSH” note, e.g. «Costo della vita».
1 http://thes.bncf.firenze.sbn.it/ricerca.php.
2 http://thes.bncf.firenze.sbn.it/fonti.php.
3 http://www.treccani.it.
Recently, the NS has been developed in two directions:
1. Interoperability: since 2010, metadata have been available in Resource Description Framework (RDF)/SKOS format and can be used in the linked data world, not only in strictly library contexts;
2. Automatic indexing: the thesaurus is being tested in the automatic indexing of digital resources; in particular, our goal is to reduce cataloguing costs.
These developments are in line with the programmes of other countries in the indexing domain, as shown by IFLA papers (Gömpel and Svensson, “Managing legal deposit for online publications in Germany”).
SKOS standard for thesauri
Simple Knowledge Organization System (SKOS) is defined as a common data model,⁴ developed by the W3C Semantic Web Deployment Working Group (SWDWG),⁵ for sharing and linking knowledge organization systems (such as thesauri, taxonomies, classification schemes and subject heading systems) within the semantic web. It is an application of RDF. The most important thesauri developed by national libraries are progressively adopting this standard for their controlled vocabularies. SKOS data are concepts, which are independent of the terms used to label them; they are expressed as RDF triples and can be encoded using any concrete RDF syntax. The concepts, which are expressed by preferred terms in the thesaurus and used as descriptors in indexing systems, are identified by URIs and labelled with skos:prefLabel, expressed in one or more natural languages. The standard also assigns to concepts alternative lexical labels, which do not have a URI of their own: skos:altLabel, to represent a relationship between terms in a thesaurus that both represent the same concept; skos:hiddenLabel, to represent misspelled variants of other lexical labels, abbreviations and acronyms. The standard also makes it possible to define and qualify a concept with further information, expressed by properties that specialize skos:note (skos:definition; skos:scopeNote; skos:example, which gives examples of the use of the terms; skos:historyNote, which may be applied to a preferred or non-preferred term or to a concept, and should be used when a new preferred term is added to the thesaurus or a change is made to an existing term that affects the concept’s scope in different periods of application; skos:editorialNote, which gives administrative information; skos:changeNote, which documents the different choices and modifications). The hierarchical and associative thesaural relationships established between concepts are labelled with skos:broader, skos:narrower and skos:related.
4 http://www.w3.org/TR/skos-reference.
5 http://www.w3.org/2004/02/skos.
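The labelling and relationship properties described in this section can be illustrated with a minimal sketch. It models a concept as a URI with a preferred label, optional alternative labels and broader/related links, and prints it as N-Triples-style statements. The `http://example.org/thes/` namespace, the `Concept` class and the `to_ntriples` helper are assumptions made for this illustration; they are not the URIs or the software actually used for the NS, and literal escaping is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import List

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/thes/"  # hypothetical namespace, not the real NS URIs


@dataclass
class Concept:
    """A SKOS concept: a URI plus lexical labels and semantic relations."""
    uri: str
    pref_label: str
    lang: str = "it"
    alt_labels: List[str] = field(default_factory=list)
    broader: List[str] = field(default_factory=list)  # URIs of broader concepts
    related: List[str] = field(default_factory=list)  # URIs of related concepts

    def triples(self):
        """Yield (subject, predicate, object) strings in N-Triples form."""
        s = f"<{self.uri}>"
        yield (s, f"<{RDF_TYPE}>", f"<{SKOS}Concept>")
        yield (s, f"<{SKOS}prefLabel>", f'"{self.pref_label}"@{self.lang}')
        for label in self.alt_labels:
            yield (s, f"<{SKOS}altLabel>", f'"{label}"@{self.lang}')
        for uri in self.broader:
            yield (s, f"<{SKOS}broader>", f"<{uri}>")
        for uri in self.related:
            yield (s, f"<{SKOS}related>", f"<{uri}>")


def to_ntriples(concept: Concept) -> str:
    """Serialize one concept, one statement per line (no literal escaping)."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in concept.triples())


bambini = Concept(uri=BASE + "bambini", pref_label="Bambini")
bambini_artisti = Concept(
    uri=BASE + "bambini-artisti",
    pref_label="Bambini artisti",
    broader=[bambini.uri],
)
print(to_ntriples(bambini_artisti))
```

The point of the sketch is that the label is only a property of the concept: the identity of the concept is carried by its URI, which is what other vocabularies can link to.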
NS in SKOS format
Our thesaurus was converted into SKOS format at the beginning of 2010. It was presented as a prototype at the IV Summit di Architettura dell’informazione (Motta and Rodighiero, “Il thesaurus del Nuovo soggettario interpreta SKOS”) and then improved within the project for the automatic indexing of digital resources, developed at the BNCF since 2011 (Viti, “Interoperabilità fra thesauri generali e thesauri specialistici in ambito economico-finanziario. Il caso del Nuovo soggettario”). Our work has gone through many stages and is now growing gradually, in step with current developments. One of the most important problems, starting from the prototype stage, was that SKOS, even though it provides an expressive way of grouping arrays of sibling terms and collections of concepts, does not recognize node labels as conceptual units that belong to hierarchical relationships; the standard treats them exclusively as skos:Collection. The application does not establish links between the members of an array and the general concept that the array expresses. Instead, each member of the array (skos:member) is directly linked with the concept that comes before the node label and not with the array identified by skos:Collection. Through the URI of the skos:Concept we can verify whether a skos:member belongs to a skos:Collection and rebuild the whole set of hierarchical relationships. For example, a direct link cannot be established between the skos:Concept Bambini, the skos:Collection [Bambini secondo l’attività] and the skos:member Bambini artisti. During our conversion we encountered other problems; in particular, there were some difficulties in translating two types of semantic relationships:
1. the historical variants relationship (expressed with HSF, Historical see for) links current preferred terms with terms that were preferred in the past and are no longer accepted;
2. the multi-word term splitting relationship (expressed with USE+/UF+) creates reciprocal links between multi-word terms and the single-word terms derived from factoring.
In the first case, we have refined the historical variants, tagged as skos:altLabel, with the class sogi:obsoleteTerm. In practice, the historical variants become non-preferred terms. As for the splitting of complex concepts, for the moment we have decided not to implement the SKOS-XL extension (which also identifies the terms, not only the concepts, by a URI), because we have not found any examples of its application. At the moment, the splitting relationship is expressed by a note in a specific field. The apparatus of notes (definition, scope note, history note, sources, DDC...) is suitably expressed by SKOS. The syntactical note, which in the thesaurus guides the construction of subject strings, is labelled with skos:example. The assignment of a URI to each concept promotes interoperability between different KOS, that is, the possibility of mapping the semantic entities of different conceptual schemes. To this end, the standard establishes three different equivalence levels: skos:closeMatch and skos:exactMatch; skos:broadMatch and skos:narrowMatch; skos:relatedMatch.⁶ In this regard, we are testing the creation of equivalences to support linked data between the NS terminology and its equivalents in other vocabularies. We have chosen an empirical approach, based on an international survey of other SKOS applications. During the creation or maintenance of the NS, equivalences can be activated by:
6 http://www.w3.org/TR/skos-reference.
1. entering in a specific field (Source) the name of the vocabulary to be cited: if the cited vocabulary is available in SKOS, the SKOS relationships of the NS are enriched with skos:closeMatch. If the cited vocabulary is not available in SKOS, the citation is used to create a deep link to the vocabulary (i.e. a direct link to the corresponding term);
2. entering the equivalence in a specific field (Equiv. LCSH), which refers to the Library of Congress Subject Headings equivalences: in this case too we use the closeMatch relationship, which is conceptually broader than the exactMatch used in the initial stage.⁷
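The first rule in the list above amounts to a simple decision, which can be sketched as follows. This is only an illustration under stated assumptions: the `Equivalence` class, the `equivalence_for` function, the `example.org` addresses and the contents of `SKOS_VOCABULARIES` are all hypothetical and do not describe the actual NS data management software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Equivalence:
    """Outcome of citing an external vocabulary for one NS concept."""
    kind: str        # "skos:closeMatch" or "deep-link"
    ns_concept: str  # URI of the NS concept
    target: str      # URI (or plain web address) of the external term


def equivalence_for(ns_concept, vocabulary, target, skos_vocabularies):
    """Pick the link type from the availability of the cited vocabulary in SKOS."""
    if vocabulary.upper() in skos_vocabularies:
        kind = "skos:closeMatch"
    else:
        kind = "deep-link"
    return Equivalence(kind, ns_concept, target)


# Assumption for the example only: vocabularies treated as "available in SKOS".
SKOS_VOCABULARIES = {"LCSH", "AGROVOC"}

links = [
    equivalence_for("http://example.org/ns/costo-della-vita", "LCSH",
                    "http://example.org/lcsh/cost-of-living", SKOS_VOCABULARIES),
    equivalence_for("http://example.org/ns/costo-della-vita", "Treccani",
                    "http://example.org/treccani/costo-della-vita", SKOS_VOCABULARIES),
]
for link in links:
    print(link.kind, link.ns_concept, "->", link.target)
```

Keeping the list of SKOS-enabled vocabularies as data rather than code means that a vocabulary published in SKOS at a later date only requires a change to that list.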
AGROVOC 1070
DBPEDIA 800
LCSH 750
ThESS 450
RAMEAU 240
EUROVOC 80
We are testing the definition of semantic equivalence levels between the NS and ThESS (the thesaurus of the Mario Rostoni Library of LIUC University) by means of the skos:broadMatch, skos:narrowMatch and skos:relatedMatch properties.
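Returning to the node-label problem discussed earlier in this section, the following sketch shows one way of rebuilding the grouping of narrower concepts under their node label from the flat SKOS statements. It is a minimal illustration, not the BNCF implementation: the `ex:` identifiers are placeholders for real URIs, and the terms "Bambini lavoratori" and "Neonati" are invented sample data (only Bambini, [Bambini secondo l’attività] and Bambini artisti come from the text).

```python
from collections import defaultdict

# (subject, predicate, object) statements with placeholder identifiers.
TRIPLES = [
    ("ex:bambini-artisti", "skos:broader", "ex:bambini"),
    ("ex:bambini-lavoratori", "skos:broader", "ex:bambini"),   # invented term
    ("ex:neonati", "skos:broader", "ex:bambini"),              # invented term
    ("ex:coll-attivita", "rdf:type", "skos:Collection"),
    ("ex:coll-attivita", "skos:prefLabel", "[Bambini secondo l'attività]"),
    ("ex:coll-attivita", "skos:member", "ex:bambini-artisti"),
    ("ex:coll-attivita", "skos:member", "ex:bambini-lavoratori"),
]


def narrower_by_node_label(triples, concept):
    """Group the narrower concepts of `concept` by the collection they belong to.

    Concepts that are members of no collection are listed under the key None.
    """
    narrower = [s for s, p, o in triples if p == "skos:broader" and o == concept]
    collection_of = {o: s for s, p, o in triples if p == "skos:member"}
    labels = {s: o for s, p, o in triples if p == "skos:prefLabel"}
    groups = defaultdict(list)
    for item in narrower:
        collection = collection_of.get(item)
        key = labels.get(collection, collection) if collection else None
        groups[key].append(item)
    return dict(groups)


hierarchy = narrower_by_node_label(TRIPLES, "ex:bambini")
for node_label, members in hierarchy.items():
    print(node_label, members)
```

The sketch relies on exactly the two facts stated in the text: each member points to the concept above the node label through skos:broader, and its membership in the array is recorded separately through skos:member.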
7 In this regard, we have analysed the matching procedures between RAMEAU and LCSH, in which the link is an exactMatch or a closeMatch without identification of the equivalence level. At the moment, the links from RAMEAU to LCSH are established with a closeMatch (one-way relationship: RAMEAU -> LCSH), while those between LCSH and RAMEAU are established with an exactMatch (LCSH <> RAMEAU).
The NS for automatic indexing of digital resources
As already mentioned, a prototype test of the use of the NS for the semi-automatic subject indexing of digital resources acquired through legal deposit has been running at the BNCF since 2011.⁸ The BNCF initiative is in line with those of other European national libraries (for instance, the Deutsche Nationalbibliothek project in this field is a relevant one: Junger, “Can indexing be automated? - the example of the Deutsche Nationalbibliothek”) and takes into account two objectives:
1. the need for change in cataloguing practices due to the rising amount of publications in digital format;
2. the sustainability of subject indexing.
Here “automatic indexing” refers to procedures using algorithms and techniques, some of them resulting from the latest technological research, that can be used for the automatic (or semi-automatic) extraction of “relevant” keywords / key phrases from a text. These procedures may be based on keyword / key phrase extraction and assignment, with or without the support of a controlled vocabulary. According to recent tests in progress at the international level, automatic indexing seems to produce better results, in terms of precision and recall, if assisted by controlled lists (such as thesauri). In our prototype, the process of extracting keywords / key phrases is managed by the software application Keyword indexer (Biblioteca Nazionale Centrale di Firenze, “Procedure automatizzate di estrazione di parole e frasi chiave: specifiche tecnico-funzionali”).
8 The prototype was developed in collaboration with two Italian companies: Casalini libri http://www.casalini.it and @Cult http://www.atcult.it.
This application requires, as a preliminary step, the creation of a knowledge base (also called a learning model) based on sample documents (with associated metadata) and a vocabulary in SKOS format. In particular, as a first test, we created a thematic learning model for the economic and financial sectors, using the following structural components:
1. a set of digital full-text documents: a sample of Italian doctoral theses belonging to the economic and financial sector according to the classification system defined by the MIUR (Ministry of Education, University and Research): the classification symbols are SECS-P/01-13 and SECS-S/01-06;
2. a set of metadata associated with the selected set of documents;
3. the Nuovo Soggettario (NS) in SKOS format.
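The role of the controlled vocabulary in keyword assignment can be illustrated with a deliberately simple sketch: labels from a thesaurus (preferred or non-preferred) are looked up in the text and counted under their preferred term. This is not how Keyword indexer works internally, which relies on a learning model; the `VOCABULARY` mapping, the `assign_terms` function and the sample labels "carovita" and "Inflazione" are assumptions made for the example.

```python
import re
from collections import Counter

# Toy vocabulary: lower-cased label (preferred or alternative) -> preferred term.
# Only "Costo della vita" appears in the article; the other entries are invented.
VOCABULARY = {
    "costo della vita": "Costo della vita",
    "carovita": "Costo della vita",
    "inflazione": "Inflazione",
}


def assign_terms(text, vocabulary):
    """Count how often each preferred term is evoked by one of its labels."""
    normalized = re.sub(r"\s+", " ", text.lower())
    counts = Counter()
    for label, preferred in vocabulary.items():
        pattern = r"\b" + re.escape(label) + r"\b"  # match whole words only
        hits = len(re.findall(pattern, normalized))
        if hits:
            counts[preferred] += hits
    return counts


sample = "Il costo della vita cresce con l'inflazione; il carovita pesa sulle famiglie."
assigned = assign_terms(sample, VOCABULARY)
for term, hits in assigned.most_common():
    print(term, hits)
```

Because non-preferred labels are folded into the preferred term, two different wordings in the text contribute to the same descriptor, which is the main benefit a thesaurus brings over free keyword extraction.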
This model was then applied to indexing the 2010-2011 issues of the digital journal LIUC Papers.⁹ The Keyword Indexer software, which uses the TF/IDF (Term Frequency / Inverse Document Frequency) algorithm, was used to determine the ranking of terms. Obviously, the final results were affected by every variation of the above parameters.¹⁰
9 Italian monthly journal focused on social science and in particular on Economics and Management http://www.biblio.liuc.it/pagineita.asp?codice=82. It is edited by the Mario Rostoni Library of the Carlo Cattaneo University in Castellanza (LIUC), which cooperates with the NS project.
10 «The TF/IDF weight (term frequency–inverse document frequency) is a numerical statistic which reflects how important a word is to a document in a collection or corpus. It is often used as a weighting factor in information retrieval and text mining. The TF/IDF value increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word in the corpus, which helps to control for the fact that some words are generally more common than others». http://en.wikipedia.org/wiki/Tf*idf
For the time being, considering the latest configuration of our test (choice of the metadata closest to the semantic content of the document, such as title + abstract, or title + MIUR (Ministero dell’Istruzione, dell’Università e della Ricerca) classification symbol), the findings are to be considered provisional: in this case automatic indexing is not yet close enough to intellectual indexing. For these reasons we plan to continue our tests, taking into account:
1. a multidisciplinary learning model for the general needs of a national library;
2. refinement of the procedures for preparing the metadata used to build the learning model: we are considering intellectual indexing and/or new automatic procedures for extracting topic keywords that could be used as metadata.
In any case, it is worth noting that all the tests are based on the reuse of open source software components freely available on the net.
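The TF/IDF ranking mentioned in this section can be made concrete with a short sketch. It uses one common variant of the weight, tf(t, d) = count(t, d) / len(d) and idf(t) = ln(N / df(t)), where N is the number of documents and df(t) the number of documents containing t. The exact variant used by Keyword indexer is not stated in the text, so the formula, the `tf_idf` function and the toy token lists below are assumptions for illustration only.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    `documents` is a list of token lists; the weight is
    (term count / document length) * ln(number of documents / document frequency).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    ["economia", "mercato", "mercato", "lavoro"],
    ["economia", "mercato", "credito"],
    ["economia", "lavoro", "salari"],
]
weights = tf_idf(docs)
for i, doc_weights in enumerate(weights):
    ranking = sorted(doc_weights, key=doc_weights.get, reverse=True)
    print(i, ranking)
```

In this toy corpus "economia" occurs in every document and therefore receives a weight of zero, which is the offsetting effect described in the footnote on TF/IDF: very common words do not help to distinguish one document from another.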
NS and the Semantic web
For interoperability with other applications, the NS is available through the Zthes protocol.¹¹ Zthes is essentially an evolution of the Z39.50-based information retrieval protocol, where the targets are not library catalogues but controlled vocabularies in compliance with ISO 2788 and ISO 5964. Through Zthes, applications can exchange data using the well-known and established mechanism of Application Programming Interfaces (APIs). In particular, Zthes uses the SRU (Search/Retrieve via URL) syntax, where requests for access to a controlled vocabulary are included as parameters within a URL and response messages are tagged using XML syntax: in other words, Zthes uses the HTTP protocol, designed for interaction between the user (browser) and the machine (web
11 http://zthes.z3950.org.