Connecting vocabularies for data integration and cross search

Research output: Contribution to conference (Paper)


The ARIADNE project [1] involves multiple data providers holding multiple data sets, indexed using multiple controlled vocabularies in multiple languages. The project does not aim to replace existing archaeological data repositories, but rather to consolidate their metadata in order to facilitate cross search. Some form of common ground is therefore needed to enable effective data integration; however, it would clearly be unreasonable to expect domain-wide agreement on the adoption of any single database schema or controlled terminology.

In the course of previous Knowledge Organization projects undertaken by the University of South Wales (STAR/STELLAR/SENESCHAL) [2] we addressed issues around the conversion of archaeological datasets and their associated controlled vocabulary resources to semantic web formats, making them available online for searching and browsing. One key issue that emerged was the fragmented nature of the various controlled vocabularies in use. In an ideal world, conceptual knowledge about the archaeology domain would not be split and duplicated across modern territorial, political and organizational boundaries, but in the real world this is often the case. Local vocabularies are created for controlled indexing of local resources to aid subsequent retrieval, often with less regard for wider integration, so no formal semantic links exist between these vocabularies. Any project seeking to implement such wider integration must tackle this issue to achieve data interoperability.

The creation of mappings between local vocabularies can provide a mediating platform to enable cross search; however, the number of possible direct links between equivalent items originating from different vocabularies can quickly become unmanageable as the number of vocabularies increases. A more efficient and scalable approach is to adopt an intermediate structure onto which concepts from each local vocabulary may be mapped. A search on a concept originating from any one vocabulary can then utilise this mediating mechanism to route through to concepts originating from any of the other vocabularies, possibly expressed in multiple languages.
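The scaling argument can be illustrated with a short calculation. This is a minimal sketch under a simplifying assumption that is not stated in the abstract: one mapping set is counted per pair of vocabularies in the direct approach, and one mapping set per vocabulary in the mediated approach. The function names are hypothetical.

```python
def direct_mapping_sets(n: int) -> int:
    """Mapping sets needed to link every pair of n vocabularies directly."""
    return n * (n - 1) // 2


def hub_mapping_sets(n: int) -> int:
    """Mapping sets needed when each vocabulary maps to one shared hub."""
    return n


# Compare the two approaches as the number of vocabularies grows.
for n in (3, 5, 10, 20):
    print(f"{n:>2} vocabularies: direct={direct_mapping_sets(n):>3}, "
          f"hub={hub_mapping_sets(n):>2}")
```

Under this assumption the direct approach grows quadratically while the mediated approach grows linearly, so adding one more vocabulary to a hub of twenty requires one new mapping set rather than twenty.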

An exploratory exercise for the ARIADNE project was undertaken to demonstrate the utility of this approach. The poly-hierarchical structure of the Getty Art & Architecture Thesaurus (AAT) [3] was extracted for use as an example mediating structure to interconnect various multilingual vocabularies originating from ARIADNE data providers. Vocabulary resources were first converted to a common concept-based format (SKOS) [4], and the concepts were then manually mapped to nodes of the extracted AAT structure, using judgement informed by the meaning of terms and their scope notes. The overall composite structure could then be queried.

The exercise demonstrated an effective method by which cross search could be achieved over multiple multilingual vocabularies using such a mediating structure, with the significant added benefit of introducing semantic expansion to potentially improve recall without affecting precision. The search results obtained can then be used to find related archaeological resource records from throughout the ARIADNE data. In the next stages, larger-scale mappings will be produced, to be published in the ARIADNE repository.
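The routing and semantic expansion described above can be sketched in a few lines. This is an illustrative sketch only, not the project's implementation: the hub node identifiers, vocabulary names and terms below are hypothetical placeholders (they are not real AAT identifiers or ARIADNE data), and expansion is assumed to follow narrower concepts only.

```python
# Hypothetical extract of a poly-hierarchical hub: node -> narrower nodes.
NARROWER = {
    "hub:structures": ["hub:dwellings", "hub:defensive_structures"],
    "hub:dwellings": ["hub:tower_houses"],
    "hub:defensive_structures": ["hub:tower_houses"],  # second parent
    "hub:tower_houses": [],
}

# Hypothetical manual mappings: (vocabulary, term, language, hub node).
MAPPINGS = [
    ("vocab_a", "dwelling", "en", "hub:dwellings"),
    ("vocab_a", "fortification", "en", "hub:defensive_structures"),
    ("vocab_b", "abitazione", "it", "hub:dwellings"),
    ("vocab_c", "Wohnturm", "de", "hub:tower_houses"),
]


def expand(node: str) -> set[str]:
    """Return the node plus all of its narrower descendants in the hub."""
    seen: set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current in seen:  # a node can be reached via several parents
            continue
        seen.add(current)
        stack.extend(NARROWER.get(current, []))
    return seen


def cross_search(vocab: str, term: str) -> list[tuple[str, str, str]]:
    """Route a local concept through the hub to concepts in any vocabulary."""
    start_nodes = {hub for v, t, _, hub in MAPPINGS if (v, t) == (vocab, term)}
    targets: set[str] = set()
    for node in start_nodes:
        targets |= expand(node)
    return sorted(
        (v, t, lang)
        for v, t, lang, hub in MAPPINGS
        if hub in targets and (v, t) != (vocab, term)
    )


print(cross_search("vocab_b", "abitazione"))
```

Because the expansion in this sketch only descends to narrower nodes, a search starting from the Italian term also reaches the more specific German term, while the unrelated "fortification" concept is not returned. This is one plausible reading of how expansion could add recall without admitting off-topic matches.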


[1] ARIADNE FP7 project

[2] University of South Wales - Knowledge Organization projects

[3] Getty Art & Architecture Thesaurus (AAT)

[4] Simple Knowledge Organization System (SKOS)

Original language: English
Publication status: Published - 2015
Event: CAA 2015: 43rd Computer Applications and Quantitative Methods in Archaeology Annual Conference: Keep the Revolution Going - Universita di Siena, Siena, Italy
Duration: 30 Mar 2015 to 3 Apr 2015


Conference: CAA 2015: 43rd Computer Applications and Quantitative Methods in Archaeology Annual Conference
Abbreviated title: CAA2015

