
Paper
CIDOC 2009
DIBAM
(Centro de Documentación de Bienes Patrimoniales, Subdirección de Museos, Archivos y Bibliotecas – Dirección de Bibliotecas, Archivos y Museos)
Santiago de Chile, September 29, 2009

Presentation (PDF 1,9 MB)

Global concepts and/or local heritage

Concepts and linguistic issues in museum terminology

This paper covers the concept-based terminology work within the KMM project in Sweden - an experiment in strengthening information retrieval by using standardised cultural heritage concepts in cooperative web service platforms. It also gives a brief evaluation of the impact of a new service, "Rikstermbanken" - a national portal of joint terminology resources - in the light of globalisation and worldwide museum standards such as AAT.

KMM - a background

KMM is an initiative for strengthening information retrieval by using standardised cultural heritage concepts in cooperative web service platforms. It takes the form of a joint development programme with a research-based focus on museum information and terminology development. Connecting a growing group of museums, a university and small business partners, we try to find solutions for terminology handling in heritage databases. One resource is SAMSÖK, a cross-museum search platform that integrates many of the recent Swedish CMS alternatives. A special feature of SAMSÖK is the possibility of searching the group of databases by choosing conceptual terminology in hierarchical tree views.

SAMSÖK is an experimental service aiming at two kinds of experience. On the one hand, it develops the technical platform for direct cross-searching between datasets that are not harvested together. On the other hand, it develops and analyses the mechanisms behind how "concepts" grow and become connected to "words". In direct practice, this means maintaining the more or less documented sources of subject headings, object names, motifs and other vocabularies within the museums, and forming a better mapping of terminology from different sources.

The traditional information in museum databases is broken down into parts, which are conceptualised and given unique concept IDs, so that they are generalised and can be put into hierarchies. Once delivered into the central system, in KMM called the MASTER, this will also help local museums to clean up their terminology. Both local and central updating is supposed to be possible. A museum with a strong need for a concept of its own - not fully corresponding to the central one - can define a local variant. Locally defined concepts can then also be adopted centrally and used by all systems.

For search options this is supposed to be of great interest as it helps the end user to combine and possibly to adopt new concepts. From the technical experiments we can foresee a better matching between search terminology and retrieval results - and by that a better service to the users.

For some years now, SAMSÖK has been running as a service to museums and end users. It connects half a dozen collections of objects and images from a range of local and proprietary databases. Searching is based on traditional text search in specified fields of the database. Records are retrieved online from local data and displayed first in a KMM-formatted list and, in the next step, in the target museum's own locally flavoured web style.

The next generation, KONCEPTSÖK, takes this a bit further. The search is still made directly in the set of distributed databases. We have decided not to develop or adopt a harvesting model, because the main idea of KMM is to develop and research tools for a distributed environment.

Local databases are enriched with columns holding the unique ID of the relevant concept from the MASTER. Typing the name of a concept (chair) in the web search form triggers the system to look up the concept ID in the MASTER, presenting the terminology hierarchy and giving the user a chance to choose - and combine - relevant terms for further retrieval.

Because "preferred label" and "alternative label" point to the same basic concept ID, we open up a much wider linguistic approach to searching. We also make it possible to find the concept in its hierarchy of broader and narrower terms, helping the end user to make the best choice.
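
The lookup described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the actual KONCEPTSÖK implementation: the MASTER table, the concept IDs (c1, c2, c3), the museum names and the sample records are all hypothetical, and the sketch includes narrower concepts automatically, whereas the service described here lets the user choose in the hierarchy.

```python
# Hypothetical MASTER: concept ID -> labels and narrower concept IDs.
MASTER = {
    "c1": {"pref": "furniture", "alt": ["möbel"], "narrower": ["c2"]},
    "c2": {"pref": "chair", "alt": ["stol"], "narrower": ["c3"]},
    "c3": {"pref": "pinnstol", "alt": [], "narrower": []},
}

# Hypothetical local databases, each record enriched with a concept ID column.
LOCAL_DATABASES = {
    "museum_a": [
        {"title": "Kitchen chair", "concept_id": "c2"},
        {"title": "Fishing net", "concept_id": "c9"},
    ],
    "museum_b": [{"title": "Pinnstol, 1890s", "concept_id": "c3"}],
}


def find_concepts(term):
    """Return IDs of concepts whose preferred or alternative label matches."""
    needle = term.casefold()
    return [
        cid
        for cid, c in MASTER.items()
        if needle in (label.casefold() for label in [c["pref"], *c["alt"]])
    ]


def with_narrower(concept_id):
    """Return the concept ID followed by the IDs of all narrower concepts."""
    ids = [concept_id]
    for child in MASTER[concept_id]["narrower"]:
        ids.extend(with_narrower(child))
    return ids


def concept_search(term):
    """Query every local database directly (no harvesting) by concept ID."""
    wanted = set()
    for cid in find_concepts(term):
        wanted.update(with_narrower(cid))
    return [
        (museum, record["title"])
        for museum, records in LOCAL_DATABASES.items()
        for record in records
        if record["concept_id"] in wanted
    ]
```

In this sketch, searching for "stol" or for "chair" resolves to the same concept ID and therefore returns the same records from both hypothetical museums.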

Presentation of search results is then given in the same format as SAMSÖK.

Terminology - sources and definitions

Today we have collected terminology from varying sources: mainly museums, but also some examples of vocabularies used in wider heritage documentation. This is a work in progress, carried out by a group of skilled curators in the member museums.

The main test set consists of ca 20 000 words, all describing objects, photographic motifs, other subject headings or the content of collections. Here we find "salmon fishing", "birthday", "water transport", "chair" or "Christmas Eve" etc. The sources are spread over Sweden as well as over time, and cover both local and global use. Project staff have made several experiments in order to group, define, translate and relate these words/concepts to each other in the system.

We can see that parts of the terminology are rooted in common language - understandable for everyone - while others are more domain specific or highly specialised. At least that is the first analysis or presumption. Looking once again at some of the concepts supposed to be in general use, many questions arise. In daily conversation we are all sure that other people share our definitions of a chair, a house, a train. Looking deeper into this leaves us with problems. Does a chair have two, three or four legs - or even more? Do you sit upright or leaning back a bit? Must there be a back of a certain size or form? How about armrests and padding? Material, size, technique etc.? In other cases various properties or behaviours are important. A bird has two legs and a beak; most birds - but not all - can fly, and most often they sing.

Hierarchy

To be able to connect concepts hierarchically, we of course need to hang an unambiguous definition around each concept's neck, peeling away closely related things that might fall under the same definition. We must also handle multidirectional (context-based) relations. Horses are part of animals as well as of farming, transport and sport. Animals can also be part of sport, farming and transport. For personal reasons one might even want the possibility of making customised hierarchies: let's say I want to put horses as part of food or recreation or...

Normally, the hierarchies can be built on a parent-child relation, where the narrower terms point either to a set of concepts that are (all) alternatives: animal -> horse, cow, dog, cat... or to a set of concepts that are parts of a whole: car -> doors, wheels, steering wheel, lamps...
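
As a rough sketch of how both kinds of parent-child relation, and a concept with several parents, could be stored side by side, the horse and car examples above can be written as typed relations. The relation names "generic", "partitive" and "context" are labels assumed here for illustration only; they are not taken from the KMM system.

```python
# Each relation is (broader concept, narrower concept, kind of relation).
# The kind names are assumptions made for this sketch.
RELATIONS = [
    ("animal", "horse", "generic"),      # a horse is a kind of animal
    ("animal", "cow", "generic"),
    ("farming", "horse", "context"),     # context-based, multidirectional
    ("transport", "horse", "context"),
    ("sport", "horse", "context"),
    ("car", "wheels", "partitive"),      # wheels are part of a car
    ("car", "doors", "partitive"),
]


def broader_of(concept, kind=None):
    """List the parents of a concept, optionally for one kind of relation."""
    return [
        parent
        for parent, child, k in RELATIONS
        if child == concept and (kind is None or k == kind)
    ]


def narrower_of(concept, kind=None):
    """List the children of a concept, optionally for one kind of relation."""
    return [
        child
        for parent, child, k in RELATIONS
        if parent == concept and (kind is None or k == kind)
    ]
```

Keeping the kind of relation on every link means that a single concept such as "horse" can sit in several hierarchies at once, and a customised hierarchy would simply be one more set of links.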

Enabling the creation of personalised hierarchies might also open up other ways of connecting concepts, for example a more flexible, thematic or tentative way: pencil, stuffed blackbird, sandwich and pistol form "all things my uncle had in his drawer", which is a part of "my childhood".

KMM concept model

The KMM terminology is compliant with the SKOS model. Trying to define a concept takes us into defining some meta-terminology and structure. We will focus on four main variables here. First of all, the concept has a unique ID in the system. This is the key to all further handling and is generated by the system. The concept also carries a name, or in many cases a set of names in different contexts, which can be local, social, linguistic etc. In the first language, one of these names is chosen as the preferred label. All others are alternative labels and synonyms of the preferred label. There cannot be any hierarchical relationship between the alternative labels, since, for example, concepts defined in other languages (other cultural contexts) might not fit into exactly the same hierarchy as the first-language concepts. There is also a scope note and a definition to use. The two latter parameters are the most complex to handle, raising a range of problems of interpretation, to be described later on.
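
The four variables can be pictured as a small record type. The following is a minimal sketch, assuming Python dataclasses and a UUID as the system-generated ID; the class name, field names and ID scheme are assumptions made for illustration and do not describe the actual KMM data model. The sample values are taken from the sock example later in this paper.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Concept:
    pref_label: str          # the one preferred label in the first language
    definition: str = ""
    scope_note: str = ""
    # Alternative labels, each mapped to the context it comes from.
    # They are kept flat: no hierarchy exists between alternative labels.
    alt_labels: dict[str, str] = field(default_factory=dict)
    # Unique ID generated by the system (a UUID is assumed here).
    concept_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def add_alt_label(self, label, context):
        """Attach an alternative label together with its context."""
        if label == self.pref_label:
            raise ValueError("label is already the preferred label")
        self.alt_labels[label] = context

    def all_labels(self):
        """Return the preferred label followed by all alternative labels."""
        return [self.pref_label, *self.alt_labels]


sock = Concept(
    pref_label="strumpa",
    definition="a cloth covering for the foot worn inside the shoe",
)
sock.add_alt_label("hosa", "South Sweden")
```

Every label in `sock.all_labels()` leads to the same `concept_id`, which is what allows a search on any of the names to reach the same concept.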

The concept might be clearly defined - if strictly defined by technical parameters - but it might just as well be a bit unclear and ambiguous if defined otherwise. The school systems in different countries may stand as an example, with unclear terminology for professionals, curricula etc., and different levels and exams, to take just a few cases. On a basic level we all agree about what "school" is - until we have to define it!

We have met some problems with concepts that are not clearly defined, or whose definition varies between contexts. The level of problems differs between domains.

Technique and materials terminology looks quite well defined and domain specific, with its context mainly set outside the heritage sector.

Geographical data are tied to the time terminology and are relevant only when linked to a certain time value.

Time terminology can be well defined when we talk about years, centuries etc., and shows only small problems inside one culture. Time and period names differ more and are influenced by culture, politics etc. They are also more ambiguous when crossing cultural borders. Scientific eras (Stone Age, Middle Ages) and regional divisions (Norrland) also differ in relation to cultural bias and well-recognized naming traditions.

A person's name may vary over time and context, and also between formats or notations, even if it denotes the same concept (Hans Rengman, H Rengman, HR).

Context

In my examples later I have tried to catch the values of everyday language and the different definitions of each concept in different contexts. The main focal points are Time, Language, Culture and Country/Legislation, where the object is or was in use. Within the scope of our discussion there is also a difference between the museum (frozen) context and the historical (living or dead) context. Life/use terminology lives with the object and the people around it. The museum context ends the life of the object and freezes the terminology at the time the object is collected - or re-catalogued, or re-studied etc. In these latter phases the curator puts his knowledge - and bias - upon the earlier generations of terminology, understanding and interpretation. This will undoubtedly colour the documentation carried on to the future. Authorized nomenclature can help to make this process strict and relevant - but it must also leave an opening for taking care of values from past interpretations.

Contexts giving different reading of a word can be:

  • Geosocial - different word in North and South Sweden or in specific areas
  • Domain - Art history and ethnography differences
  • Time/generation - development of language over time or group
  • Ecosocial - different word in groups depending on group interest

One reason for raising awareness of this is that the Internet and community-based tools today lead laymen to use terminology formerly reserved for experts. Yesterday's internal controlled vocabularies compete with social tagging, wikis and googlified knowledge. Context might then be somewhat forgotten.

In some cases terminology is translatable while still keeping the underlying concept. In other cases it is not. In the examples we can see that definition elements differ quite a lot. They might even be contradictory.

We can also foresee a global - or monolingual - nomenclature growing from more and more resources translated into English, and other large languages - not necessarily within the same cultural domain.

Rikstermbanken

Rikstermbanken collects more than 58 000 terms from about 350 sources in various fields, presented in Swedish and other languages: our minority languages as well as English, German, French and Russian.

It is an initiative from Terminologicentrum TNC, inspired by IATE (the EU terminology database). It started in 2006, building on experience from ca 1940 up to today.

Rikstermbanken gives

  • Easy and fast access to terminology from many domains
  • Spreading of new terms and names
  • Tool for retrieving and storing and for research
  • Collaboration and harmonisation of terminology - not least the authorities' terminology - saving time and money and giving better understanding

Rikstermbanken will be adding terminologies and taxonomies of new types, raising quality by using terminology methods when adding and revising content. Today there are some sources related to heritage, but the main vocabularies come from more standardized industry and civil service terminology.

This will be a source of enormous value for heritage institutions in describing and documenting objects, images etc. It gives a clear terminology and refers to sources and context, which both makes it easier to find the right descriptive language and enhances the knowledge surrounding the core knowledge elements.

With direct access to all sources, it would be very easy to search, browse the results and choose the adequate term for one's purpose - with the possibility of referring to a recognized authority file source.

As a repository Rikstermbanken is of great value but it also raises some questions looking into the collected treasures from a heritage analytical perspective.

It is worth discussing how a large standards repository will impact the local language, from two perspectives: in the future and in history. Undoubtedly it will be of great value to harmonize nomenclatures of all kinds in industry etc. for better effectiveness and quality. In doing so we also tend to adopt new terminology and forget older terms - both standardised ones and those from normal language - and even, anachronistically, to use the language of the future for historical description. We do not use context-specific synonyms other than for specific domains, and this tends to strengthen the use of the most common word, the one used by mass media or on the Internet.

We can also see that many of the sources in Rikstermbanken are ambiguous in their definitions of concepts. This is natural, as the sources mirror different contexts. However, choosing a "preferred label" from a set of words can be a difficult balancing decision.

Taking it a step further one can see a (minor?) problem in translating the words related to well known concepts of common understanding and in balancing linguistic labels and concepts from different sources like this Swedish initiative, AAT etc for multilingual purposes.

At the same time, the widespread use of the Internet - both for searching and for publishing - provides users with sources of varying quality and reliability. There is definitely a need for some source control and self-control in the use and spreading of standardised terminology.

Some examples

Socks

The traditional Swedish word for "a cloth covering for the foot worn inside the shoe; reaches to between the ankle and the knee" is "strumpa" (in South Sweden "hosa", a Danish/German influence). Since around the 1970s we have also had "socka" ("ankelsocka", an English influence in West Sweden and in the fashion business). Today we meet a new word, "liner", for the first inner layer when two pairs are worn for better comfort in running, skiing or hiking. One thing making the case complex is that "socka" ("raggsocka") traditionally is the thick second layer used in boots. Here we have a real mix-up of region, generation and domain.

Ljusstake (candlestick)

In many cases we can guess that a relevant difference in definition is (partially) missing today. One example is "adventsstake" (four live candles, used during the four weeks of Advent, preferably placed on a table in the home) and "julljusstake" (five to seven electric candles, used for the entire Christmas period - November? to February? Or at least December 12 to January 15 - mainly placed in the windows of homes or offices). The two seem to be mixed up occasionally, and the use of the latter has in some cases been extended to almost the whole winter, despite its name pointing to a period of just a couple of days. Instead of a serial use of these objects, there is a parallel one.

Chair

The Swedish word "pinnstol" can be defined as a chair built of lathe-turned (svarvade) sticks tenoned vertically into a solid seat, with the top of the back curved. For technical purposes this is good enough. However, the concept pinnstol also covers something intangible: this was the first model of chair to be truly mass-produced, and it thereby played an important role in furnishing Swedish homes during the 19th and 20th centuries. Most people relate to it, or to their image of their parents' history. Still produced, it is a piece of art or skilled craftsmanship.

Googling for "pinnstol" gives many examples of how this concept - or really its name - has taken over many other concepts whose common understanding is much more diffuse, though still better defined than the "pinnstol". One clear example is the name "brickstol", with a flat tray forming the back. Another, not so clear, is the name "stegstol", "ladder-back chair". This concept is unclear for two different reasons. First, it is composed of sticks, or at least small pieces of wood, but they are attached horizontally, like a ladder. Second, it is in fact an English translation of the concept name in one of the Rikstermbanken sources, the book "Engelska möbelord" ("English furniture words") from 1989 (together with "spindle-back chair" and "slat-back chair"). It both fits and does not fit into the scope of "pinnstol". In AAT all three are "chairs by form - back form".

In some cases the Windsor chair gets the same definition as "pinnstol", but in other cases it is a child concept of the larger concept "pinnstol". Both are historically and technically incorrect. The Swedish form is certainly derived from the Windsor chair, not the other way round, and the typical Windsor chair armrests are not found on a typical Swedish "pinnstol".

It is also interesting that the word "stick-chair" seems to be used in English for this type of object, even though its ambiguity covers many other concepts, such as chairs made of hockey sticks.

Villa

Villa opens up a plethora of terminology nerdism. Comparing a selection of Swedish, English and other encyclopaedias, as well as Wikipedia and other net sources, and controlled vocabularies like AAT, gives us a vast range of definitions based on technical, administrative, architectural, social, town-planning and historical reasoning. The ancient Roman villas are a special kind - the origin.

Adopted in our time, a villa can be anything from a quite small building in a row of the same kind to a large luxury building with many rooms and different functions. In AAT a villa is placed as a "rural house" - by "location and context" - and its narrower terms clearly place us outside ancient Rome, while in Europe it can vary considerably and be both a rural house and a town house. The synonyms "detached house", "maison isolée", "Einzelhaus" and "friliggande småhus" signal quite different focuses: detached = NOT connected, isolée = isolated, små = little. AAT gives only "Semi-detached house".

Taking the easy way to find comparison material - a Google image search, just to get an opinion from normal language use rather than from the controlled vocabularies - gives an astonishing result. English detached houses seem to be quite close to each other as a concept, and so do the German Einzelhaus results. French "maison isolée" shows in fact a far more isolated approach. Swedish "friliggande småhus" returns just a few photos of houses, but far more plans and maps. This term is not a natural-language word like the others; it is an administrative town-planning term.

So we have a matching - or discrepancy - between "villa" and the local synonym, more in some countries, less in others. We also have a matching - or discrepancy - between the villas in all countries as well as (of course) between Einzelhaus, detached house, maison isolée and friliggande småhus. A lowest common denominator cannot be found: it would have to include all forms of one- or two-family homes, with or without a place for a car, roses or a playground. The house might stand alone or in groups, have one or two storeys, with or without a cellar etc. The difference between British and American English usage (language or culture?), as it takes form in AAT, is also interesting.

We also have to take into consideration the different architectural styles in the countries, without mixing them up with the concept itself.

How to evaluate these examples

All these examples can be seen as very narrow-minded and unimportantly detailed, and they should not be taken too seriously in themselves. My intention has just been to show the vast variety of concept mismatches and some of their reasons, and the need for context - both the object context and the context of the user producing or distributing knowledge - and for linguistic reflection.

Analysis

The KMM terminology work aims to raise the quality of retrieval results and to aid end users in the process, but also to produce research and theory on data quality and terminology use in heritage documentation, in order to increase the level of quality in the museum databases in Sweden without losing fine granularity in language and nuances in meaning.

So - we have found that we can develop work on two differing approaches, or directions, in how to get along with the structural work of organising the sources.

A

Promoting a general net of quite rough concepts as start point for searching

  • Good enough? for retrieval - easy? to define
  • Points to imperfection and inconsistency
  • Updating easy and close to needs
  • Relevance in important domains
  • Can be developed towards detail levels if needed

B

Matching - or mapping - real concepts on detail level

  • Catching fine granularity knowledge
  • Many parallel tracks or hierarchies
  • Lack of overview
  • Enhancing museum importance
  • Saving disappearing heritage knowledge

In model B we should consider leaving the most common examples of everyday language - as we have seen how we tend to understand them on a common basis until we try to define them... In model A we try to avoid too detailed concepts.

We have to get to grips with the source of 20 000+ words:

  • Grouping some of them together around a central concept
  • Defining some of them for further use
  • Connecting (some of) them in parent-child relations

Conclusions

Local databases show a varying linguistic approach. Their content is collected over time and from differing contexts - not necessarily well defined. It carries a linguistic flavour worth preserving. Unstandardized local databases raise a range of difficulties in bringing heritage documentation to wider groups of users.

Central authority files are a strong tool for setting up interoperability and exchange, and they help users to find results - in cases where relevant terminology is set on items. They also help the end user to find his way. These files are context sensitive to a higher extent than first appears. Forced onto disparate database material, they can make the understanding of heritage difficult. Controlled vocabularies, by their rigidity, give assured quality - but need an expert approach at some level to be usable.

The global language - English - tends to set standards in web-based information exchange and also in heritage documentation. This opens up for interchange but carries a risk of losing fine-grained knowledge. Influences here come from inside museums, as well as from everyday language use by strong agents and from controlled vocabularies in specific domains transferred to "weaker" domains.

KMM experiments and system development aim both to find tools for adopting a more standardized and clear vocabulary for heritage documentation and, at the same time, to ensure the quality level of the context-based knowledge that is in itself a part of the heritage to preserve. That work calls for future collaboration between the project and other initiatives such as Rikstermbanken, AAT and social tagging, mixing controlled and free linguistic approaches.

In that work the concepts are of great importance.

Synthesis

From a logical and philosophical perspective the processes mentioned in the paper might lead us to a museological implication on museum documentation quality:

  • Heritage concepts are local and have grown within their context. Large (and multilingual) authorities tend to be global and generalized by nature.
  • Local concepts cannot be turned into global concepts without loss of knowledge.
  • The benefit of overly large global authority systems might be overvalued and has to be researched further before we are forced to risk losing fine-grained knowledge.
     
  • Discussing a tentative attitude to strong beliefs in central systems.
  • Open up for a better connection between museums and other initiatives, such as those evolving in Internet communities, based on theory, research and analysis of real, context-related content.
  • Bridge the gap between extremes like controlled vocabularies and social tagging, towards a constructive dialogue.
  • Lead to a discussion on museum responsibility in preserving linguistic dimensions as well as physical heritage.

Concept based retrieval can be seen as a platform for experience and a key to better cooperation in all these cases.

 

Hans Rengman

 

Links

  • KMM - samsok.kmmuseum.se/kmm/
  • SAMSÖK - samsok.kmmuseum.se
  • Konceptsök - samsok.kmmuseum.se/koncept/
  • Terminologicentrum - tnc.se
  • Rikstermbanken - rikstermbanken.se
  • IATE - iate.europa.eu/iatediff/switchLang.do?success=mainPage&lang=sv
  • AAT - www.getty.edu/research/conducting_research/vocabularies/aat/

                                              meta
+46 (0)70 718 23 25
hans@meta.se
