Metadata and MARC

By Keith Gove

As indicated in Connections Issue No. 28, we would like to keep teacher librarians abreast of developments in cataloguing, indexing and related matters such as metadata. This first in a series of articles aims to demystify some of the jargon that is commonly used.


Metadata is data about data. The term has been used recently to refer to data describing Internet sites, but the idea of metadata was not created just for the Internet: it has existed for centuries. The familiar library catalogue is metadata, that is, data about the books and other resources in a library. There is a library catalogue metadata standard, MARC (Machine Readable Cataloguing), which is used by most libraries in the world. It was developed to allow libraries to share their cataloguing data.

Recent international work has led to agreed standards about metadata for Internet sites, the current main standard being the Dublin Core Metadata Standard. This arose out of work by the World Wide Web Consortium (W3C), an association of major information industry stakeholders (including rivals such as Netscape and Microsoft). They undertook this work in response to a demand for a censorship rating system. It became obvious that the techniques being developed for selective resource suppression could be equally effective for selective resource discovery. W3C combined with OCLC (Online Computer Library Center) in Dublin, Ohio, USA, where the Dublin Core metadata framework was initiated. OCLC and the Library of Congress are the major stakeholders in, and developers of, the USMARC standard on behalf of the American library community. Their involvement ensured that a 'library' perspective was included in the Dublin Core standard. The purpose of the W3C/OCLC deliberations was to develop an overarching set of categories that could accommodate standards such as AACR/MARC as well as other non-library contexts.

The Dublin Core and MARC metadata standards have much in common, not surprisingly, because they both aim to do the same thing: describe resources so that users can efficiently find them. Both identify access points (title, subject, etc), identifying information (publisher, date, object type, etc), and information which relates the object to others (relation, source). Unlike MARC, Dublin Core was developed as a universal framework for library and non-library contexts, for example corporate record-keeping. Dublin Core forms the basis of the AGLS (Australian Government Locator Service) Metadata Standard and the EdNA (Education Network Australia) Metadata Standard.
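For readers who like to see the idea in concrete form, the three roles described above can be sketched in a few lines of Python. This is an illustration only, not part of either standard: the sample record is hypothetical, the grouping simply follows the wording of this article, and the MARC tags shown are a rough, partial correspondence that should be checked against a published Dublin Core to MARC crosswalk before being relied on.

```python
# Hypothetical Dublin Core description of an imaginary web resource.
record = {
    "title": "Example School Library Home Page",
    "subject": "School libraries",
    "publisher": "Example School",
    "date": "1999",
    "type": "Text",
    "relation": "Example School Home Page",
}

# The three roles named in the article, with the elements it lists for each.
ROLES = {
    "access points": ("title", "subject"),
    "identifying information": ("publisher", "date", "type"),
    "relationships": ("relation", "source"),
}

# Assumed, partial correspondence between Dublin Core elements and MARC tags.
DC_TO_MARC = {
    "title": "245",
    "subject": "653",
    "publisher": "260",
    "date": "260",
    "type": "655",
    "relation": "787",
    "source": "786",
}


def group_by_role(dc_record):
    """Sort the elements present in a record into the three roles."""
    grouped = {}
    for role, elements in ROLES.items():
        grouped[role] = {
            name: dc_record[name] for name in elements if name in dc_record
        }
    return grouped


if __name__ == "__main__":
    for role, elements in group_by_role(record).items():
        print(role)
        for name, value in elements.items():
            print(f"  {name} (MARC {DC_TO_MARC[name]}): {value}")
```

The same small set of elements serves both standards; what differs is how finely each standard subdivides them and who is expected to fill them in.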

A brief comparison of USMARC and Dublin Core is in the accompanying table (page 2).

Metadata does not just exist: it is created as part of the document (eg its heading), or is assigned to the document by the author or other appropriate people. This is where the newer approaches (such as Dublin Core) differ from the older. Library metadata (catalogue records) has traditionally been created by relatively small numbers of specialised cataloguers. The outcome provides highly reliable, consistent and useful records for users, contributing significantly to their ability to find precisely what they are looking for. This is, however, a labour-intensive process, and hence relatively expensive, although the sharing of catalogue records (such as in ABN and SCIS) makes the task manageable.

Metadata about Internet sites has tended to be created by the authors of the website, then automatically harvested by Internet directory services such as EdNA and search engines such as Alta Vista. This process saves labour, but is less reliable than manual indexing in terms of accuracy and consistency. It has been likened to letting authors shelve their own books in libraries.
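The harvesting step can be pictured with a short Python sketch. It assumes the author has embedded Dublin Core elements in the page as HTML meta tags whose names begin with "DC." (a common convention, though not the only one). The sample page, the class name and the function name are all hypothetical, and a real harvester would need to cope with many more variations.

```python
from html.parser import HTMLParser


class DublinCoreHarvester(HTMLParser):
    """Collects <meta name="DC.xxx" content="..."> tags from a page."""

    def __init__(self):
        super().__init__()
        self.elements = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        # Keep only author-supplied Dublin Core elements; ignore other meta tags.
        if name.lower().startswith("dc.") and content is not None:
            element = name[3:].lower()
            self.elements.setdefault(element, []).append(content)


def harvest(html):
    """Return the Dublin Core elements found in an HTML page."""
    parser = DublinCoreHarvester()
    parser.feed(html)
    parser.close()
    return parser.elements


# Hypothetical page, as an author might have written it.
PAGE = """<html><head>
<title>Example page</title>
<meta name="DC.Title" content="Example School Library Home Page">
<meta name="DC.Creator" content="A. Author">
<meta name="DC.Subject" content="School libraries">
<meta name="DC.Subject" content="Metadata">
<meta name="keywords" content="ignored by this sketch">
</head><body></body></html>"""

if __name__ == "__main__":
    for element, values in harvest(PAGE).items():
        print(element, "=", "; ".join(values))
```

Note that the harvester copies whatever the author typed. Nothing in the process checks that the subject terms are accurate or consistent with those used on other sites, which is the weakness described above.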

You will find a list of selected references on Metadata on page 8 of this issue.

Keith Gove

Manager, Information Services

SCIS