National Library of Australia Staff Papers, 1999

Dublin Core Metadata and the Australian MetaWeb Project

A paper presented by Debbie Campbell, Manager Infrastructure Projects, Coordination Support Branch at the 10th national Library Technicians' conference, Fremantle, 8-10 September 1999.

Abstract

The Dublin Core Metadata Initiative began in 1995 with little fanfare, but it is now a widely recognised international trend. The increase in metadata-based projects in Australia has been significant since that time, resulting in Australia becoming a centre of excellence in standard metadata deployment. During 1997 and 1998, a small group of Australian institutions joined forces on the MetaWeb Project, to facilitate the advancement of metadata initiatives [METAWEB].

This paper explores the Dublin Core metadata trend, outlines the achievements of the MetaWeb Project, identifies needs which are being met by metadata initiatives, and provides a snapshot of activity in Australia in 1999. Implications for information providers in the field are also discussed. The potential result for Australia is a strong information infrastructure.

1. INTRODUCTION

The Dublin Core Metadata Initiative began in 1995 with little fanfare, but it is now a widely recognised international trend. It has developed because of its promise to bring some consistency to a large amorphous mass known as the World Wide Web. The Web is obviously important as an access and distribution mechanism for information, but the ease with which information can be published on it has also created its dilemma - inconsistency, lack of currency, and the sheer quantity of "information" available.

This paper addresses the following questions:

  • What is metadata ?
  • What does metadata look like ?
  • What tools are available to create and manage it ?
  • Which metadata schema ?
  • When is metadata applied ?
  • Who is creating metadata in Australia ?
  • What is a subject gateway ?
  • What are the benefits of the metadata/subject gateway approach ?
  • Is it the same as cataloguing ?
  • Where to find further advice about metadata.

2. WHAT IS METADATA ?

Metadata was seen as a response to the dilemma created by the Web. It was proposed to 'help impose order on chaos' [LYNCH]. Arising out of the library community, which is expert at applying order to information, metadata emulates cataloguing to a certain degree. Metadata has been variously described as 'structured data about data' [OCLC], or 'information about data', even 'data about data'. These are technological descriptions. A further definition has been provided by the Director of UKOLN, Lorcan Dempsey: "metadata is data associated with objects which relieve their potential users of having to have full advance knowledge of their existence or characteristics. A user might be a program or a person, and metadata may support a variety of uses or operations." [DEMPSEY et al]. The word 'meta' also means change. Metadata has permitted change in the way information providers give access to information via the Web.

The first change was the coining of the term 'resource discovery', which is meant to overcome some of the difficulties of finding valid and valuable information on the Web. Many practices and principles have been developed in the interests of facilitating resource discovery on the Web. One of those practices is the application of metadata. But metadata has also facilitated change in other ways because it may be applied on a large scale, for example, in a national system; or on a small scale, such as to a Web site or to Intranet documents. It has been used in government departments to bring together information services for staff, by binding access to different and scattered sources of information.

Metadata serves a wide range of purposes. It is not restricted to resource discovery applications. It is also being applied to areas including content ratings, intellectual property rights management, document administration and preservation management. The best-known schema, however, is the Dublin Core [OCLC]. Named for its town of origin - Dublin, Ohio, in the United States - it is espoused as a Core because it can be the basis for many of the schemas used in the areas mentioned above. It can even be used to generate MARC records.

Australia has been a keen participant in the development of the Dublin Core, attending all of the Dublin Core workshop series since 1996, and hosting the fourth workshop in March 1997 at the National Library. The interest in the initiative was derived from many quarters - libraries, archives, museums, government departments, and commercial enterprises. It is supported by Internet standards specialists such as the World Wide Web Consortium (W3C).

3. WHAT DOES METADATA LOOK LIKE ?

While Library technicians have been instrumental in the development of the Dublin Core, its original intent was to encourage self-authoring, in line with Web publishing [DC]. In its simplest form, known as unqualified Dublin Core, there are fifteen elements: title, creator (author), subject (keywords), description, publisher, (other) contributor, date, resource type, format, identifier, source, language, relation, coverage, and rights management.

In a HyperText Markup Language (HTML) encoding, one of the elements, title, appears in a META tag like this:

<META NAME="DC.Title" CONTENT="MetaMatters">

The development community struggles to keep this schema simple in the face of 'complexification' (more about this later), to keep the potential for its uptake broad, and the costs of its production low. Each element is optional. Each element is repeatable. While this introduces flexibility, it also has the potential to introduce inconsistencies in resource description. Many Australian implementors have addressed this issue by recommending a minimum mandatory set, such as title, creator and publisher; or title, identifier and keywords. Each set is chosen to be sufficient for locating and differentiating between single resources when searching.
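The optional and repeatable nature of the elements, and the idea of a locally mandated minimum set, can be illustrated with a minimal Python sketch. The function names render_meta_tags and missing_mandatory are hypothetical, chosen for this illustration only; the mandatory set used as the default is one of the combinations mentioned above, and the record values are examples rather than real metadata.

```python
from html import escape

# The fifteen unqualified Dublin Core elements named in the text.
DC_ELEMENTS = (
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
)


def render_meta_tags(record):
    """Return one META tag per value; any element may be absent or repeated."""
    tags = []
    for element, values in record.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        for value in values:
            content = escape(value, quote=True)
            tags.append(f'<META NAME="DC.{element}" CONTENT="{content}">')
    return tags


def missing_mandatory(record, mandatory=("Title", "Creator", "Publisher")):
    """List the locally mandated elements that the record leaves empty."""
    return [name for name in mandatory if not record.get(name)]


# Example record: one title, a repeated subject, and no creator or publisher.
record = {
    "Title": ["MetaMatters"],
    "Subject": ["metadata", "Dublin Core"],
}
for tag in render_meta_tags(record):
    print(tag)
print(missing_mandatory(record))
```

A check of this kind is one way an implementor might enforce a minimum set at creation time, while still leaving every other element optional.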

Metadata, particularly for resource discovery, is usually stored either in the HTML Web pages it is describing, or in databases similar to those which support catalogues. Where the resource in hand is a physical object such as a museum artifact, a painting, or a paper-based item, a Web page known as a digital surrogate may be created to 'host' the metadata as well as a virtual rendition of the object.

Unqualified metadata has already been embedded in the Web documents hosted by many institutions around the country. Some sites have started small, with lists of useful links or Internet subject guides. Metadata is applied to selected resources considered to be essential reading for their clients.

4. WHAT TOOLS ARE AVAILABLE TO CREATE AND MANAGE METADATA ? (What the MetaWeb Project did.)

When the MetaWeb Project began in 1997, there were only sporadic implementations of unqualified Dublin Core. This was partly due to the fact that the schema itself had not been through a standardisation process. At the time it consisted of the 15 simple elements and the Canberra Qualifiers. A few adventurous implementors were experimenting with the qualifiers - type, language and scheme. For example, the language qualifier appears as follows:

<META NAME="DC.Title" LANG="en" CONTENT="MetaMatters">

The value "en" is taken from a scheme known as RFC1766, which is a list of language codes. The scheme's name is included in the META tag as follows:

<META NAME="DC.Title" SCHEME="RFC1766" LANG="en" CONTENT="MetaMatters">

The Dublin Core schema also recommends the use of another scheme, ISO 8601, to represent the Date element. These are the only two standard schemes recommended in minimum usage. However, the subject element is often populated with keywords from other schemes such as thesauri, or controlled lists of values. Implementations including the State Library of Tasmania's Tasmania Online service use Library of Congress Subject Headings (LCSH) for this purpose.

Technically, the three qualifiers can be applied to each of the elements, resulting in many separate META tags. Because the elements are repeatable, the number of tags can easily increase. This process is known as 'complexification', and a fully qualified implementation approaches USMARC in its complexity. The issue of which schema is more appropriate to meet the business requirement of the organisation, and therefore which tools should be chosen, needs careful consideration.
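As a rough illustration of how qualifiers attach to an element, the following Python sketch renders a META tag with optional SCHEME and LANG attributes, in the attribute order used in the examples above. The function name qualified_meta is hypothetical, the scheme label "ISO8601" is an assumed spelling, and the date value is an example only.

```python
from datetime import date
from html import escape


def qualified_meta(element, content, lang=None, scheme=None):
    """Render one META tag, adding SCHEME and LANG only when supplied."""
    attrs = [f'NAME="DC.{element}"']
    if scheme:
        attrs.append(f'SCHEME="{escape(scheme, quote=True)}"')
    if lang:
        attrs.append(f'LANG="{escape(lang, quote=True)}"')
    attrs.append(f'CONTENT="{escape(content, quote=True)}"')
    return "<META " + " ".join(attrs) + ">"


# The title example from the text, with both qualifiers applied.
print(qualified_meta("Title", "MetaMatters", lang="en", scheme="RFC1766"))

# A Date element; date.isoformat() yields the ISO 8601 form YYYY-MM-DD.
print(qualified_meta("Date", date(1999, 9, 8).isoformat(), scheme="ISO8601"))
```

Every additional qualifier or repeated value adds another attribute or another tag, which is how the number of tags on a single page can grow quickly.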

There are many metadata creation tools available free of charge, from many Library-based international Web sites. The MetaWeb Project developed some of these tools. The Project partners were: the Australian Defence Force Academy; Charles Sturt University; the Distributed Systems Technology Centre [DSTC], a Brisbane-based Cooperative Research Centre; and the National Library. Under the auspices of the MetaWeb Project, the DSTC launched an Australian tool known as 'Reggie'. It provides standard templates for the creation of metadata in several schemas, page by page.

The Project also provides access to an editor made available by the State Library of Tasmania, free of charge. It permits the creation of Dublin Core and AGLS metadata simultaneously, and is used by libraries around the country.

The Project also developed a site generator, which allows retrospective conversion of all of the pages at a pre-specified URL (Uniform Resource Locator). It embeds metadata in each page for six of the 15 unqualified Dublin Core elements which can be provided without human intervention.
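The general idea of a generator can be sketched in Python as follows. This is not the MetaWeb site generator itself: the paper does not name the six elements it supplies, so the four derived here (title, identifier, format and date) are assumptions made for illustration, as are the function names derive_elements and embed_metadata and the example date.

```python
import re
from html import escape, unescape


def derive_elements(url, html_text, last_modified):
    """Derive a few element values from the page itself (an assumed selection)."""
    elements = {
        "Identifier": url,
        "Format": "text/html",
        "Date": last_modified,  # expected as an ISO 8601 string
    }
    match = re.search(r"<title[^>]*>(.*?)</title>", html_text, re.I | re.S)
    if match:
        # Collapse whitespace and decode entities before reuse as a value.
        elements["Title"] = unescape(" ".join(match.group(1).split()))
    return elements


def embed_metadata(html_text, elements):
    """Insert generated META tags immediately before the closing HEAD tag."""
    head_end = re.search(r"</head\s*>", html_text, re.I)
    if head_end is None:
        raise ValueError("page has no closing HEAD tag")
    tags = "\n".join(
        f'<META NAME="DC.{name}" CONTENT="{escape(value, quote=True)}">'
        for name, value in sorted(elements.items())
    )
    cut = head_end.start()
    return html_text[:cut] + tags + "\n" + html_text[cut:]


page = "<HTML><HEAD><TITLE>MetaMatters</TITLE></HEAD><BODY>...</BODY></HTML>"
elements = derive_elements("http://www.nla.gov.au/meta/", page, "1999-09-08")
print(embed_metadata(page, elements))
```

Elements such as subject and description are absent from the sketch because they generally need human judgement.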

While generators are useful for creating a mass of metadata, they are not as viable as humans for producing high quality description. Experience in creating bibliographic data and expertise in selecting keywords or subject terms are often sought by the serious implementors of metadata-based services.

The application of thesauri has been much discussed in the metadata community. There are many issues to be solved in terms of which thesaurus is appropriate, whether they are Web-accessible and therefore easily navigable between a metadata creation tool and the creator, how the thesaurus terms may be exploited in searching, and whether several thesauri can be sensibly supported simultaneously. There is no restriction on the number which can be used although a small number is considered more reliable. While several projects are exploring these issues, there is not yet a single recommended solution.

Other tools such as link checkers, which help to identify and eliminate broken links, are also necessary for providing a high quality service of metadata provision. In general, software is available for this purpose.

5. WHICH METADATA SCHEMA ?

In Australia, several schemas have been made publicly available for use. They are listed in the DSTC's creation tool Reggie, which is actually a registry of schemas that provides associated editors. It is useful to look at the registry before deciding which schema to use for creating metadata. The choice may be based on the type of resources you are describing, or the type of organisation you are working with. For example, if you provide educational resources, you may choose to use the education schema developed by the Education Network of Australia [EdNA] which has been endorsed by all government-based educational departments around the country. If you want to provide access to digitised geospatial resources such as maps, the older Australian and New Zealand Land Information Council [ANZLIC] schema may be more suitable.

One of the key goals of selecting and applying a metadata schema consistently is to permit 'interoperability'. While this term has many contextual meanings, for metadata-based services it means that they can share and swap information, especially if they have used the Dublin Core as their base. Local extensions, i.e., elements which are specific to a single organisation, are easily accommodated, while also permitting easier information exchange between services.
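A minimal Python sketch of this idea follows, assuming that each record stores its Dublin Core elements under a "DC." prefix and its local extensions under another prefix. The "LOCAL." element names and the function names shared_core and exchange are hypothetical, and are not taken from any published schema.

```python
def shared_core(record, prefix="DC."):
    """Keep only the elements that belong to the shared Dublin Core base."""
    return {
        name: list(values)
        for name, values in record.items()
        if name.startswith(prefix)
    }


def exchange(records):
    """Reduce records from different services to their common core."""
    return [shared_core(record) for record in records]


# Two services describing resources with different local extensions.
service_a_record = {
    "DC.Title": ["MetaMatters"],
    "DC.Publisher": ["National Library of Australia"],
    "LOCAL.Audience": ["library technicians"],  # hypothetical extension
}
service_b_record = {
    "DC.Title": ["MetaMatters"],
    "LOCAL.Function": ["information management"],  # hypothetical extension
}

for core in exchange([service_a_record, service_b_record]):
    print(core)
```

Each service keeps its own extensions for local use, while the exchanged records contain only elements that every Dublin Core based service can interpret.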

Several Australian schemas are based on the Dublin Core. They include EdNA, and the Australian Government Locator Service [AGLS]. The EdNA metadata schema is utilised by the schools, universities and vocational education sector to describe resources for EdNA Online. The EdNA Online service is owned by all departments of education around the country. The schema was written before Dublin Core, but it has since been converted to comply with it and AGLS. EdNA Online has 9,000 resources described.

The AGLS schema was developed by the National Archives of Australia and other agencies to describe generic government resources at federal, state and local levels. The use of AGLS has been mandated for all government agencies. For Commonwealth resources, it is now estimated that 10% of government Web documents have embedded AGLS metadata.

If you have Web resources that fall into more than one of these categories, you do not need to describe them several times. The shared base of Dublin Core removes that requirement for schemas which use it. Other paths are possible where Dublin Core is not the preferred choice for description, but information needs to be shared between services that are similar. For example, the Library of Congress has made mappings available which permit the conversion from USMARC to Dublin Core and vice versa. The National Library is currently participating in OCLC's Cooperative Online Resource Catalog Project to experiment with this type of application [CORC]. Harmonisation is another possible technique, whereby an older existing schema or a proprietary schema may be standardised by replacing its elements with the Dublin Core and local extensions.

6. WHEN IS METADATA APPLIED ?

Metadata can also be used on a smaller scale. Within institutions like the National Library, metadata is applied to a range of information sources beyond Kinetica. Firstly, it is used internally for our Web site. The NLA uses the AGLS schema to meet its obligations as a government agency [NLA]. Secondly, it is applied to our registry file system documents, which will be upgraded to reflect the new National Archives' recordkeeping metadata schema [RKMS]. Thirdly, it will be utilised on our updated Intranet, which may be likened to our own small online library of staff publications containing policy advice. Metadata has changed the way that we view our information provision to our own staff.

In reviewing our internal information management strategies, we are looking at how metadata need only be created once and shared across all of those different repositories. We are all creating metadata already, every time we write a document in a word processing package. When eXtensible Markup Language (XML) replaces HTML as the Web's common language, and is used as a basis for word processing packages, the capture of this metadata for re-use will be made even easier.

Standard metadata is applied to a wide range of objects which can be represented on the Web. It can be applied to people, using the vCard schema [vCard]; and to three-dimensional objects such as museum artifacts. Australia has been involved in the international initiative known as the Consortium for the Computer Interchange of Museum Information [CIMI], which explored the use of metadata for the description of artifacts in museum collections. In 1998, after many heated discussions, the CIMI members chose the Dublin Core, with best practice guidelines written to support their particular needs.

The MetaWeb Project created some specialist, but standards-compliant, tools to exploit the investment in metadata creation. A tool called a gatherer goes out to some pre-determined Web sites, and brings back the metadata to a central repository (or database) where it is indexed for subsequent searching. The metadata may be updated at any time where it is held, while the gatherer acts at regular intervals to bring it in. The tool which does the indexing is called a broker, and the indexes it creates are managed in a similar way to those supported behind the scenes in public access catalogues. A search interface is then supplied to query the metadata, which may be executed across the whole of the metadata set for a particular item, or by specifying a single element to be searched.
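The division of labour between gatherer, broker and search interface can be sketched in Python as follows. This is an illustration of the model only, not the MetaWeb software: the pages are held in memory rather than fetched from Web sites, the names MetaExtractor, gather, build_index and search are hypothetical, and the second page and its URL are invented for the example.

```python
from collections import defaultdict
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect (element, content) pairs from META tags named 'DC.something'."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        if name.upper().startswith("DC.") and content:
            self.pairs.append((name[3:].lower(), content))


def gather(pages):
    """Gatherer: pull the embedded metadata out of each page (URL to HTML)."""
    repository = {}
    for url, html_text in pages.items():
        extractor = MetaExtractor()
        extractor.feed(html_text)
        repository[url] = extractor.pairs
    return repository


def build_index(repository):
    """Broker: map (element, word) to the URLs whose metadata contains it."""
    index = defaultdict(set)
    for url, pairs in repository.items():
        for element, content in pairs:
            for word in content.lower().split():
                index[(element, word)].add(url)
    return index


def search(index, word, element=None):
    """Search one element, or the whole metadata set when none is given."""
    word = word.lower()
    hits = set()
    for (indexed_element, indexed_word), urls in index.items():
        if indexed_word == word and element in (None, indexed_element):
            hits |= urls
    return sorted(hits)


pages = {
    "http://www.nla.gov.au/meta/": (
        '<HEAD><META NAME="DC.Title" CONTENT="MetaMatters">'
        '<META NAME="DC.Subject" CONTENT="metadata"></HEAD>'
    ),
    "http://example.org/guide.html": (
        '<HEAD><META NAME="DC.Title" CONTENT="Metadata guide"></HEAD>'
    ),
}
index = build_index(gather(pages))
print(search(index, "metadata"))                   # any element
print(search(index, "metadata", element="title"))  # title element only
```

Because the gatherer is re-run at intervals, the metadata can continue to be maintained at its source while the central index is refreshed from it.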

7. WHO IS CREATING METADATA IN AUSTRALIA ?

In 1997, Clifford Lynch, the Executive Director of the Coalition for Networked Information (CNI), made this prediction about the uptake of metadata. "We may see metadata first capitalised on as a way of enhancing retrieval by more disciplined, specific, subject-focused and selective Web indexing services --- maybe those run by the academic/research community, or by communities of people interested in specific sorts of content. You know when you do a wedding invitation, are you going to attach metadata to it when you incorporate it in your personal Web page? I don't think so." [LYNCH].

There are currently two major groups of metadata implementors in Australia: government services, and subject gateways which have generally been established by universities. Many government departments have established their own gateways, for example the Commonwealth-based Environment Australia and the Department of Health and Aged Care. At the State government level, each State and Territory has, or is developing, a state-based gateway which is enabled by a metadata search engine. For example, in Western Australia, the Department of Commerce and Trade is developing a single government window to facilitate all public transactions with government departments. The ultimate aim is to amalgamate these efforts into a single access point for all Australians to reach government services via the Web. The name of this Project is GoverNet.

Since the completion of the MetaWeb Project, the MetaWeb tools and principles have been adopted and adapted by several subject gateway projects to experiment with discipline-oriented sites, proposed to facilitate research by various communities in Australia. These are, to a certain extent, fulfilling the vision of Clifford Lynch.

8. WHAT IS A SUBJECT GATEWAY ?

A subject gateway may be defined as "a Web-based mechanism for accessing a collection of high quality, evaluated resources identified to support research in a particular subject discipline. The resources are evaluated and described by information specialists in the field, such as science librarians. The Australian higher education sector and its partners are in the process of establishing three new discipline-focused entry points - to agriculture, chemistry, and engineering with information technology." [SG]

The Australian subject gateways have shared developmental and managerial approaches. They have learned from each other's mistakes. They are all collaborative in nature, and the academic partners in each gateway share the responsibility of selecting resources according to pre-determined criteria, and creating the metadata for them. Each gateway has assessed the available metadata schemas, and selected the Dublin Core as a base for the standard descriptive schema for their resources. Additional elements have been borrowed from the EdNA and AGLS schemas, and in some cases they also support local extensions.

The gateways provide current material in a single or related discipline, which is of high research value, from trusted sources. Additional gateway services such as peer review, authoritative resource description, and distributed maintenance are critical success factors for their future. The gateways each have distinctive logos and names: Agrigate, AVEL and MetaChem. They also provide free access to their resources.

Agrigate contains descriptions of resources both online and offline which have been identified as valuable to those participating in agricultural research [AGRIGATE]. Agrigate's resources are selected by an editorial review process applied by librarians and members of the agricultural research community, including the CSIRO. A small number of Web resources held at sites overseas are included, and Agrigate is forming relationships with similar overseas gateways.

The Australian Virtual Engineering Library [AVEL] is a gateway for quality Australian engineering and information technology (IT) resources. AVEL has been developed to help engineers and IT professionals save time and find information on the Web quickly. AVEL has formed a close relationship with its European equivalent, the Edinburgh Engineering Virtual Library [EEVL], to explore international collaboration for the provision of this type of information world-wide.

MetaChem is a gateway to significant chemistry resources, with the aim of establishing a quality chemistry information infrastructure able to support individual research and academic teaching programs. Hosted by the Australian Defence Force Academy under the auspices of the University Library of NSW, chemistry resources are available in one single reliable location. The resources have been selected and catalogued by university science librarians. This gateway has been available since January 1999.

Each of the gateways has identified at least one thesaurus to facilitate the selection of accurate subject terms. AVEL is now considering the use of LCSH. The MetaMatters site provides a single entry point to the subject gateways, and to other sites with similar characteristics [META].

Subject gateways with their metadata tools have the ability to unlock under-utilised resources. The nation's archives, museums and galleries contain many such sources of information. One exemplary service which is helping institutions unlock the potential of their heritage resources is the Australian Museums and Galleries Online service [AMOL].

AMOL may be considered an organisational gateway, not a focus point for a particular subject discipline. Nevertheless it operates under similar principles. It utilises the MetaWeb model of distributed metadata creation and facilitated centralised access. The Project team gathers Dublin Core metadata from more than 1,000 museums and galleries around Australia, providing an up-to-date service for locating collections. It includes diverse and remote distributed sources such as university museums and regional historical collections and items. In contrast to the research audience pitch of the university-based subject gateways, AMOL provides its services to school students and the general public, although research is of course enabled. AMOL also helps museums understand the richness of their own collections, and enhances access to those collections.

9. WHAT ARE THE BENEFITS OF THE SUBJECT GATEWAY APPROACH (and in tandem, the application of metadata) ?

The benefits of the subject gateway approach are currency, consistency, and selectivity. Only essential resources need to have metadata applied as part of any information management strategy, and searching the metadata ensures that resources are not returned merely because a keyword occurs somewhere in the full-text rendition. Users can still broaden their search strategies beyond the metadata if it doesn't meet their information need.

Traffic on the Web is reduced as queries are placed against metadata rather than full text, and the full text is only retrieved after further selection.

Staff resources in terms of both time and money utilised to locate research material are saved as the gateways are kept up to date and relevant. Anyone in Australia can suggest resources to the gatekeeper of each gateway for potential inclusion. Therefore each library doesn't have to duplicate this type of service.

The gateways also provide a platform for the further exploration of issues surrounding both facilitated and unmediated access to Australia's information infrastructure. One such issue is the duplication of effort in metadata creation, which may be reduced by sharing metadata between gateways, avoiding monotonous and expensive replication. Technical solutions for searching across gateways are another issue - we don't want end-users to have to find each gateway separately. The management of broken links through persistent identification schemes such as Persistent Uniform Resource Locators (PURLs) or Uniform Resource Names (URNs) may also be explored. The National Library installed a PURL resolver service last year, but it is worth noting that this solution for managing broken links is still dependent on the URL protocol. We have started to explore the next generation of persistent identification - URNs - which are location independent and therefore somewhat similar in function to an ISBN or ISSN.
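The principle behind a PURL resolver can be shown with a small Python sketch: a published identifier stays fixed while the location it points to is updated behind the scenes. The class name PurlResolver and the target URLs are hypothetical, and a real resolver would answer HTTP requests with redirects rather than return strings.

```python
class PurlResolver:
    """Map stable identifiers to whatever URL currently holds the resource."""

    def __init__(self):
        self._targets = {}

    def register(self, purl, target_url):
        """Record, or update, the current location for a persistent identifier."""
        self._targets[purl] = target_url

    def resolve(self, purl):
        """Return the current location, as an HTTP redirect would."""
        try:
            return self._targets[purl]
        except KeyError:
            raise LookupError(f"unknown PURL: {purl}") from None


resolver = PurlResolver()
purl = "http://purl.nla.gov.au/metaweb/home"

# The resource moves; only the resolver's table changes, not the citation.
resolver.register(purl, "http://example.org/old-location/metaweb.html")
resolver.register(purl, "http://example.org/new-location/metaweb.html")
print(resolver.resolve(purl))
```

The identifier in the sketch is itself a URL, which reflects the dependence on the URL protocol noted above.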

10. IS METADATA REALLY THE SAME AS CATALOGUING ?

You know that a new field of endeavour is becoming a serious consideration when new terms are provided. Recently, the terms "metaloger" (a person who creates metadata), and "metalogging", as practised by a "metalogger", have been used in the metadata community. To be metadataed [sic] is also an interesting phrase. But although these terms emulate the cataloguing profession to a certain extent, the general consensus is that they are not the same fields of endeavour. Dublin Core is not a simplified cataloguing format. As expressed by Gradmann in his IFLA paper "Cataloguing vs. Metadata: Old Wine in New Bottles?", the concept of metadata is meant to provide machine understandable description [GRADMANN, 1998]. It is not just to be read by humans, but interpreted by computers which will then act on it, for example, to differentiate between good and untrustworthy sites.

The example shown above illustrates this. It can be interpreted by both humans and computers. When it is converted to XML, humans won't be able to read it, although we will have better interfaces for our metadata creation tools. In some cases we will create it transparently.

Although the action of creating descriptive metadata is similar to the process of cataloguing, the effects of that process will be quite different. The consequence of applying metadata has been described by Stu Weibel, the guardian of the Dublin Core Metadata Initiative. He believes that the World Wide Web will separate into a Web of trust as created by librarians and other dedicated information providers; and the Web as we mostly know it today, containing large search engines which trawl millions of pages to cram as much stuff as possible onto your desktop. In addition, resources with metadata can be ranked higher in a results set.

Some research is being conducted into the idea of sharing metadata creation, whereby the resource creator provides some of the descriptive elements and the caretaker of the work (as personified by a librarian or an archivist) provides the elements which make it accessible. The result may be similar to the generation of cataloguing-in-publication details, but the intention is to have a single system supporting the process that is fully integrated and accessible by both parties. The outcomes of this research are being watched with interest [PRINCETON].

The potential to capitalise on the investment made in cataloguing print resources by combining them with electronic resources is also enormous. We don't want to lose our legacy data in the process of applying metadata - we want to combine access to distributed resources. These are the gains the metadata community in Australia would like to achieve for its information infrastructure.

11. WHERE TO FIND FURTHER ADVICE ABOUT METADATA

The MetaWeb Project provided services for the dissemination of information about metadata and its applications which still operate today. They are the Project's Web site, and an Australian metadata discussion list. The Web site provides an analysis of creation tools, describes the MetaWeb software which is available free-of-charge, and provides a link for joining the discussion list.

The list currently has more than 300 subscribers. It is not overwhelming to join - the focus is on disseminating information from international lists as appropriate. Many implementors raise issues on the list, and are generally happy to share their experiences. The metadata schemas themselves often have their own developmental lists, which are not for the faint-hearted. Occasionally there are claims that metadata is just reinventing the library cataloguing wheel, but while there is some overlap, it has also provided an opportunity to view resource access and discovery afresh.

Since the MetaWeb Project was completed, the National Library has created a new site called MetaMatters [META], which provides an introduction to the Dublin Core and gives details on user guides and implementations around Australia. These are important references, as they provide sufficient detail so that the standards do not have to be reinvented. They also give contact points for people in the new roles in the metadata information community known as metadata coordinators and metadata librarians. Institutions such as the University of NSW, the University of Queensland, and the State Library of Tasmania have library staff dedicated to these roles.

Conclusion

Metadata has been, and will continue to be, an enabling tool for extending the usability of the Web in the areas of resource discovery and access. Its application in subject gateways and other information services again demonstrates the validity of the use of standards in bringing high quality information to the attention of our constituency; and as a consequence, strengthens Australia's information infrastructure.


Bibliography

[AGLS]

Australian Government Locator Service
http://www.naa.gov.au/GOVSERV/AGLS/user_manual/contents.htm

[AGRIGATE]

An Agriculture Information Gateway for Australian Researchers
http://www.agrigate.edu.au

[AMOL]

Australian Museums & Galleries On-line
http://amol.org.au/

[AVEL]

The Australian Virtual Engineering Library
http://www.library.uq.edu.au/avel/; to become http://www.avel.edu.au

[CIMI]

The Consortium for the Computer Interchange of Museum Information
http://www.cimi.org

[CORC]

Cooperative Online Resource Catalog
http://www.oclc.org/oclc/research/projects/corc/

[DC]

Dublin Core Metadata Element Set: Reference Description
http://purl.org/DC/about/element_set.htm

[DEMPSEY]

Discovering Online Resources. In at the Shallow End: Metadata and Cross-domain Resource Discovery
http://ahds.ac.uk/public/metadata/disc_07.html

[DSTC]

The Resource Discovery Centre
http://metadata.net/dstc

[EdNA]

The Education Network of Australia
http://www.edna.edu.au/metadata/

[EEVL]

The Edinburgh Engineering Virtual Library
http://www.eevl.ac.uk

[GRADMANN]

Cataloguing vs. Metadata: Old Wine in New Bottles ?
Gradmann, Stefan; Paper of the 64th IFLA General Conference 1998, Booklet 4

[LYNCH]

Clifford Lynch in interview, July 1997
http://www.ariadne.ac.uk/issue10/clifford/

[META]

MetaMatters
http://www.nla.gov.au/meta/

[METAWEB]

The MetaWeb Project
http://purl.nla.gov.au/metaweb/home

[NLA]

The National Library of Australia Web Site Metadata Strategy
http://www.nla.gov.au/metadata.html

[OCLC]

The Dublin Core: A Simple Content Description Model for Electronic Resources
http://purl.org/DC/

[PRINCETON]

Proposal for Metadata System
http://www.princeton.edu/~jamesw/mdata/metadataprop.html

[RKMS]

Recordkeeping Metadata Standard for Commonwealth Agencies
http://www.naa.gov.au/govserv/techpub/rkms/intro.htm

[SG]

A National Framework for the Development of Australian Subject Gateways
http://www.nla.gov.au/initiatives/sg/

[vCard]

vCard overview
http://www.imc.org/pdi/vcardoverview.html