D3.6a Prototype service providing automatic classification of Engineering resources
This deliverable demonstrates an approach that uses cross-browsing and cross-searching to integrate a manually selected, catalogued and quality-assessed collection of Web resources with a much larger robot-generated subject index in the same subject area.
Comprehensive peer reviews of this deliverable are also available.
Automatic methods of resource gathering and knowledge organization are necessary to improve the discovery of Internet resources. Not even a large co-operative effort can keep up with the number of documents, and the rate at which they change, in a service or subject area of significant size. Our EU project, DESIRE, developed an approach that uses cross-browsing and cross-searching to integrate a manually selected, catalogued and quality-assessed collection of Web resources with a much larger robot-generated subject index in the same subject area. We began by exploring different methods of gathering an Engineering index from the Web and of creating a stable database for the project. Providing a subject-based browsing interface to this index requires automatic classification. The goal was to structure the index using the same Ei (Engineering Information Inc.) classification that is used in the quality-controlled service Engineering Electronic Library, Sweden (EELS), so that users can cross-browse between the two services. We explored and evaluated several methods of automatic classification based on this established classification system, each using different heuristics for matching, weighting and display. In co-operation with other projects, we also began studying the options that the use of a universal classification system might open up. After the project, the automatic classification and cross-browsing functionality will be added to the EELS service.
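The matching-and-weighting idea can be illustrated with a minimal Python sketch of term-based classification. It assumes a hypothetical fragment of a classification scheme in which each class code (`C1`, `C2`, `C3`, placeholders rather than real Ei codes) is described by a list of indicative terms, and assumed field weights under which a match in the title counts three times as much as one in the body. It is one simple instance of string-matching classification, not the specific heuristics evaluated in this deliverable.

```python
import re

# Hypothetical fragment of a classification scheme: each class code maps to
# indicative terms. Codes and terms are illustrative, NOT the real Ei scheme.
CLASS_TERMS = {
    "C1": ["signal processing", "filter", "fourier transform"],
    "C2": ["software engineering", "compiler", "programming language"],
    "C3": ["structural analysis", "beam", "finite element"],
}

# Assumed weights: a term found in the title counts more than one in the body.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0}


def count_term(text: str, term: str) -> int:
    """Count case-insensitive, whole-phrase occurrences of term in text."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return len(re.findall(pattern, text.lower()))


def classify(doc: dict, threshold: float = 2.0, max_classes: int = 3) -> list:
    """Score each class by weighted term matches and return the best classes.

    doc is a dict with optional "title" and "body" strings. Classes scoring
    below threshold are dropped; at most max_classes are returned, highest first.
    """
    scores = {
        code: sum(
            weight * count_term(doc.get(field, ""), term)
            for term in terms
            for field, weight in FIELD_WEIGHTS.items()
        )
        for code, terms in CLASS_TERMS.items()
    }
    # Sort by descending score, breaking ties by class code for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(code, score) for code, score in ranked if score >= threshold][:max_classes]


if __name__ == "__main__":
    doc = {
        "title": "Fast Fourier transform filter design",
        "body": "We design a digital filter for signal processing.",
    }
    print(classify(doc))  # [('C1', 8.0)]
```

Exact phrase matching without stemming means that `beams` does not match `beam`; a practical classifier would likely need stemming, synonym handling from a controlled vocabulary, and tuned thresholds.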
Automatic classification, Harvesting, Robot-generated subject index, Subject gateways, Web resource discovery, Metadata, Engineering, Dewey Decimal Classification, Ei thesaurus and classification
This report covers two fairly separate subjects, both of which are significant for the further development of Subject Gateways. The original DESIRE project championed the development of Subject Gateways as a means of expediting the discovery of high-quality, research-level information on the Internet. DESIRE II has continued this theme, particularly through the publication of the Information Gateways Handbook. It now seems likely that the future development of Subject Gateways will depend on cooperation between gateways, particularly through interoperability, and other projects such as Renardus (www.renardus.org) and Imesh (www.imesh.org) are studying how this can best be achieved. The first section of this report concerns the problems associated with cross-browsing between subject gateways and two possible mechanisms by which it may be accomplished.
The second part of the report discusses the types of relationships that exist between documents, between terms in thesauri, and between classes in classification systems; thesauri and classification systems are both used within Subject Gateways to improve subject access. We also consider a basic core subset of these relationships that is relevant to hierarchical thesauri, and show how it can be encoded using the newly developed Resource Description Framework (RDF) Schema mechanism and expressed in XML. Hierarchical controlled vocabularies are now seen as a very useful means of aiding subject access to collections of data, and work is underway to make such vocabularies multilingual and to cross-map different schemes. We believe that the encoding we propose can act as a common format for transferring controlled vocabularies between organisations, and that it will also facilitate the expression of relationships between the terms of different thesauri and between the terms in different languages within a single thesaurus.
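A minimal sketch of how a hierarchical thesaurus might be serialised as RDF/XML with Python's standard `xml.etree.ElementTree`. The `rdf:` namespace is the standard RDF one; the `thes:` namespace and its `prefLabel`, `broader`, `narrower` and `related` properties, the term identifiers and the base URI are all hypothetical placeholders, not the schema proposed in this report. Language-tagged labels (`xml:lang`) show how a single thesaurus can carry terms in several languages.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical namespace for the thesaurus properties (an assumption).
THES = "http://example.org/thesaurus#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

ET.register_namespace("rdf", RDF)
ET.register_namespace("thes", THES)


def encode_thesaurus(terms: dict, base: str = "http://example.org/terms/") -> str:
    """Serialise a thesaurus as RDF/XML.

    terms maps a term id to {"labels": {lang: label},
    "broader": [ids], "related": [ids]}; the last two keys are optional.
    """
    root = ET.Element(f"{{{RDF}}}RDF")
    # Derive narrower links as the inverse of broader, so both directions
    # of the hierarchy are explicit in the output.
    narrower = {tid: [] for tid in terms}
    for tid, term in terms.items():
        for parent in term.get("broader", []):
            narrower.setdefault(parent, []).append(tid)
    for tid, term in terms.items():
        desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": base + tid})
        # One language-tagged label per language.
        for lang, label in sorted(term["labels"].items()):
            el = ET.SubElement(desc, f"{{{THES}}}prefLabel", {XML_LANG: lang})
            el.text = label
        relations = (
            ("broader", term.get("broader", [])),
            ("narrower", narrower[tid]),
            ("related", term.get("related", [])),
        )
        for prop, targets in relations:
            for target in targets:
                ET.SubElement(desc, f"{{{THES}}}{prop}", {f"{{{RDF}}}resource": base + target})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    thesaurus = {
        "engineering": {"labels": {"en": "Engineering", "de": "Ingenieurwesen"}},
        "civil-eng": {
            "labels": {"en": "Civil engineering", "de": "Bauingenieurwesen"},
            "broader": ["engineering"],
        },
    }
    print(encode_thesaurus(thesaurus))
```

`related` is emitted only as given, so a symmetric relationship has to be recorded on both terms; cross-thesaurus links could be added the same way, as further properties pointing at URIs in another vocabulary.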
This deliverable comprises two sections: the first covers suggested methods for implementing cross-browsing as a means of searching across subject gateways; the second is concerned with the encoding, storage and use of Web-based hierarchical controlled vocabularies as aids to keyword searching of subject gateways. Browsing is commonly offered as a complement to keyword access to the resources indexed by subject gateways. Problems with the browsing paradigm arise, however, when cross-browsing between gateways is implemented. In particular, this report looks at two mechanisms by which resources from different gateways can be retrieved and displayed, and at the issues that arise when the gateways use different classification systems, which makes it difficult to cross-map between the sections of the different schemes. The second part of the deliverable is concerned with the structure of both thesauri and classification systems, covering the different relationships that can exist between the concepts they contain. It also looks at a method of encoding thesauri for storage and data transfer, proposing a candidate standard syntax that uses the Resource Description Framework (RDF) and XML.
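Cross-browsing across gateways that use different classification schemes can be sketched as a lookup through a cross-mapping table. In this minimal Python sketch, the two gateways, their class codes (`A1`, `B7`, `B8`), the resources and the `A_TO_B` mapping are all hypothetical; it shows only one simple way of merging results for display, not either of the mechanisms examined in the report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    title: str
    url: str
    gateway: str


# Hypothetical gateways, each indexing resources under its own scheme's codes.
GATEWAY_A = {
    "A1": [Resource("Bridge design notes", "http://a.example/1", "A")],
}
GATEWAY_B = {
    "B7": [Resource("Steel structures primer", "http://b.example/7", "B")],
    "B8": [
        Resource("Concrete handbook", "http://b.example/8", "B"),
        # The same resource catalogued by both gateways.
        Resource("Bridge design notes", "http://a.example/1", "B"),
    ],
}

# Hypothetical cross-mapping: one class in scheme A may map to several in B.
A_TO_B = {"A1": ["B7", "B8"]}


def cross_browse(class_a: str) -> list:
    """Return gateway A's resources for class_a, followed by gateway B's
    resources under the mapped classes, skipping URLs already listed."""
    results = list(GATEWAY_A.get(class_a, []))
    seen = {r.url for r in results}
    for class_b in A_TO_B.get(class_a, []):
        for r in GATEWAY_B.get(class_b, []):
            if r.url not in seen:
                seen.add(r.url)
                results.append(r)
    return results


if __name__ == "__main__":
    for r in cross_browse("A1"):
        print(r.gateway, r.title, r.url)
```

Because one class in scheme A may map to several broader, narrower or overlapping classes in scheme B, the merged list can include resources that are only loosely related; deduplicating by URL handles resources catalogued by both gateways.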
Subject gateways, Browsing, Cross-browsing, Classification, Classification schemes, Resource Description Framework, XML, Metadata, Thesauri
© 1998-2000 DESIRE Consortium