Transformation and management of XML data


4-week project, November 25 to December 20, 2002.


Project description


Much data is stored in formats that make it hard to manage and to query as one would want, either because the format is not standardized or because few software tools support it. XML formats promise to ease managing, transforming and querying data, thanks to the large number of tools that support XML. A current trend is therefore to transform old data formats into XML formats, which also involves implementing transformations, queries, etc. for the new XML data.

Core contents

As an example of the above, the project considers transformation and management of bibliographical data (i.e., information on books, journal articles, etc.), which is currently widely stored in the BibTeX format. The core of the project consists of the following tasks:
  1. Definition of a DTD or XML Schema for BibTeX data ("XML-bib" format).
  2. Transformation of BibTeX data to XML-bib (by a Java program).
  3. Implementation of simple queries on XML-bib in an XML query language.
  4. XSLT transformation of XML-bib to XHTML.
  5. Implementation of a web service based on the above queries and transformations.
  6. Implementation of a simple web interface using the web service and the transformation to XHTML.
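
Tasks 2 and 3 above could be approached roughly as in the following sketch. It is a minimal illustration under stated assumptions, not a reference solution: the class name BibToXml, the element names bibliography and entry, and the sample entry are all hypothetical. The regular expressions only handle flat fields without nested braces, and XPath (from the standard javax.xml.xpath package) stands in for whichever XML query language a group chooses.

```java
import java.io.StringWriter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class BibToXml {
    // Matches the entry header "@type{key,".
    private static final Pattern HEADER =
        Pattern.compile("@(\\w+)\\s*\\{\\s*([^,\\s]+)\\s*,");
    // Matches flat fields: name = {value} or name = "value" (no nested braces).
    private static final Pattern FIELD =
        Pattern.compile("([A-Za-z][\\w-]*)\\s*=\\s*(?:\\{([^{}]*)\\}|\"([^\"]*)\")");

    /** Builds a DOM tree for one BibTeX entry (hypothetical XML-bib layout). */
    public static Document parseEntry(String bibtex) throws Exception {
        Matcher header = HEADER.matcher(bibtex);
        if (!header.find()) {
            throw new IllegalArgumentException("Not a BibTeX entry: " + bibtex);
        }
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        Element root = doc.createElement("bibliography");
        doc.appendChild(root);
        Element entry = doc.createElement("entry");
        entry.setAttribute("type", header.group(1).toLowerCase());
        entry.setAttribute("key", header.group(2));
        root.appendChild(entry);

        // Scan the fields that follow the header; each becomes a child element.
        Matcher field = FIELD.matcher(bibtex);
        field.region(header.end(), bibtex.length());
        while (field.find()) {
            Element e = doc.createElement(field.group(1).toLowerCase());
            String value = field.group(2) != null ? field.group(2) : field.group(3);
            e.setTextContent(value.trim());
            entry.appendChild(e);
        }
        return doc;
    }

    /** Serializes the DOM tree to an XML string without an XML declaration. */
    public static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    /** Evaluates an XPath expression and returns the result as a string. */
    public static String query(Document doc, String xpath) throws Exception {
        return XPathFactory.newInstance().newXPath().evaluate(xpath, doc);
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample entry, for illustration only.
        String bibtex = "@book{example01,\n"
            + "  author = {A. Author},\n"
            + "  title = {An Example Title},\n"
            + "  year = {2002}\n"
            + "}";
        Document doc = parseEntry(bibtex);
        System.out.println(serialize(doc));
        // Simple query: the title of the entry with a given key.
        System.out.println(query(doc, "/bibliography/entry[@key='example01']/title"));
    }
}
```

A full solution would need a proper BibTeX parser (nested braces, @string macros, multiple entries per file) and would validate the output against the DTD or XML Schema from task 1.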


The core project can be extended in many ways, and the groups are free to pursue the extension(s) they find most interesting. Some ideas for extensions are given below.


To pass the project, the group should solve the core tasks satisfactorily, and document its work in a report.

The report

The report may be in Danish or English. The maximum length is 10 pages plus 5 pages per group member. Appendices may be added; the examiners will read them at their discretion to clarify points in the report. The report must be a self-contained description of the project and the group's solutions, and must describe the decisions made and the reasoning behind them. The descriptions, motivations, etc. in the report will be the main criterion in the evaluation of the project. The report must contain:

Any extensions made (and their documentation in the report) may increase the grade.


The deadline for handing in the project report and programs is December 20, 2002.

Examination


There will be an oral group exam with an external examiner (censor) in January 2003. The project participants will be graded individually on the 13-scale.

The exam will take place on January 21, and the censor is Torben Bach Pedersen from Aalborg. The location will be announced later.

Time         Group
9.10-9.40    Lars
9.40-10.10   Michael
10.10-10.40  Qi
10.50-11.20  Mads Peter
11.20-11.50  Kenneth
12.00-12.30  Bjarke
13.30-14.30  Niels, Mohammed
14.40-15.40  Kasper, Jacob
15.50-17.50  Koffi, Thomas, Janus, Thor

Prerequisites


Students should have experience corresponding to the Introductory Programming and Web Programming courses. In particular, they should know Java and have experience with XML, XSLT, and dynamic web pages.

Groups


Groups should have 3 or 4 members.

Schedule


The teaching will be in the form of 4 lectures of about 1 hour each and 3 individual meetings with each group (up to 45 minutes per group).

Group meeting schedule:

Time                      Activity                                                             Place
Nov 18, Monday, 12:00     Signing of project agreements                                        1.45
Nov 25, Monday, 14:00     Introduction                                                         1.03
Nov 27, Wednesday, 10:00  Lecture on parser and schema                                         1.03
Nov 29, Friday, 10:00     Lecture on queries and XSLT                                          1.03
Nov 29, Friday, 15:00     Hand in project report fragments (in PDF or PS) for Tuesday meeting  e-mail
Dec 2, Monday, 10:00      Lecture on web services and web interface                            1.03
Dec 3, Tuesday, 10-15     1st individual group meeting                                         1.03
Dec 6, Friday, 15:00      Hand in project report fragments (in PDF or PS) for Tuesday meeting  e-mail
Dec 10, Tuesday, 10-15    2nd individual group meeting                                         1.03
Dec 13, Friday, 15:00     Hand in project report fragments (in PDF or PS) for Tuesday meeting  e-mail
Dec 17, Tuesday, 10-15    3rd individual group meeting                                         1.03
Dec 20, Friday, 12:00     Deadline for hand-in                                                 Student administration office

Useful links

Web programming course
XML course (Data on the web)
BibTeX format
XML tutorial
Google WSDL interface
Amazon WSDL interface

Lecturers


Rasmus Pagh
Office: 1.09
Phone: 38 16 89 34
Anna Östlin
Office: 1.09
Phone: 38 16 88 21

Instructor Rasmus Jørgensen will be helping out with the project. He will answer questions by e-mail and on the newsgroup. He will also be available in room 3.15 from 9 to 14 on the following days: