An Overview of ArcWeb

by Stewart Brodie

This article was going to be about manipulating Draw diagrams, in response to Paul's request in the November 1994 CAUGers. However, since Graham Jones is doing that, Paul asked if I wouldn't mind giving an overview of my current project: ArcWeb, a World Wide Web browser. I'm not going to explain the details of the Web here, although a brief overview is required.

The World Wide Web, or WWW for short, is a collection of documents scattered around the Internet. Many of these documents are in HTML (HyperText Markup Language) format, which is plain ASCII text in which some character sequences ("markup") have special meaning. For example, <em>hello</em> is an instruction that the word "hello" should be emphasized. There are many other "tags" which produce effects such as headings, paragraph breaks, bold and italics. The most important is the anchor tag <a>, which can contain an instruction that the text it encloses is a link to another document: a hypertext link (hyperlink). ArcWeb is a tool which displays HTML documents by converting the raw HTML into a Draw diagram, which can then be displayed to the user, saved in a common data format and easily printed (by Draw itself). When ArcWeb is displaying a page, you can click on a hyperlink and the page to which the link refers will appear.

So how does it do that? As soon as an HTML file is ready to be displayed, it is parsed into an intermediate form which the rendering routines can easily decode and translate into instructions for creating Draw objects. I chose to write my own parser instead of using lex and yacc because HTML is context-sensitive.

I shall leave the non-trivial details of parser design for now and move on to the Draw object creation. Parsing results in instructions to create five different types of Draw object. Although RISC_OSLib contains functions for all the object manipulation, I chose to add objects "by hand" since it is much faster. However, I still use the types declared in the RISC_OSLib header files, which saves declaring the object layouts myself. I use Acorn's new DrawFile module instead of the RISC_OSLib code to plot the Draw diagram because the module code is faster.

The first thing you need to do is to create an empty diagram. Use malloc() to create a draw_diag structure and then flex_alloc() to grab memory for the actual data. It is important to use malloc() for the draw_diag because flex requires that the pointer in it does not move. Using the flex memory functions means that, as pages are built and destroyed, the Wimpslot automatically increases and decreases. As each object is added to the diagram, use flex_extend() to increase the memory available, then put the object data straight into the diagram and update the diagram bounding box directly. There are more efficient ways of managing the memory, such as allocating it in larger chunks to save on calls to flex_extend(). This is important when there are many calls to flex_extend() since flex may have to reorganise its memory on each call which involves copying chunks of memory around.
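
The chunked allocation mentioned above can be sketched in portable C. This is only an illustration, not ArcWeb's code: realloc() stands in for flex_extend() so that the example runs anywhere, and the diagram type, the diagram_append() function and the 4K chunk size are all names and values invented for the sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a diagram: realloc() plays the part of
   flex_extend() so that the sketch is portable. */
typedef struct {
    char  *data;      /* start of the diagram data */
    size_t used;      /* bytes occupied by objects */
    size_t capacity;  /* bytes currently allocated */
    int    grows;     /* how many times the block was extended */
} diagram;

#define GROW_CHUNK 4096u  /* assumed chunk size, not taken from ArcWeb */

/* Append one object, extending the block only when it is full.
   Returns 1 on success, 0 if memory could not be obtained. */
static int diagram_append(diagram *d, const void *object, size_t size)
{
    assert(d != NULL && object != NULL);
    if (d->used + size > d->capacity) {
        size_t need = d->used + size;
        /* Round the new size up to a whole number of chunks. */
        size_t new_capacity =
            ((need + GROW_CHUNK - 1) / GROW_CHUNK) * GROW_CHUNK;
        char *p = realloc(d->data, new_capacity);
        if (p == NULL)
            return 0;
        d->data = p;
        d->capacity = new_capacity;
        d->grows++;
    }
    memcpy(d->data + d->used, object, size);
    d->used += size;
    return 1;
}
```

Starting from a zeroed diagram, adding a hundred 100-byte objects this way extends the block three times rather than a hundred.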

Font Table Object

The font table object should always be the first object in your Draw file. The PRMs say only that it needs to precede all text objects but, if it is the first, then you won't have a problem. (There is also a mistake in the PRMs, which say that each entry in the font table should be padded with zeroes to the next word boundary. This is not the case.) I have 20 or so fonts in the header since there are many different text styles that may be used in a document: six heading styles and then body text, hypertext and fixed-width text, all with plain, bold, italic and bold italic styles, plus one or two others. These fonts do not need to be different: normally body text and hypertext are the same font and it is the colours which are different.
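
As a rough illustration, a font table could be built as below. The layout used here is my reading of the Draw file format and should be checked against the documentation: a type word of 0, a size word, then for each font a one-byte font number followed by the zero-terminated name, with only the object as a whole padded to a word boundary. The function names and the choice of numbering fonts from 1 are assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Store a 32-bit word least significant byte first. */
static void put_word(unsigned char *p, unsigned long v)
{
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
    p[2] = (unsigned char)((v >> 16) & 0xFF);
    p[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Build a font table object in out[]. Returns its size in bytes,
   or 0 if the buffer is too small. Entries are NOT padded
   individually; only the end of the object is word aligned. */
static size_t build_font_table(unsigned char *out, size_t out_size,
                               const char *const names[], int count)
{
    size_t pos = 8;  /* room for the type and size words */
    int i;

    assert(out != NULL && names != NULL && count > 0 && count < 256);
    if (out_size < 8)
        return 0;
    for (i = 0; i < count; i++) {
        size_t len = strlen(names[i]) + 1;  /* include the terminator */
        if (pos + 1 + len + 3 > out_size)   /* keep room for padding */
            return 0;
        out[pos++] = (unsigned char)(i + 1); /* assumed: numbers from 1 */
        memcpy(out + pos, names[i], len);
        pos += len;
    }
    while (pos % 4 != 0)
        out[pos++] = 0;                      /* pad the whole object */
    put_word(out, 0);                        /* assumed type 0: font table */
    put_word(out + 4, (unsigned long)pos);   /* object size */
    return pos;
}
```

With the two names "Trinity.Medium" and "Corpus.Medium", the entries occupy 16 and 15 bytes, so the object comes to 39 bytes and is padded to 40.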

Text Objects

The parser will not split up the text into individual words since this would mean larger Draw files and slower rendering (more calls to Font_Paint). Yet a block of text may not fit onto the rest of the current line. Some words might fit, maybe none, or maybe all of them. The text creator uses SWI Font_ScanString to find out where to split the string, if anywhere. It is told that spaces are the split characters and how much space is left on the current line. Once that has been decided, the text object is added to the diagram with the correct bounding box (returned by SWI Font_ScanString) and the rest of the string is then handled the same way. If no more objects fit on the current line then a new line is started. If the text is hypertext then its colour is changed to blue and it is underlined (see Path Objects below).
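
The splitting decision can be sketched without the font manager. In this illustration a fixed character width stands in for the measurement that SWI Font_ScanString performs, so it shows only the logic of breaking at spaces; split_point() and CHAR_WIDTH are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

#define CHAR_WIDTH 8  /* assumed fixed width; real fonts are proportional */

/* Return how many leading characters of text fit into 'space' units
   when breaks are allowed only at spaces. The result is 0 if not even
   the first word fits, and the full length if everything fits. */
static size_t split_point(const char *text, int space)
{
    size_t i, last_break = 0;

    assert(text != NULL);
    for (i = 0; text[i] != '\0'; i++) {
        if (text[i] == ' ')
            last_break = i;  /* a word ends just before this space */
        if ((int)(i + 1) * CHAR_WIDTH > space)
            return last_break;
    }
    return i;  /* the whole string fits */
}
```

A caller would add the first split_point() characters as a text object, skip the space at the split, start a new line if nothing more fits, and then handle the remainder in the same way.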

Transformed Text Objects

These are used when tags indicating scaled text are found. The objects are handled in exactly the same way as normal text except for the 7 extra words in the Draw object header, which contain a transformation matrix and a flag word.
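
The seven extra words might be declared as in the sketch below. I am assuming here that the matrix is six words, with the first four in 16.16 fixed point and the last two a translation, followed by the flag word; check this against the Draw file format documentation. The text_transform type and scale_transform() are names made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define FIXED_ONE 0x10000  /* 1.0 in 16.16 fixed point (assumed format) */

/* The seven extra words: a six-word matrix and one flag word. */
typedef struct {
    int32_t  a, b, c, d;  /* scaling and rotation, 16.16 fixed point */
    int32_t  e, f;        /* translation */
    uint32_t flags;       /* flag word, left as zero in this sketch */
} text_transform;

/* Build a matrix which scales text uniformly by 'percent' per cent. */
static text_transform scale_transform(int32_t percent)
{
    text_transform t = { 0, 0, 0, 0, 0, 0, 0 };

    assert(percent > 0 && percent <= 1000);
    t.a = t.d = (int32_t)(((int64_t)FIXED_ONE * percent) / 100);
    return t;
}
```

For example, scale_transform(200) gives a matrix whose diagonal entries are 0x20000, which is 2.0 in this fixed point format.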

Sprite Objects

HTML allows the inclusion of inlined images. Unfortunately, most of the images on the Web are in GIF or JPEG format. Currently, Draw files only allow RISC OS sprites to be embedded in them, so images need to be converted from the original format into sprites. ArcWeb runs ChangeFSI to convert any images it finds into sprites. Once a sprite is available, the sprite object is added just like a text object, including the check that there is enough space left on the line.

Path Objects

HTML has a special tag <hr> which stands for Horizontal Rule. ArcWeb generates a line across the page using a simple path object made of three components: move, line to and end. To speed things up, when the program starts, it generates two composite Draw objects (a line and a box) which it can add to a diagram when needed. These are initialised with all the correct headers and the correct path commands. The line and box routines need only fill in the bounding box, coordinates and the colours.
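
The template idea for the line can be sketched as follows. The structure is an assumed layout for a path object (header, two colours, width, style word, then the path), and the tag values 2, 8 and 0 for move, line and end are as I understand the Draw format; the rule_object type and make_rule() are invented for illustration and are not ArcWeb's own names.

```c
#include <assert.h>
#include <stdint.h>

enum { PATH_END = 0, PATH_MOVE = 2, PATH_LINE = 8 };  /* assumed tags */

typedef struct {
    int32_t  type;            /* 2 = path object */
    int32_t  size;            /* size of the whole object in bytes */
    int32_t  x0, y0, x1, y1;  /* bounding box */
    uint32_t fill_colour;
    uint32_t line_colour;
    int32_t  line_width;
    uint32_t style;
    int32_t  path[7];         /* move x y, line x y, end */
} rule_object;

/* Built once: headers and path commands are already in place. */
static const rule_object rule_template = {
    2, (int32_t)sizeof(rule_object), 0, 0, 0, 0,
    0xFFFFFFFFu,  /* no fill (assumed transparent value) */
    0, 0, 0,
    { PATH_MOVE, 0, 0, PATH_LINE, 0, 0, PATH_END }
};

/* Copy the template, then fill in only what varies between rules. */
static rule_object make_rule(int32_t left, int32_t right, int32_t y,
                             int32_t width, uint32_t colour)
{
    rule_object r = rule_template;
    int32_t half = (width + 1) / 2;

    assert(left <= right && width >= 0);
    r.line_colour = colour;
    r.line_width  = width;
    r.path[1] = left;  r.path[2] = y;  /* move to the left end */
    r.path[4] = right; r.path[5] = y;  /* line to the right end */
    r.x0 = left;  r.y0 = y - half;     /* box allows for the thickness */
    r.x1 = right; r.y1 = y + half;
    return r;
}
```

Only the coordinates, bounding box and colour are written for each rule; everything else comes from the template created at start-up.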

Building the Diagram in Real Time

The process of converting the HTML into a Draw diagram can be time-consuming for anything but a small document, so it is desirable that you can see the early parts of the diagram while the rest is still being created. This creates many problems, the main one being that the integrity of the Draw diagram data must be guaranteed whenever an attempt at plotting the data is made. I maintain a set of offsets into the Draw diagram, including the current "safe limit", which I pass as the diagram size to SWI DrawFile_Render.
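
The "safe limit" idea can be shown in a few lines of portable C. This is a sketch of the principle only: the live_diagram type and its functions are invented for the example, and a fixed buffer replaces the flex block. The point is that the limit moves only when an object is complete, so a redraw never sees a half-written object.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    unsigned char data[1024];  /* fixed buffer keeps the sketch short */
    size_t write_pos;          /* end of everything written so far */
    size_t safe_limit;         /* end of the last complete object */
} live_diagram;

/* Write part of an object. The safe limit does not move, so a redraw
   happening now would ignore these bytes. Returns 0 if out of room. */
static int write_bytes(live_diagram *d, const void *bytes, size_t n)
{
    assert(d != NULL && bytes != NULL);
    if (d->write_pos + n > sizeof d->data)
        return 0;
    memcpy(d->data + d->write_pos, bytes, n);
    d->write_pos += n;
    return 1;
}

/* The object is complete: make it visible to the renderer. */
static void commit_object(live_diagram *d)
{
    d->safe_limit = d->write_pos;
}

/* The size a redraw would pass to the renderer as the diagram size. */
static size_t renderable_size(const live_diagram *d)
{
    return d->safe_limit;
}
```

A redraw which arrives between write_bytes() and commit_object() is given the old limit and so plots only the objects which were complete at that moment.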

As each line of the diagram is built, wimp_update_wind() is used to draw the current line on the screen. Every so often, the renderer calls event_process() to allow the desktop to remain usable. It keeps processing events until it receives a Null event code, so the diagram remains up to date as the user scrolls up and down the document, and the system remains as responsive as possible. Because the document will not contain any overlapping objects and drawing is clipped to the current redraw rectangle, redraws are fast and there is little to be gained from multitasking redraws.


From CAUGers volume 2 issue 3       Comments to caug@accu.org