[ erys : resume : netscape : news ]

David (Rys) McCusker
rysmccusker at yahoo dot com
home: (415) 552-3810
cell: (415) 215-1797




news written at Netscape.




news
 
Here's a selection of newsgroup postings written at Netscape. I assembled these while gathering postings related to IronDoc, so there's some overlap with the IronDoc news page when the content is about IronDoc, but this page contains a few additional entries.

From the beginning, folks on my team were expected to field public newsgroup questions about technology affecting users. At one point, Netscape's mail/news engineers were told to conduct development work publicly in newsgroups as much as possible. Most of my newsgroup writing at Netscape was in this vein. We were trying to maximize public transparency in our design and coding process, because this was considered a high priority in the open source business.


embedding
 

The original copy of the following message is located at msg02081.html.
Subject: Re: [dist-obj] XML coding syntaxes (was Re: REBOL for Internet Comm)
From: davidmc@netscape.com (David McCusker)
Date: Fri, 19 Nov 1999

Eugene Kuznetsov wrote: [ snip ]
> [SMTP:davidmc@netscape.com] wrote:
> > There's no reason why one could not embed XML in Java syntax, except
> > that a Java environment vendor might see no good reason to make that
> JSP (java server pages) basically work that way, but there is a
> preprocessor that separates out the java source and the HTML source.

This differs from my intention mainly in degree of integration. Early separation by preprocessor implies only a very loose and informal state of association through embedding. I'm more intrigued by situations in which the embedding remains through all transformations until runtime.

Other ways of embedding like this are interesting too, but they are hard to distinguish from other ways of storing stuff around in the environment for later access at runtime. The most interesting case seems to happen when the embedding is still there at runtime. (And I realize this is just a matter of binding at runtime -- that embedded content moved to another location earlier could still be accessed at the original point of embedding at runtime through a reference that resolves suitably. This is also incredibly hard to distinguish from what a compiler does if a literal data object is stored in a literal table, leaving a ref in code.)

Total integration is the most interesting, and this is what I'm doing in Mithril. This means embedded XML is parsed at compile time to create standard abstract syntax trees that either get turned into code, or else remain as first class data objects when referenced as literal data. So the embedded content is not text, in particular, except that much of XML content is text when it is not tags. This kind of integration implies a deep and thorough native support for XML structures in the runtime. In this way, embedded XML is just as real and first class as other content.

Note that when I say 'compile-time' I am actually being ambiguous since I also plan to compile dynamically at runtime when it suits me; I like my mechanisms set up so I can choose exactly when things happen, so the cost elements get factored appropriately for my application, and not by the whims dictated by a development system.

> It is not clear to me that this is a win, but many who actually use
> them on a daily basis are very happy.

This is the basis of most essentially meaningless arguments on the net, when folks jockey and position each other about whether a win results from a particular tech approach. I think it's meaningless because it should be the job of mechanisms to make things possible, and in that context winning means nothing since you still need to apply it in context.

Actually applying mechanisms in usage brings policy into the picture, and this is when winning matters, and it has little to do with whether it is possible to do something. It only matters that the policy is clear, efficient, elegant, cost effective, etc compared to other things that one might do in the same context. Happy users is a good sign.

> [SMTP:davidmc@netscape.com] wrote:
> > have a sensible meaning. The stance of "code is code, data is data,
> > and never the two shall meet" is an arbitrary usage convention, and
> One could argue that for distributed systems specifically, C++ (and
> related technologies) actually err on the side of not separating data
> and code enough.

I strongly agree, to the extent that all C++ frameworks I've seen are heavily code centric, so much so that all data interpretation is held hostage to serious code versioning risk. This was usually done under the aegis of being object oriented, which gives license to do whatever one pleases in bad entropy design, as long as it's nicely polymorphic.

The common pattern for serializing objects into byte streams is a case in point that makes me feel really queasy every time I see it. Often the byte stream has no well-defined format, since the code 'knows how to read and write itself', and problems with versioning are dealt with in some ad hoc fashion. But this makes it hard to share this data with any other piece of code, so such formats are very object proprietary.

(I know at least one person who will assume I am talking about them personally above, so I am forced to add a disclaimer to cope with any potential paranoia. I first saw this pattern abused in the Taligent architecture for serializing content into streams, and I generalize my perceptions now to include other common practice; I am not, repeat not, speaking about any single individual practitioner, now or ever.)

However, this practice is not really inherent in the nature of coding C++ itself, and reflects more the common belief systems of developers.

That such belief systems are wrong (in the sense of generating systems that don't hold up very well) is manifestly obvious when you look at the history of massive efforts that went into C++ development projects, and see how easily the space was invaded by internet content standards.

I'm digressing now, so I'll get back on track after this next quote:

> It replaces data (e.g. a->b = x->y;) with code execution
> (e.g. a->SetB(x->GetY())) or intermixes the two.

You almost lose me here, and I have trouble seeing the relation to the point I intended about separation of code and data in C++. So I'll just say more explicitly what I mean about C++ and code vs data.

Typically code lives as data in the operating system environment, and gets loaded to become code at runtime. Normally the file system is the repository in which code exists as data. Source code also exists as data in the environment, and it gets turned into manufactured coding goods by a development system within a very well-defined context we normally call compile-time or development-time, which is very strongly distinguished from runtime or execution-time.

For most C++ environments, data can become code (source gets compiled) only in a very limited context called compile-time, which never happens at runtime. This is basically mechanism dictating to runtime policy, which is a technical crock. Further, this transformation to code is also typically very monolithic and deeply fraught with fragile risks, and consumes vast amounts of computing resources in repeating patterns, even when very little change might occur between each batch process. And when we do it wrong, apps tend to crash hard without forgiveness.

Is that bizarre, or what? We put so much time into enabling end users (who we also like to deny the joy of programming :-) and so little effort into making development environments more robust, facile, and cost effective. We should think about that more, and consider changes where appropriate; I'll skip all the many reasons for current norms.

So that part above mentions how hard it is to move from data to code in C++ at runtime; this is one of the arbitrary separations I mean.

The other separations are those that involve embedding, either code in data or data in code. Of those two, embedding code in data is the most constrained under typical operating conditions. Form factors permitted for code library format are both arcane and fragile, and both these have a strong effect on total development project costs. Why should it matter exactly where compiled C++ code is stored? I can design a system that lets me bind to code wherever I want to put it.

Yes, security is a concern, but security should happen as policy and not as mechanism. I hope this short-circuits a lengthy subthread. :-) (Crippling form factors for code in the name of security is a bit like physically disabling the citizenry of a human population so they cannot walk, since it's so durned awkward when criminals run from police; in both cases we throw away potential, to guard against a narrow context.)

I could also add some material here describing how it is harder to embed data in code under C++, but this case is maybe the least of the problems, and I've blabbered quite a bit more than I should already.

> This isolates the program from storage details, but it does solve the
> interoperability problem at the bits-n-bytes level. Unlike the JSP
> example, it's true of both devtime and compiletime.

The snippet of C++ code you gave illustrating assignment only isolates the way in which bindings normally operate under C++. That the old way used a storage centric approach for binding is not quite as big a matter as the underlying issue of how binding is organized and how this affects fragility or runtime costs. When we talk about treating local and remote objects differently, we are discussing how bindings either hide costs or make us accept higher risk of uncontrollable failures.

> This quickly devolves into the earlier discussion about treating
> distributed and local objects different, but that's not nearly as
> interesting as the code/data separation question. With XML/XSLT and
> the push for XML scripting languages, this is increasingly important --
> especially as XML is being positioned to replace many clunky, but
> "tried and true" business-to-business hookups.

That's a very interesting context for further discussion, and I think we'll make more progress if we use a smaller granularity of reasoning about how things interact, than if we tell developers or businesses they must choose between Java or XML (my way or the highway!).

We need to talk about primitive concepts regarding code and data, and what it means to mix them in certain patterns. Because if we don't agree on some basic rules in such a low level context, then there is no way an argument about higher level systems will ever be about facts that can be weighed against each other in meaningful ways.

David Mc
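
To make the serialization complaint in that message concrete, here is a small sketch of the pattern being criticized. It is not code from Taligent or any other real framework, just the general shape of the "object knows how to read and write itself" idiom:

  // The stream layout exists only implicitly in these two methods, so any
  // other program, or any later version of this class, has to reverse
  // engineer or mirror them exactly; versioning becomes ad hoc.
  #include <fstream>
  #include <string>

  class Card {
   public:
    void Write(std::ofstream& out) const {
      out << mName << '\n' << mEmail << '\n';  // the format lives only here
    }
    void Read(std::ifstream& in) {
      std::getline(in, mName);   // must mirror Write() exactly, forever,
      std::getline(in, mEmail);  // or old files stop being readable
    }
   private:
    std::string mName;
    std::string mEmail;
  };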


incremental
 

From: David McCusker <davidmc@netscape.com>
Subject: [mork] why incremental writing works
Date: 02 Jun 1999 00:00:00 GMT
Message-ID: <3755D60D.A6EF0A9D@netscape.com>
Organization: Ontology Mechanics Guild
Newsgroups: netscape.public.mozilla.mail-news

If you like technical Mork details, this might get interesting down a few paragraphs starting with "But first, why does this work at all?".

This is an introduction to two more articles I plan to post on writing updates to Mork incrementally.

One post will describe the heuristic approach I plan to take in Mork for switching tactics from <recording changes for writing only diffs> for update efficiency, to <just write the entire changed object so recording changes will not take too much space> to avoid worst case space usage effects.

The reason for that post on heuristic tactics will be to explain why I suddenly am no longer trying to do the best possible thing in Mork in all cases, in favor of just getting something good enough done sooner. (I'm not being as productive as I should be because I'm bored to tears with struggling over the same issues on a text solution that's basically a kludge compared to a real binary db solution that does things right.)

The second post will describe how I do the ideal thing in public domain IronDoc for incremental update, so the contrast with Mork will help to explain why Mork is the way it is, and how far this is from the ideal case. This serves the purpose of characterizing the efficiency of Mork more than it serves the purpose of advertising (yet vaporware) IronDoc.

Basically any text-based db that keeps everything in memory necessarily has characteristic problems with incremental updates and balancing the issues of memory footprint, commit latency, and worst case scenarios. (Note I'm not cribbing from any db research; this is all just obvious to me, and I expect to anyone else as well who thinks about this stuff.)

Now, the part I'm giving up on, which I'll explain further in the next post, is keeping the cost function continuous for incremental updates. The heuristic approach I mention will have a discontinuity where the strategy changes, and this might make the code rather less complex.

The explanation of the IronDoc strategy will show how cost is easier to do continuously in random access binary formats, and it might seem clear to some folks why the cost structure described will represent the best case scenario a db system might target.

But first, why does this work at all? The rest of this post explains why Mork's syntax and semantics are such that incremental updating will occur naturally, and documents a couple syntactic changes I plan for clearly representing content removal.

The key idea is that Mork syntax is geared toward appending content, and that semantics of parsing Mork text will favor this interpretation. Nothing in the syntax implies when an entity starts or stops existing (except maybe for transaction groups, which are not API visible). Merely mentioning an entity causes it to exist if it did not already, and content listed is always considered additions to existing content, where existing content happens to be the empty set when an entity is first mentioned. Only tables and rows have identity, so these are the only objects that spring into existence when their IDs are mentioned.

A Mork text file contains any sequence of tables and rows, and the same tables and rows can appear as many times as desired. Putting a row in a table that has not already been added causes it to be appended, and putting a cell in a row causes it to change that cell in the row.

In other words, bindings for table and row IDs are created as soon as they are mentioned. And these bindings last until tables and rows are garbage collected during a compress commit. (A row can be collected when it appears in no table or cell, and a table can be collected when it appears in no cell and holds no member rows.) There are no forward refs, because just mentioning a new table or row will create one.

So there are really only two major problems in structuring Mork text to support atomic transactions that cut or add db content. How do we remove existing content? And how do we make a transaction atomic, so it must be applied either completely or not at all while parsing?

New syntax uses a minus '-' to indicate a member element that should be cut from the container. So '-' in front of a row will cut the row from a table, and '-' in front of a cell will cut the cell from a row. This is enough to efficiently show small changes involving deletes.

To efficiently show wholesale changes in a container entity, we want a syntactic form which clears all old content so we can start over. So new syntax will also use a minus '-' at the start of a table or row to cause all old member elements to be deleted from the table or row.

For example, [1EDA:^80 -(^88)] causes row 1EDA:^80 to cut only column ^88 from the row, but [-1EDA:^80] causes all columns to be deleted. So [-1EDA:^80 (^8A^8B)] will discard all old columns in 1EDA:^80 and then append exactly one new cell with column ^8A and value ^8B.

A more complex example involves updating both tables and the rows inside at the same time. So {-1:^80 {(k^94:c)} 0 1EDA - [-1EDA (^8A^8B)]} has all the following effects, in this order:

- table 1:^80 cuts all rows
- table 1:^80 adds row 0:^80 as a member
- table 1:^80 adds row 1EDA:^80 as a member
- table 1:^80 cuts row 1EDA:^80 from membership
- row 1EDA:^80 cuts all cells
- row 1EDA:^80 adds cell (^8A^8B)

Because rows in tables are ordered, another kind of change that can be performed involves moving a row's position without otherwise making any change in table membership or content. So we need one more new kind of syntax to efficiently show movement of row position for small changes in large tables, since listing a row as a table member causes an append to the current end of the table's row array.

So we use a new suffix notation following a row in a table to specify a different position than the one that would be achieved by default. For this, we use a bang '!' followed by a row position to say where a row should be put. For example, {1:^80 {(k^94:c)} 0!7 } says that table 1:^80 should add row 0:^80 as a member at row position 0x7 rather than the current length of the row array.

Because we don't like to specify syntax that can have runtime errors due to impossibility of implementation, we will say that specifying a row position beyond the current end of table will cause appending to end of table. So it will not be possible to anticipate a table size that has not yet been reached with overly large positions.

The answer to the problem of atomic transactions is handled by existing syntax that will be checked in soon for the first time.

A transaction is some sequence of tables and rows enclosed in begin and end delimitors for a transaction, where these delimitors were chosen partly to look peculiar and partly to use syntax which cannot legally appear inside a row's cell without violating quoting rules Mork uses for '$' bytes.

Each transaction also uses a transaction group ID number, mostly for an additional degree of redundant checking that affects the final result of content not at all. One could use the same group ID for every file transaction (though that would be silly), so the examples below will always use FEEDFACE (i.e. in hex) as the canonical transaction ID.

By enclosing a sequence of changes to tables and rows within a begin delimitor of "@$${FEEDFACE{" and an end delimitor of "@$$}FEEDFACE}", all the changes in between will be applied atomically, typically through the simple expedient of parsing to the matching end delimitor and then reseeking the start to parse tables and rows only when well-formed.

A transaction is terminated and aborted by one of three conditions: end of file, the start of a new transaction, or an explicit "@$$}~~}" end delimitor that means abort (because the app changed its mind apparently).

An aborted transaction will just ignore all the content appearing in that Mork transaction. We don't want to apply updates for only part of a transaction when this might cause app level data inconsistencies. If we didn't need changes to go together, we wouldn't bother putting them all in the same transaction.

There is a very specific reason why we abort a transaction when a new one starts with a new opening delimitor. We expect this situation to be the normal case when the app crashes while writing a transaction during an earlier session. If the app crashes, no end delimitor is written to commit the last transaction. If we need not remember whether a file ended with an aborted transaction, we can just start appending new ones in further transaction updates.

So incremental writing in Mork will work well because we need only start a new transaction, and start writing only the tables and rows that have been changed since we were last consistent with the db image. And we need only write those parts of a table or row that were changed, and this matters most when an object is large and writing only diffs will be much more space efficient. A crash before a transaction completes will cause the partially written transaction to be ignored in parsing.

David McCusker, mild mannered binding monger
Values have meaning only against the context of a set of relationships.
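
Putting the pieces of that post together, a single incremental commit appended to a Mork file might look roughly like the sketch below. This is only an illustration assembled from the examples above, not output from the actual writer; the IDs and columns are the made-up ones used in the post, and real Mork output will differ in spacing and dictionary details.

  @$${FEEDFACE{
  {1:^80 {(k^94:c)} 0!7 }
  [1EDA:^80 -(^88) (^8A^8B)]
  @$$}FEEDFACE}

Read top to bottom: the group opens with the begin delimitor, table 1:^80 adds row 0:^80 at position 0x7, row 1EDA:^80 cuts its ^88 cell and adds a cell with column ^8A and value ^8B, and the end delimitor commits the whole group atomically. If the app had crashed before writing that last line, a later parse would treat the partial group as aborted and ignore it.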


writing
 

From: David McCusker <davidmc@netscape.com>
Subject: [mork] writing costs and incremental heuristics
Date: 03 Jun 1999 00:00:00 GMT
Message-ID: <3757357C.AF65AFD1@netscape.com>
Organization: Ontology Mechanics Guild
Newsgroups: netscape.public.mozilla.mail-news

(Got to actually post this today before I forget... :-)

Okay, here's the first of two planned posts I mentioned yesterday. This one describes how I plan to handle diffs for incremental writes in Mork, and the next one will contrast Mork writing costs with the ideal approach used by vaporware public domain IronDoc.

I only care about issues involving costs, which I'll explain without going into algorithmic analysis in depth. There are more costs to be minimized than folks sometimes consider. In addition to space and time complexity costs, there are also code and risk complexity costs. Not only do we want the footprint to be small and the code to run fast, but we also want the code to be simple and to resist failure, noting that simplicity can also help resist failure when fewer bugs are written.

Before I say how any of that matters in Mork, first let me describe how the incremental writing must be organized, since this will provide the context for making comments about the costs involved.

The point of incremental writing is to write less than all the content, specifically in order to minimize time latency to commit changes that a user makes in the content. In other words, saving should be fast so the user doesn't wait longer than necessary. We want cost of saving to be proportional to amount of content changed, and not proportional to size of all the older content which does not change.

So the basic strategy of incremental writing involves knowing what has changed, so that commits work with the known changes rather than all known content. Then writing changes is proportional only to size of changes, and this reduces cost to commit an individual transaction.

But this kind of pay-little-now cost incurs a pay-more-later cost which must be paid eventually when wasted file space exceeds a tolerable level and requires a compression. Wastage occurs when incremental updates are changes to old content, so that earlier portions of the file hold bytes that get replaced later on, so that the earlier bytes are meaningless.

Compression is necessarily a maximum time latency effect, but at least it's better than writing all the content for every commit. Basically a compression is just the same as writing all the content once, cleanly. We pay full cost periodically instead of always when we use incremental writes to update a file during a commit.

Another less obvious cost occurs when a file is opened and parsed, since typically more file bytes must be parsed to apply appended transactions than would be found in a freshly written file. And applying changes to earlier invalid content is extra work compared to a clean start.

Okay, that's most of the introductory material. What's the problem? Since Mork keeps all content in memory after parsing a file, our main problem is excessive memory footprint, because keeping track of changes will tend to use some space in order to make the process of writing only changes go a bit faster. (As opposed to not keeping track of changes at all, and comparing all content to the old image of the file to make a diff, which could be slower than just writing all the content anyway.)

Controlling memory footprint to track changes can be hard. For example, when I used to work on Bento at Apple, I determined that much of the memory footprint was caused by the way in which data structures were designed so they were prepared to record changes, even if no changes actually happened at runtime. One can try to avoid this effect, but there needs to be at least slightly more footprint to be able to refer to changes even when they don't exist.

Now, when trying to avoid memory use, it is tempting to make the code more complex so that minimal memory is used to incrementally track each and every change, so that footprint increases continuously but minimally in proportion to the changes. The reason for complexity is partly that some changes require more space to capture than others (a move needs more info than a simple add), and partly because using minimal space is more complex than just collecting changes as abstract object instances.

But at some point, as the number of changes begins to approximate a big proportion of all content, it becomes cheaper to just write all content and not just changes. And in that case the strategy of recording any changes just loses outright in terms of both space and time, since the effort spent recording was not actually useful as a labor savings. So you want to quit recording early before losing, if you can guess where. But if you give up too early, there's a larger discontinuity in costs.

Okay, all the material above provides the context I need to summarize my intention to make Mork code less complex and more discontinuous in cost as a function of content change. I plan to be lazy and use more than minimal space to record each change, and I plan to give up early when changes seem large compared to total size of some collection being changed.

Using more than minimal space involves putting changes in a special location rather than mixing the change notations in with the actual content (applying odd space bytes as available along with more complex code to interpret the various idiosyncratic change encodings). And mixing miscellaneous changes in a more generic collection of changes will use more space to disambiguate change types, and to provide space for the change collection.

Giving up early involves picking some minimal content sizes for rows and tables before it seems worthwhile to record changes at all, and also picking some percentage of row and table content over which it seems simpler and/or cheaper to write all content in a row or table.

Note that "giving up early" happens on a row by row, and table by table basis, rather than to the db file content as a whole. So the granularity is better than might be implied by a strategy to give up.

This all sounds like a fine strategy, but one has to worry about the impact when db content is scaled up to very, very large, and whether the strategy makes it possible for any unpleasant worst cases to occur. What can go wrong? I can use too much footprint before giving up early if the collection is very, very large. And I can give up a little too early on a very large collection, and write the whole thing when I should have recorded one or a few more changes.

David McCusker, mild mannered software reporter
Values have meaning only against the context of a set of relationships.
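
As an aside from the archive: the per-row and per-table "give up early" decision described in that post can be as small as the sketch below. This is not Mork source, just an illustration of the strategy, and the threshold numbers are invented since the post gives none.

  // Record per-object diffs only while they stay small relative to the
  // object; otherwise abandon the diffs and rewrite the whole object on
  // the next commit. Thresholds are invented for illustration.
  #include <cstddef>

  struct ChangeTracker {
    std::size_t mContentSize;   // cells in a row, or rows in a table
    std::size_t mChangeCount;   // changes recorded since the last commit
    bool mRewriteWhole;         // true once diff recording is abandoned

    explicit ChangeTracker(std::size_t inContentSize)
      : mContentSize(inContentSize), mChangeCount(0), mRewriteWhole(false) {}

    static const std::size_t kMinSizeToTrack = 16;    // tiny objects: just rewrite
    static const std::size_t kMaxChangePercent = 50;  // past this, diffs lose

    void NoteChange() {
      if (mRewriteWhole)
        return;  // already gave up on diffs for this object
      ++mChangeCount;
      if (mContentSize < kMinSizeToTrack ||
          mChangeCount * 100 > mContentSize * kMaxChangePercent) {
        mRewriteWhole = true;  // discard diffs; the commit writes everything
        mChangeCount = 0;
      }
    }
  };

The cost discontinuity the post talks about is exactly the flip of mRewriteWhole from false to true.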


remork
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: mork questions (part deux)
Date: 03 Aug 1999 00:00:00 GMT
Message-ID: <37A7720E.CEC3E4EF@netscape.com>
To: Chris Waterson <waterson@netscape.com>
Organization: Ontology Mechanics Guild
Newsgroups: netscape.public.mozilla.mail-news

David Bienvenu wrote:
> There is no meta information for a table, at least not in the schema
> sense.

Yes, both MDB and Mork are schema-less, which means they don't care about which or how many column attributes appear in anything, and tend not to use space for things that don't appear. It should be sparse, and a table is logically a sparse matrix. (Public domain IronDoc uses the term 'matrix' instead of 'table' -- same thing.)

> Each row can have a completely arbitrary set of columns, which is
> great for a number of reasons - you don't need to worry about rarely
> used columns wasting space, and you can add columns without worrying
> about forwards and backwards compatibility (old client code can modify
> db's with columns it doesn't know about without losing the columns).

Yes, the only time you get in trouble is when you assume you know all the columns in use, and do something that would step on somebody else.

Suppose you write client C that knows about x different attributes, so it always writes a row as Cx. Then later the next version D of this client knows about y different attributes, and writes a row as Dy. Then going forward from Cx to Dy will tend to add the missing columns. But going from Dy to Cx will lop off the extra columns, and fubar the D client when you go back, if client C assumes it knows everything.

The most friendly thing to do for compatibility in all directions, is to leave columns alone when you don't recognize them. That way you can avoid stepping on your future client's attributes. (So if someday a guy looking just like you shows up at your front door, don't gun him down, because maybe it's you in the future using a time machine. :-)

> Or, put another way, the schema is defined and managed by the client
> code, not the MDB/Mork code.

Yes, it is a shift of responsibility to clients, away from the db. This gives freedom, and maybe creates more work simultaneously, and the extra work can be distressing to folks who want totally magic db's. But the freedom is a blessing to folks who prefer clutches and want to shift any gears manually to best possible effect for specific apps and versions.

> I'm not sure about the difference between columns and cells.

The difference is defined by the matrix metaphor. A column is a vertical section of a matrix, whether or not the column has a value in any row. So a column is just the name of that attribute. But a cell is a specific value in a specific row that occurs in a column.

In the context of a row, a cell and column are very similar, except that rows can theoretically contain all columns, but only represent the non-empty columns as cells. A column is like the name of a variable in a programming language, and a cell is like the slot on the stack which is able to hold a value. There is a subtle difference, in that a variable or column just refers to the space for a slot or cell.

> What you were doing may be perfectly legal, but since I'm not doing
> it that way, it may not have been tested. AddColumn does seem easier,
> since you can do it in one call.

Adding is generally simpler than setting, but I included setting in the original API because it made sense for completeness. But I only coded methods when actual usage of them was desired.

(Folks suggested I take this approach so that I did not gratuitously write more code than was actually going to be used, just for the sake of niceness. The implicit assumption is that I will write too much code if given the opportunity, so need must be proven by demand. This might not be an entirely rational system, but hey, what can you do. Saying "just write the code that will be used" is a lot like saying to make money in the stock market "just buy low and sell high".)

> > 2a. What is a "scope"? If rows live in a "row scope", and not in a
> > table, then what utility does a table provide? (I believe the
> > utilitarian answer to this question is, "it roots the row for the
> > garbage collector".)
>
> A table is just a set of rows, and this can have lots of applications.

Yes, a row is a collection of cell attributes, and a table is any set of rows. So any way will work in which you can think to collect rows into sets to model some pattern of content.

Just imagine any UI which displays your content; if you can imagine a view that shows rows of 'things' with attributes displayed in columns, then you can model that view as a table, and the 'things' as rows. If your UI has lots of different kinds of views of the same rows, but organized in different ways, then you can go ahead and put those rows into different tables, so each table can back the intended view wanted.

I expect mozilla apps to often follow this model closely, so view=table and line=row. For example, in address books, list=table and entry=row. So the direct utility of a table is that it collects and orders rows.

> For example, in summary files, there's the table which contains all
> message headers.

This reflects a need to represent all headers as a row collection.

> Each thread is also a table, a table of the message headers in the
> thread.

This reflects a need to show a thread as a subset of all header rows.

> Since the message headers are already in the table of all message
> header, the thread tables just consist of references to the rows
> in the table of all message headers (I'm using "reference" very
> loosely - MDB defines some sort of reference cell, but I'm not using
> those, and I'm not sure if they're implemented fully yet).

When a table has a row as a member, it is always just a reference to a row which is logically owned by the store, but scoped in the name space of the 'row scope'. Any table can ref any row, even when a row lives in an entirely different name space than the one the table lives in. So tables can hold completely heterogenous content systems.

However, the most efficient internal representation probably results when tables hold rows in the same scope. In Mork for example, a ref to a row need not include the scope qualifier when it matches the scope of the table, and this reduces the per row disk footprint.

Currently, only a table can ref a row. My blueprints also call for a cell to be able to ref a row, but I have not coded it yet. It's not really hard, but I still have not found the chance to squeeze it in between cracks in other tasks. It seems I should code this without demand, since it seems highly likely developers will find some other (less effective) ways to solve problems to ref rows when calling those methods causes a 'stub-only' assertion.

Incidentally, a row cannot be self-rooting until I implement this feature (see "[MDB] no gc for rows in cyclic refs?" 09 Jun 1999, at news://news.mozilla.org/375EFE32.3207DD2C@netscape.com.)

> Or, take an Address Book example. There's the table of all the people
> in your address book. Each mailing list would also be a table.

Yes, exactly.

> > 2b. What significance do the strings that are used to declare the
> > "row scope" and the "table kind" have? I copied bienvenu's patterns
> > and ended up with this:

A "table kind" is a secondary brand of metainfo annotation, second in importance to the row scope (which defines the ID space for a table). In practice, table kinds are only useful for helping filter out the table kinds one is not interested in seeing, assuming one is iterating over tables in a store. But many apps can do without iterating over tables, because I added API to give tables absolute IDs known a priori by an app, so searching for a table by kind may not ever be necessary.

So a table is a collection of rows. The choice of table scope is intended to most closely match the scope of rows usually placed inside a table, so most efficient implementation encoding can ensue. The choice of table kind is intended to indicate what kind of collection of rows is inside the table; since rows can appear in more than one table, what is so special about this particular collection of rows?

So table kind is actually "poor man's table schema" which apps can use to cooperatively decide what purpose is served by specific tables, but this is entirely a client-side behavior, and MDB and Mork ignore kinds beyond keeping track of kinds and maybe using them in iterations.

Everything would work just about as well if you always used exactly the same table kind for every table in every MDB application. You could tokenize the string "table:kind:I:don't:care", and use this everywhere. The only problem with this is that it throws away an opportunity to mark your tables with some indication of purpose. If you choose table kinds carefully, maybe folks looking at your Mork text files will send you fewer emails asking, what's the purpose of this one giant table in history, and this smaller table over here?

> > err = mStore->StringToToken(mEnv,
> >     "ns:history:db:row:scope:history:all", &kToken_HistoryRowScope);
> > err = mStore->StringToToken(mEnv, "ns:history:db:table:kind:history",
> >     &kToken_HistoryKind);
> >
> > What on earth did I _really_ do? :-)
>
> OK, I've dodged the questions about scope as long as I can.
> I don't think you or I care much about scopes - we just create one
> and use it.

Right, you don't have to care about scopes. Since I force you to pick one, your odds of accidentally colliding with, say, the address book row scope, are not very large. This would matter if Mork content from different apps were put into the same db file. You can do that, but whether one wants to do that is never clear without some analysis.

However, if you discover you can't manage to avoid choosing IDs that already are in use, and you find yourself at Wit's End (somewhere in the misty caverns), then you can use scopes to resolve conflicts. [The best explanation is in "[MDB] avoiding ID collisions", 09 Jun 1999 at news://news.mozilla.org/375ED703.4E933B78@netscape.com, and it is very important that you read this posting fairly closely.]

> Each table can contain rows of different scopes.

Yep, but cheapest persistent encoding might come from scope uniformity.

> When you create a table, you tell it the default row
> scope for that table. When an MDB implementation writes out the rows
> in the that table, it doesn't need to write out the scope of the rows
> that are in the default scope.

Yep, it's just an encoding compression strategy.

> You might be able to imagine reasons to have rows in different scopes
> - I haven't found a need for them. Similarly with table kinds...

You are most likely to need more than one scope if you have significant amounts of metainfo, which you want to associate with rows in some scope. You might find the simplest strategy is to use a different scope for rows composed entirely of metainfo attributes, and give these rows exactly the same IDs as the rows they annotate elsewhere.

This effectively happens with threads in mailnews, where David B uses an ID for a thread table the same as the ID of the first row in a thread. This works in just one scope only because MDB distinguishes between IDs for tables and rows, because every row scope name space is really a pair of spaces, with one for tables and one for rows. But if we had needed one more kind of metainfo besides thread tables, then we'd be out of free places to assign parallel IDs, without using a new scope.

You don't really need table kinds at all; it is mostly window dressing to label how things are being used. Maybe if you pick good names, this could reduce the amount of future technical support for history db's. Just think about the effects of name choices for mozilla newsgroups.

David Mc
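
To keep the row, column, cell, scope, and kind vocabulary in that exchange straight, here is one way to picture the model in memory. This is only a sketch with made-up type names, not MDB or Mork source.

  // Tokens name columns, scopes, and kinds; a row is a sparse map from
  // column token to cell value; a table is an ordered collection of
  // references to rows that live in some row scope.
  #include <map>
  #include <string>
  #include <vector>

  typedef int Token;  // e.g. what something like StringToToken() hands back

  struct Row {
    Token mScope;                         // row scope: the ID name space
    int mId;                              // row ID within that scope
    std::map<Token, std::string> mCells;  // only non-empty columns appear
  };

  struct Table {
    Token mRowScope;          // default scope for member rows
    Token mKind;              // "poor man's table schema" label
    std::vector<Row*> mRows;  // ordered references, not copies: the same
                              // row can be a member of many tables
  };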


datastore
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: RDF Documentation - cleanup and plea
Date: 11 Aug 1999 00:00:00 GMT
Message-ID: <37B1C885.8B015E83@netscape.com>
Organization: Ontology Mechanics Guild

Dan Brickley wrote: [ lots of cool stuff snipped ]
> Anyway, main thing to say is: more docs please! I know for a fact
> you folks are doing more with this RDF stuff than the /rdf/doc/
> pages let on.

Yes, please, more docs. I volunteer to ask stupid questions, if that will help folks write docs. It is easier to write docs that answer questions, than it is to shout statements into the blue.

I do not profess to know much about RDF, and I'm not bothered at all by whether questions I ask imply I know less than seems studly. But if you give me permission to flood you with questions, then you'll have to tell me when to stop. :-) Because I won't stop thinking of questions to ask, and eventually I'll start hitting the edges of stuff that never occurred to you (at least in the terms I might use to ask).

Questions are very, very good things. They form the basis of much of my own reasoning, since I proceed mainly by asking questions and then answering them myself. Since answers occur to me readily, my limits in progress are caused more by lack of questions; I assume other folks have similar experience without realizing this.

I can never seem to get all the relevant questions into docs I write; expository style is limiting, and I plan to switch to dialog style soon, so I can have various fictional characters ask questions that would be confusing if a single voice asked them all. Hey, I hope the documentation folks will consider writing some dialogs.

Here's my question for Dan today, which I might ask again in a different thread: where should I look for some RDF APIs that would guide me in later designing a layer on top of public domain IronDoc specifically for the purpose of creating a clean RDF triple store? (No, I don't suppose that TripleDB conveys what I want to know.)

I don't know when I'll start working on IronDoc on my own time again, but when I do, it sounds fun to write a triple store interface. It would be about a week's worth of coding, unless arbitrary sized blobs are needed, which is not done yet in IronDoc. I consider such an API to be one answer to the question, how do you use IronDoc? Or the similar question, how do you use IronDoc to make an RDF triple store?

> For example I've nothing on the mail/news work -- is that still
> using the RDF APIs?

I don't know. I never look at the code above the MDB interface for Mork which interacts with RDF. But I'm pretty sure that Mozilla RDF is the main bottleneck for interacting with mailnews, so there is a lot of RDF traffic forming the basis of the mailnews architecture.

Does the question aim to ascertain whether other RDF implementations can be used instead? Or to ascertain the extent to which mailnews semantics can be described in terms of RDF data and operations? I want to know the answers to both of these myself, if someone here can answer them in fuzzy analytical terms (I expect no black or white answer, but only shades of gray going into specific details.)

I would be surprised if a generic RDF implementation could be plugged in, if only because I would expect some complex dependencies on things in the mozilla code base which have an emergent and unplanned effect on how things work in practice. I'd love to hear some analysis.

> Is Mork an RDF Datasource?

Nope, not specifically. Mork under the MDB interface is used as the private implementation of persistent storage for mailnews and address books, in order to supply the db engine necessary to support the role of an RDF datasource. For example, a mail message database has RDF kinds of semantics, and these get mapped onto MDB/Mork at the point where serialization to disk is required.

That brings to mind a peculiar circumstance whose implications are a bit hard to reason about. When an RDF datastore is represented as an RDF graph, this content is approximately a duplicate of content in the datastore, and a serialized XML graph of the RDF graph logically has all the same content as the original datastore. So why are the two versions of the content different from each other?

(We'll ignore the possibility that a datastore does not actually reveal every bit of content to the RDF graph, and pretend that every bit of datastore content is faithfully captured by the RDF graph.)

The original datastore and the serialized XML version of an RDF graph are essentially just two different formats, where each format is better suited than the other to satisfy some kinds of operations. The main reason to prefer a specialized format instead of a generic RDF graph, is to reap performance benefits accruing to formats that perform some kinds of operations more efficiently than another.

For example, the file system used by an operating system can be shown to RDF as an RDF datasource, and all the file system's content can be stored in some suitable RDF properties. But the way a file system stores this information is probably a more efficient way to handle the files for the operating system, than using the RDF graph. (If this was not true, then every OS might start using RDF for file systems.)

David Mc
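
The "triple store interface" mused about in that post can be pictured with something as small as the sketch below. None of this is IronDoc or Mozilla RDF code, and the names are invented; a real store would add persistence and indexing, which is exactly where a db engine underneath would come in.

  // A toy in-memory triple store: assert (subject, predicate, object)
  // statements and query them, with empty strings acting as wildcards.
  #include <cstddef>
  #include <string>
  #include <vector>

  struct Triple { std::string subject, predicate, object; };

  class TripleStore {
   public:
    void Assert(const Triple& t) { mTriples.push_back(t); }

    std::vector<Triple> Match(const std::string& s, const std::string& p,
                              const std::string& o) const {
      std::vector<Triple> out;
      for (std::size_t i = 0; i < mTriples.size(); ++i) {
        const Triple& t = mTriples[i];
        if ((s.empty() || s == t.subject) &&
            (p.empty() || p == t.predicate) &&
            (o.empty() || o == t.object))
          out.push_back(t);
      }
      return out;
    }

   private:
    std::vector<Triple> mTriples;
  };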


html
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: PC/MAC exchange of Address Book, address book export to HTML?
Date: 14 Aug 1998 00:00:00 GMT
Message-ID: <35D49861.128038D6@netscape.com>
To: strings@bellatlantic.net
Organization: Netscape
Newsgroups: netscape.public.mozilla.general

G. Gilbert (strings@bellatlantic.net) wrote:
> First, I'd like to be able to coordinate or exchange the address
> books between a MAC and PC Netscape Version 4.x. Easy to do?
> Please help.

Because usage questions don't belong in this forum, I'll consider these questions from the perspective of what Mozilla address book developers might plan to do in Mozilla coding to achieve reasonable address book features. This might answer your questions as well.

First I'd like to point out to Mozilla developers that users seem to consistently want synchronization of address books in as transparent a manner as can be achieved in all directions, between varying formats as well as multiple address books in different locations. This can be hard, since it's a "do what I mean" interface; 4.x focuses on import.

To easily use the same address book file on different platforms, the format should be cross platform, which means any difference in integer endianness (big on Mac, little on PC) should not interfere. A text format is good to avoid the problem. Binary formats, like 4.0.x books, must either use some canonical byte order, or swap bytes as necessary.

The 4.0.x binary format is supposed to be cross platform (using a canonical byte order) so I would try using the same address book file on both platforms. I have no direct evidence about whether this works, but I've never heard about problems, which I'd expect if they existed. The 4.5 binary format has the same status, including lack of evidence.

To use an address book file from another platform, the application has to be willing to notice and/or use the file on the new platform. In 4.0.x, giving a file the expected name, which is typically abook.nab, should be enough to get Messenger to use the address book. The exact name is recorded in a preference file, and might not be abook.nab.

When there is more than one address book, things are more complex. I would recommend that Mozilla developers try to discover what files are address books by inspection of metainformation when possible. The 4.5 approach currently keeps names of all address books in a pref file, so zapping the pref file loses the books, but they can still be imported.

So when a user is otherwise stuck, they can always import addresses, but what effect this has depends on whether import only takes content that is new, or is willing to update existing content to synchronize.

4.0.x will only import html and ldif address books. 4.5 will import ldif, 4.0.x files, and other 4.5 binary files; later it will do html. (The windows client will also import other formats out of my sight.)

The current import behavior (which might change) is to take address content that does not conflict with an existing entry with the same email address. This now seems to be more unexpected to users than updating existing entries, so this policy might reverse, with the addition of a new javascript pref to control how import behaves.

> Also, I've noticed that I have an HTML version of my address book in
> a folder, but it is not up-to-date. I assume that this was created
> when I updated to version 4 from 3.

The html address book must be the 3.x address book, which is imported by 4.0.x, but which is not later updated in any way by 4.0.x.

> Anyone know how I can generate the HTML addressbook file from
> my regular address book?

Neither 4.0.x nor 4.5 currently export in html; 4.5 might later.

David McCusker, 4.5 mail/news client address book backend and db
Values have meaning only against the context of a set of relationships.


ldif
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: Bug in addressbook export
Date: 06 Oct 1998 00:00:00 GMT
Message-ID: <361AB79A.323D0994@netscape.com>
To: Joseph Davidson <jdavidson@interguru.com>
Organization: Netscape
Newsgroups: netscape.public.mozilla.mail-news

Joseph Davidson wrote:
> I am trying to include postal address, phone numbers, faxes, etc
> in my e-mail addressbook conversion service. This is turning out
> to be very difficult as the information is stored with different
> formats and labels.

I expect there's no RFC dictating standards for address book formats.

> For example, Eudora has one box for "Address", whereas Communicator
> has "Address1", "Address2", "City", "State", and "Zip".

There's no reason there should be any correspondence. There is no objectively correct way to break down and organize those elements.

> To make it worst in my Commnicator (4.03), the file exported to ldif
> has incorrect labels, "Address1" is labeled as " postOfficeBox"
> and "Address2" is exported as "streetaddress".

The labels used in the ldif are correct; ldif has postOfficeBox and streetaddress attributes understood by LDAP servers, but Address1 and Address2 don't mean anything in particular. There need not be any correspondence between display names and export formats.

> Are there plans to correct this?

No, it's not a bug. Note you might experience further frustration when you find that postOfficeBox and streetaddress get merged together into streetaddress by 4.5, because the frontends only display a single address field for both.

David McCusker, 4.5 mail/news client address book backend and db
Values have meaning only against the context of a set of relationships.
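
For readers who have never looked at ldif, the fragment below is a rough illustration of the kind of entry being discussed. The postOfficeBox and streetaddress attribute names come from the post; the rest of the entry is invented and is not an exact copy of what Communicator exports.

  dn: cn=Jane Doe,mail=jdoe@example.com
  objectclass: top
  objectclass: person
  cn: Jane Doe
  mail: jdoe@example.com
  postOfficeBox: Apartment 4B
  streetaddress: 123 Main Street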


backup
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: Address Book Backup in 4.5
Date: 13 Oct 1998 00:00:00 GMT
Message-ID: <3623F483.F360E7B7@netscape.com>
To: Gregory Hamra <Greg@hamra.net>
Organization: Netscape
Newsgroups: netscape.public.mozilla.mail-news

While this is not the right group for tech support questions, I can still make the answer to this question relevant to mozilla by discussing design issues affecting future mozilla features.

Gregory Hamra wrote:
> I'm using Netscape® Communicator 4.5b2 and would love
> to know WHICH FILES represent the address book. I would
> like to back them up.
>
> In Messenger 4.0, it used to be simply ONE file, right?

(Note that if you export your address books to ldif from a UI menu, then you get to choose the file names that will be used. Later you can restore from ldif by importing in the event of loss.)

There is one file per separate address book appearing in the UI. In 4.5 final, at most 15 of them will be used for name completion, even though you can have more than this for data organization purposes. (In 4.5b2, you don't want to have more than 10 address books.)

The definition of which files must be backed up is given by a certain set of prefs in your prefs file (which can differ between platforms, and is called prefs.js on Windows). Inspecting your prefs for address book related items should make clear which file names are involved.

If you clobber your prefs file, then all your address books will be lost to the ken of Messenger, though you could edit them back in again or simply import the binary address book files from where they sit. There are complex issues involved in trying to automatically discover address books, especially with multiple formats that must be imported.

Unless you also export to ldif for backup, there is only one copy of your address book information that is kept in an address book db, and this information cannot be recovered if you somehow lose, clobber, or otherwise corrupt an ab file. (This differs from the mail .snm files which can be reconstructed from the text Berkeley mail "folders".)

I think future address book designs should use more redundancy and logging of metainfo to keep track of address book status. For example, it would be useful to keep an analog to mail folders for capturing address book cards in text, as events describing address changes.

It would also be a good idea to mark all address books with metainfo annotations that show whether they are currently displayed in the UI, when and whether they were ever imported into other address books, and which filenames were ever used to designate an address book. Such distributed metainfo can be used to infer important relationships.

David McCusker, 4.5 mail/news client address book backend and db
Values have meaning only against the context of a set of relationships.


identity
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: 4.5 address book doesn't like multiple entries with same addresses, last names?
Date: 28 Oct 1998 00:00:00 GMT
Message-ID: <363793FD.528F0988@netscape.com>
Organization: Netscape
Newsgroups: netscape.communicator,netscape.devs-client-technical

Bob Fleischer wrote:
> I just upgraded from Navigator 4.06 to 4.6 and I lost some entries
> in my address book.

We had trouble settling on a definition of identity for person card objects that would satisfy all situations. In 4.5, the definition of identity changed so that email addresses must be unique, so that more than one name for the same email is not possible. It's possible that we should have emphasized more compatibility with 4.0 address books.

> It appears that the 4.5 address book doesn't like multiple entries
> with same addresses, last names. Actually, it tolerates a two or
> three, but decides that additional ones are duplicates and deletes
> some.

The 4.5 code is trying to make all cards in one address book have unique email addresses. You could work around this using the perhaps awkward expedient of making four address books and using different names for each email in different books.

Duplicates are supposed to be possible only for emails more than 31 chars, equal up to 31 chars. If one imports more than one card with the same email in the same AB, then the last should win. Each time the same email is encountered, the old existing card is updated to match the input fields, since it is considered the user intention to refresh the address book content with the imported content, so that import can be used for syncing.

At one point, we always created duplicates of everything, under the theory there should be no limits at all. But then in practice, folks found it very annoying that new content was not considered updates to existing content whenever appropriate. So we had to pick a rule for when to update cards due to matching identity, and we picked email.

> I have four members of my family that share one email address,
> and we all have the same last name. We do, however, have different
> first names and nicknames! I can't define all four at once -- the
> attempt to store the fourth is detected as a replacement for one of
> the existing three.

I find it surprising you are able to get more than one with the same email, unless the email address happens to be longer than 31 chars, at which point duplicates might or might not occur.

> Known problem?

Yes, this is the current intentional design, however flawed; so it can only be changed by a format change to the address book format. There are small semantic differences between 4.0 and 4.5, but this particular one was not really anticipated. Fortunately it should only affect the users who upgrade from 4.0 to 4.5 who had duplicate email addresses in one address book; note I don't mean to trivialize your problem.

David McCusker, 4.5 mail/news client address book backend and db
Values have meaning only against the context of a set of relationships.


dups
 

From: David McCusker <davidmc@netscape.com> Subject: Re: Several names, one address - no longer works in Communicator 4.5 address book Date: 11 Dec 1998 00:00:00 GMT Message-ID: <36718034.CB077834@netscape.com> Content-Transfer-Encoding: 7bit References: <3671696c.7263984@news.virgin.net> To: Patrick Fox <patrick.fox@virgin.net> Content-Type: text/plain; charset=us-ascii Organization: Netscape Mime-Version: 1.0 Reply-To: davidmc@netscape.com Patrick Fox wrote: > Several people I write to share one e-mail account. Using > Communicator 4.04 they all had separate entries in my address > book, hence could be addressed separately as, eg. > > Person One <person@mail.com> > Person Two <person@mail.com> > Person Three <person@mail.com> If you make them differ by changing some letter(s) to uppercase, then you will probably be able to add duplicates. Try it and see. > This worked very well and provided an easy way of distinguishing who > the mail was to via the "To" field. However, now I've upgraded to 4.5, > I can't seem to store more than one name per e-mail address in the > address book. Having entered Person One, when I make an entry for > Person Two it says "Entry already exists, do you want to replace it?" > and refuses to add it. This behavior is intentional, though it might seem undesirable to some. Early betas of 4.5 accepted all duplicates of any kind, and this drove folks mad since they expected some kind of winnowing aggregation. For example, folks expected something reasonable to happen when importing the same address book over and over again for poor man's ab synching. When I started enforcing the constraint of only one copy of a card in the same address book, I asked other folks to define what constituted identity of a person in an address book. The answer was email address alone, and that's why 4.5 now acts this way. In retrospect, we might have picked something closer to the prior 4.0x behavior. > Any way round it? Yes, there is, although it is unintentional. Emails were supposed to be case insensitive, but it appears they are not with respect to checks for duplicates. So you can vary the case to add dup email addresses. I don't know whether this works cross platform. It was recently shown to me, and I only know it works on my Mac. I don't know about Windows. David McCusker, speaking only for myself, former 4.5 addressing db guy Values have meaning only against the context of a set of relationships.


text
 

From: David McCusker <davidmc@netscape.com> Subject: Re: Address book Date: 16 Feb 1999 00:00:00 GMT Message-ID: <36CA585F.CBDE754D@netscape.com> Content-Transfer-Encoding: 7bit References: <36CA41BF.B0687384@nac.net> To: lab@bounceit.net Content-Type: text/plain; charset=us-ascii Organization: Another Netscape Collabra Server User Mime-Version: 1.0 Reply-To: davidmc@netscape.com Newsgroups: netscape.public.mozilla.mail-news lab@bounceit.net wrote: > Sorry if this has been covered. Repetition's okay, and more or less unavoidable. Plus each time is a bit different, so the context increases each time. > Is the 5.0 address book going to be plain text > (which I think it was in 3.x)? Yes to both. 3.x was html in a particular style and tag schema. 5.0 will at first by the Mork plain text format describe in older postings in this newsgroup (I haven't made a web page yet). But in both theory and practice, the 5.0 address book will at times not be plain text, since the db in use can be and will be replaced with alternatives that also conform to the abstract MDB interface. I plan to do that myself on my own time, and other folks can do so as well. Note that although the Mork format is plain text, it is not very user friendly (human readable) when it is highly compressed by sharing common strings that get assigned hex IDs before used as card attributes. David McCusker, speaking only for myself, mozilla mail/news client eng Values have meaning only against the context of a set of relationships.


integration
 

From: David McCusker <davidmc@netscape.com> Subject: Re: AB integration (was Re: PGP in Mozilla) Date: 19 Apr 1999 00:00:00 GMT Message-ID: <371B90CF.52328AE5@netscape.com> Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=us-ascii Organization: Ontology Mechanics Guild Mime-Version: 1.0 Reply-To: davidmc@netscape.com Newsgroups: netscape.public.mozilla.mail-news Eric Hildum wrote: > [ snip David McCusker wrote: [AB integration for keys & addresses]] > > I kind of like the way Outlook seems to handle this (note this is a > guess, I have not actually been able to use Outlook encryption), where > it picks up from the key the default information contained in the key, > but allows an arbitrary key to be associated with the address > regardless of whether or not the contained email address matches. (I get cranky when you mention Outlook when describing a feature that doesn't need such mention to help clarify the description.) We can put anything in AB's based on MDB, because it encodes anything, including one-to-many or many-to-one mappings if you need. Under MDB, a "thing" composed of any number of arbitrary cell attributes is called a "row". A row can represent an address entry, or a pgp key. Also under MDB, a collection of such rows is called a "table", and one can build complex graphs since cells in a row can reference another row or a table by ID. Neither rows nor tables have any schema, which means you can add free form columns to rows, and add any row to any table. (Note the terms row and table do not imply anything else appearing in literature for databases, despite the fact that MDB rows and tables do in fact roughly correspond to traditional db notions with those names. The MDB notion of table also corresponds to a sparse matrix, and the term "matrix" is what I happen to use in public domain IronDoc.) If you think about it, this organization is topologically isomorphic to building in-memory data structures using structs and pointers, so there's an easy proof that any nice and desirable data structure you can build in memory can have an isomorphic counterpart in a MDB file. And although Mork is linear text, the MDB API does not imply linearity. The main idea is that reference by ID permits indirection semantics that makes it possible to share an object among many others that point to it. So we don't need to tightly couple address entry rows and key rows, when we can instead use indirection to build any 1-to-N or N-to-1 maps needed. So the mechanism for encoding is easy. The hard part is specifying a standard policy used by plugins for using the storage mechanism. Do we expose a limited and specific interface, or give a plugin broader use of the DB? And if we allow broader use, then who specifies standards used in that context so the AB content is open and not proprietary? > I think this is a nice workaround the PK12 (?) limit of one fixed > address per key. Right, there's no need to assume any fixed 1-to-1 mapping with keys. > Most of this should probably be handled by the plugin, of course, Sounds like you're saying a plugin should keep it's own database of associations between addresses and keys, instead of putting any open representation in the address book. An open technique seems better if it encodes user information that should be accessible to other tools. > but this does mean that the design should allow multiple keys to be > stored, with one marked as primary, but all potentially useable > for decryption/signature verification. 
If folks describe all the varieties of data they want to associate with keys and addresses, along with the desired mapping relationships, I can sketch out plausible ways to encode those in example Mork form. (Since I've never followed the security computing genre in depth, you need to be terribly literal in describing the data you want, since I likely won't infer enough from hints or standard practice references.) David McCusker, making a little difference (iota inside(tm)) Values have meaning only against the context of a set of relationships.


radical
 

From: David McCusker <davidmc@netscape.com> Subject: Mail team comm (was Re: Open Source AOL...Moz IM) Date: 01 Dec 1998 00:00:00 GMT Message-ID: <3664A1D5.F4A8C56@netscape.com> Content-Transfer-Encoding: 7bit To: Mike Shaver <shaver@netscape.com> Content-Type: text/plain; charset=us-ascii Organization: Netscape Mime-Version: 1.0 Reply-To: davidmc@netscape.com Newsgroups: netscape.public.mozilla.general You are right -- so far on the mail team we have only been writing docs for internal consumption, but I'd like to change that some soon. For example, I understand that I am supposed to post a first draft of an API for db abstraction in a few days on mozilla.org, and after that point the whole thing is open to discussion and change requests, etc. I am extremely receptive to the idea of discussing things in the open, so if I exercise any sense of humor below, it's only because I find my limits amusing and not because I mean any harm or disrespect to anyone. Mike Shaver wrote: > Netscape User wrote: [ big snip ] > > Maybe the people at Netscape have good telepathy, or else they're > > coordinating these projects somehow, and some of the `team' is not > > invited to these discussions. > > See, the problem here is a catch-22. Who on the mail team isn't being > invited to these discussions? Nobody, because there aren't any outside > developers on the mail team. Why aren't there any outside developers on > the mail team? Because (say you and and I, and we're _obviously_ right, > why can't they see?) the discussions aren't out in the open. Okay, two hours ago I was in a scheduling meeting, where we were trying to determine what basis we might have for estimating how long it might take to do some things under some radical assumptions. At the end of the meeting I suggested that we report the following to management: "We'll rewrite everything and have no idea how long it will take." See, isn't that funny? ;-) Yes I was half joking, but only half joking. And this brings up a basic conflict that can be expressed as follows. If we commit to time estimates, and then open everything to discussion, what kind of effect is this going to have on our schedules? If we talk this over with lots of outside developers, and then miss our personal schedule commitments, who is going to get burned more? It might seem a lot easier to discuss designs in an open-ended fashion if we were given get-out-of-jail-free cards from management on the topic of meeting schedules. I don't have a good idea how to deal with that. However, lately I've been going around to talk to Guha, and Waterson, and Hickman and other folks (though I've only found Waterson lately), and I expect to spend a lot of time talking to these folks. So maybe we can get them to summarize the issues, or to at least participate a little if we start talking about mail here and elsewhere. In my case at least, just tell me what you want to discuss and I'll tell you what I know that I think I'm allowed to discuss, and if there's any- thing I expect I can't discuss, I'll say so explicitly and give reasons. There are definitely things I cannot say without getting shot since I've already managed to get sucked into problems in these newsgroups before. > Mail folk[*] don't buy that, or seem to think it doesn't matter. It > bugs me a lot, and it apparently bugs you a lot, but what can be done? 
I think it matters, and suppose some others here do as well, but perhaps we are being paralyzed by the fact that we are rationally contemplating large scale rewrites of a variety of things. Which things? That's what we are talking about, and it involves time estimates. It's hard to talk about design when we don't know how much we are designing yet, exactly. Note that a lot of our discussion is aimed at getting on exactly the right train for 5.0 mozilla. This piece of work is going to be large. Is that the part you folks wanted to discuss? Or did you want to talk about what happens afterward, once we are in mozilla? We are not ready to talk about what happens afterward, since it's too far out. > [*] I should get out a smaller brush here. Not all mail folk believe > this, I don't think. But some do, and they seem to be in > management/leadership positions. I'm really just using mail as an > example here, partly because they're one of the last big components > to not come out into the open. [ big snip ] We started out really, really conservative. But just now we are feeling quite a bit more radical. Not as radical as retiring to go raise cats in Montana, but radical compared to before. What part would you like to discuss? I'll take your requests and ask where my boundaries lie to make sure I don't get my head handed to me for gushing with too much energy. David McCusker, speaking for myself only, secretly designing mail stuff Values have meaning only against the context of a set of relationships.


dates
 

From: David McCusker <davidmc@netscape.com> Subject: No dates, no promised features Date: 02 Dec 1998 00:00:00 GMT Message-ID: <3665D9A8.49C2D791@netscape.com> Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=us-ascii Organization: Netscape Mime-Version: 1.0 Reply-To: davidmc@netscape.com Newsgroups: netscape.public.mozilla.mail-news I just had a short conversation with Phil Peterson and Scott Putterman to discuss what we can talk about in the newsgroups. The tone seemed to lean towards "talk about everything", but we tried to isolate the exceptions about which we should not have discussion. No dates - we are not going to describe any schedules, though we will blithely admit that schedules exist. Please don't ask when something might be done, even in approximate terms since that's just hedging. There must never be any basis for publicly judging whether we are late. No promised features - we are not going to promise any particular kind of feature will actually appear when we ship. But we can talk about any feature, and discuss features we intend to ship, but this does not mean we promise to ship features. Please do not ask for promises. So far those are the only apparent proscriptions. So the next thing I'm going to do is start another message to post the ascii sketch of an architecture layer cake for db abstraction, since doing so was specifically cited as something okay. (Not that it's exciting.) David McCusker, speaking for myself, mozilla mail/news client eng Values have meaning only against the context of a set of relationships.


mh
 

From: David McCusker <davidmc@netscape.com> Subject: MH is out (was Re: mail/news db 5.0 offline design) Date: 22 Dec 1998 00:00:00 GMT Message-ID: <367FFC0A.3BF2EB11@netscape.com> Content-Transfer-Encoding: 7bit To: waste@greendragon.com Content-Type: text/plain; charset=us-ascii Organization: Netscape Mime-Version: 1.0 Reply-To: davidmc@netscape.com Newsgroups: netscape.public.mozilla.mail-news (I had wondered about the origin of the odd name for that text edit library on the Mac, and your email address seems very suggestive.) Most of my comments are about MH to assure folks the issue is dead, since I never seriously considered the notion, because the overhead is totally absurd and out of the question from a db perspective. I develop most on a Mac, and MH would kill my system, so end of story. William Allen Simpson (waste@greendragon.com) wrote: [ big snip ] > MH-style message per file: > > If it is a big win for IMAP and news, we could consider MH files. > But it would have to be a very big win, because: I don't even see a small win from using MH style dbs for IMAP and news. However, most folks need to organize their data in some fashion, and dbs are generally formalized approaches, and many folks use file systems as the formalization of choice as a poor man's db. But this usually does not have very good performance for a variety of reasons, when compared to another choice fitting the application somewhat better. > 1) message per file is horribly space inefficient. Even with larger > drives these days, this is going to hurt those of us handling > tens and hundreds of thousands of messages. I doubt that I could > carry a single year's worth of messages on a laptop. Yes, this objection kills MH out of hand, and I'm sorry I didn't point this out to folks instantly because I was trying to accomodate views from other folks. I didn't want to be rude by posturing as db expert and laying down the law to folks who haven't worried about performance. > 2) message per file scales poorly. File open/close is slow on > every system that I'm familiar with. There is the well-known > inode problem on Unix, the extent problem in DOS/Windows, and > any more than 100 files in a Mac directory kills performance > (although I've heard that problem has been alleviated in 8.5). So now you've revealed several of the motivations driving my vaporware public domain IronDoc structured storage system. Since it's on hiatus, I won't bother posting a link to my web site. Consider it an abstract file system in a file, supporting arbitrary collections of schema-less btree dictionaries and schema-less blob trees, with type annotations. David McCusker, speaking only for myself, mozilla mail/news client eng Values have meaning only against the context of a set of relationships.


compatible
 

From: David McCusker <davidmc@netscape.com> Subject: Re: [MDB] forward and backward compatibility Date: 20 Jan 1999 00:00:00 GMT Message-ID: <36A68C20.C120FBD2@netscape.com> Content-Transfer-Encoding: 7bit References: <36A5069C.DB0B9E51@netscape.com> <36A668C9.F6D77DB@netscape.com> To: Phil Peterson <phil@netscape.com> Content-Type: text/plain; charset=us-ascii Organization: Netscape Mime-Version: 1.0 Reply-To: davidmc@netscape.com Newsgroups: netscape.public.mozilla.mail-news Phil Peterson wrote: [ snip ] > Oh, dear. I'm really sorry about summary.dat. I meant no criticism, no apology necessary. (It's not too bad in my offhand personal opinion -- simple is good.) > It's a text file which is stored in the root of your profile directory, > with sprintf'd values from the metadata in each summary file. This > includes things like the count of unread and total messages, the > character set, the offline flags, the name (for IMAP), etc. On my Mac, the file's called "Folder Cache 2". (I see a path to the db file followed by tab-delimited numbers for those attributes on a line.) In Mork, we could make a table containing a row for every db file, and have every row hold attributes describing that db, and it would look a lot like summary.dat except with explicit column annotations. > The purpose of summary.dat is to allow Messenger to draw the folder pane > without opening the summary file for every folder which is shown in the > folder pane. In the 4.x world, opening all of those databases was > enormously expensive. That makes sense, and it's likely to be too expensive for many db's (even ones I like) when they have much content and have a lot of random access support with tables of contents that involve several to many disk seeks in order to be minimally operational. DB's with internal directories using btrees tend to seek around a bit to access the first real content. (Disk seek time is typically on the rough order of 60 seeks per second, so as soon as one does a dozen or two random access reads, one has hit significant time overhead when this is then repeated very many times.) > Summary.dat is read at boot-time in order to draw the folder pane, and > then deleted. It's then written out again on a successful exit. So the > idea is to avoid trusting a summary.dat which contains stale data > because we crashed. This is why it takes so long for Messenger to boot > after you've crashed. For DavidB I added MDB API support for quick extraction of db metainfo. This could be rather fast for Mork in good cases, but is likely to be slower for "better" random access db's with less internal contiguity. This usage was expected to reconstruct the summary after a crash. Would it be possible to base initial display on a summary without really trusting it for sensitive numbers affecting stability? If so, then it would not be a big crisis for the summary file to start stale, if it tends to become corrected during the course of more runtime usage. Using Mork and appending incremental updates to the summary, it would be very near correct even after a crash. So maybe we should not delete the summary after reading it. Then we would never have a slow startup after a crash, at the risk of small variances in starting display. How will that sound to most users? (I know my own opinion can be atypical.) > The versioning scheme is that each subclass of folderInfo can write a > version number into its field, along with the data it uses. New stuff is > supposed to go onto the end, so it can be ignored, like with a vtable. 
> It's a hacky attempt at schema flexibility. Did I mention I was really > sorry about it? Actually that sounds decent, given a real effort to support versioning. Mork style versioning might only be better from less tendency to delete content not understood from later versions, but perhaps that would be stale anyway if other attributes are updated by themselves. And explicit column annotations allow versions to branch in a tree and not just a straight line, though that's overkill for this usage. > So in 5.0, the first question is whether it's still really expensive to > open all the DBs. Assuming that it is, we could use MDB/MORK to build a > new summary.dat with better schema flexibility and no silly printf/scanf > stuff. Presumably, MDB/MORK would be more forgiving of schema upgrades > than summary.dat was. If and when MorkDb does the smartest thing with the limited usage hints, it might be fast enough to open all the dbs, since it would not do many reads to pull the summary metainfo. And if we show the UI early and progressively update it with ongoing info found in the db's, it might seem fast enough for many users. But summaries still seem good there. And better db's with good random access reading behavior will tend to do more reads to access a specific set of data than a db like Mork that can more easily store special data continguously near a known place (like the beginning of the file). A random access db has good performance to read small content compared to reading all the db content, but can have poor performance compared to simpler db formats tuned for specific use. (For example, I'd expect public domain IronDoc to need summary files.) It would still be feasible to use printf/scanf to write a Mork file entirely from beginning to end, but including the row format columns per each attribute instead of just separating tabs, plus a little bit of wrapping table metainfo at the head and tail of the file. David McCusker, speaking only for myself, mozilla mail/news client eng Values have meaning only against the context of a set of relationships.


sources
 

From: davidmc@netscape.com (David McCusker) Subject: Re: Data source questions Date: 26 Jan 1999 00:00:00 GMT Message-ID: <36AD0ED4.1D3993C8@netscape.com> Content-Transfer-Encoding: 7bit To: Chris Waterson <waterson@netscape.com> Content-Type: text/plain; charset=us-ascii Organization: Netscape Mime-Version: 1.0 Reply-To: davidmc@netscape.com Newsgroups: netscape.public.mozilla.rdf Chris Waterson wrote: [ snip ] > Scott Putterman wrote: [ snip ] > > This leads me to another question. I was noticing while debugging > > a datasource that my datasource's GetTarget was being called for > > every GetTarget request including requests on URI's that my > > datasource doesn't handle. Is there some way to make it so that > > datasources can register for the URI's they are interested in so > > they don't get called for every request? > > Not currently. To give you an off-the-cuff answer, it's not clear to > me that doing said registration will be that much more efficient in > time or space as just having each individual data source ignore the > request. I could be wrong... Note I'm not familiar with the specific interface being discussed, so I'll address this as if the general discussion is about notifications, under the assumption the performance effects are somewhat similar. The question is whether one should or shouldn't call a method on some object that might be able to respond, where one might pessimistically call all such objects just to make sure no potential response is missed. In this case we're talking only about data sources? So we only worry about performance if the number of data sources gets very large? How many are we talking about? What's the upper bound? What's a typical worst case assuming someone takes the liberty of architecting with more data sources to achieve an effect than one might first suppose? I would like to see the interface arrange for some early pruning if it can be done easily, and if there might be many data sources. That's why folks want registration -- to prune the candidates for a method call to those pre-declaring a predisposition to have a suitable response. I'll compare this situation to broadcasting notifications to dependents a Smalltalk style computing context, to illustrate how representation for such a thing might affect space and time costs. Under Smalltalk, any object might have a list of dependents, where each gets a notification when the base object does a change notification. The way one wants to represent this list of dependents will depend on whether one expects objects to only sparsely require such a list. If every object tended to have dependents, one would want an attribute on every object to list them so access is most efficient. But if the vast majority of objects have no dependents, one would prefer to hold them stored someplace else, in a sparse side table. Using either of these representations involves an explicit step to register or remove a given dependent, to affect whether it's a candidate for notifications. Alternatively, one might send the change notifications for every object to every other object in the system, just in case they might be a dependent, which they can determine for themselves. But that would tend to have really terrible performance, and that's why one of the first two registration representations described makes more sense. Is this analysis applicable to GetTarget requests for data sources? 
David McCusker, speaking only for myself, mozilla mail/news client eng Values have meaning only against the context of a set of relationships.


undocumented
 

From: David McCusker <davidmc@netscape.com> Subject: Re: netscape mail Date: 08 Sep 1998 00:00:00 GMT Message-ID: <35F59574.C1E3B472@netscape.com> Content-Transfer-Encoding: 7bit To: "James M. Cape" <jcape@jcinteractive.com> Content-Type: text/plain; charset=us-ascii Organization: Netscape Mime-Version: 1.0 Reply-To: davidmc@netscape.com Newsgroups: netscape.public.mozilla.mail-news James M. Cape <jcape@jcinteractive.com> wrote: > Indeera Munasinghe wrote: > > Does anyone know the internal file formats of netscape mail > > folders and addressbook > > The Address Book in 4.0 and the .snm files are in Berkley DB > format, and the mail folders are in Pine format. Berkeley DB is used nowhere in either 4.0 or 4.5 mail/news files. But the mail folders are Berkeley mail folders, which is a text format with no relationship to Berkeley DB binary btree formats. The address books and .snm files in 4.0 and 4.5 are both undocumented binary database formats written by third party software that has no relationship at all to Berkeley DB. The only software that can read and write these files is the code in Messenger, because the sources have been significantly changed from their original distribution form. David McCusker, 4.5 mail/news client address book backend and db Values have meaning only against the context of a set of relationships.


snm
 

From: David McCusker <davidmc@netscape.com> Subject: Re: Message format for .SNM files Date: 28 Jan 1999 00:00:00 GMT Message-ID: <36B0C510.32628FB2@netscape.com> Content-Transfer-Encoding: 7bit References: <78ovt0$9lh2@secnews.netscape.com> To: Jonathan Zufi <zuf@planet.net.au> Content-Type: text/plain; charset=us-ascii Organization: Netscape Mime-Version: 1.0 Reply-To: davidmc@netscape.com Newsgroups: netscape.public.mozilla.mail-news Jonathan Zufi wrote: > Would anyone know if the file format for the .SNM > index files is available? For 4.x and 4.5, neither .snm format specification nor source code will ever be made available. The source code is a one-of-a-kind hybrid that cannot be released. The format is also effectively a hybrid, and is not actually documented anywhere, other than implicitly by the source code. (This is the most I care to say about that particular .snm format.) David McCusker, speaking only for myself, mozilla mail/news client eng Values have meaning only against the context of a set of relationships.


format
 

From: David McCusker <davidmc@netscape.com> Subject: Re: snm files format Date: 19 Oct 1998 00:00:00 GMT Message-ID: <362BB58F.34196051@netscape.com> Content-Transfer-Encoding: 7bit References: <3618EB86.29025165@anmv.cneva.fr> <36286722.DC5E350F@cwia.com> To: Jim Davis <jimdavis@cwia.com> Content-Type: text/plain; charset=us-ascii Organization: Netscape Mime-Version: 1.0 Reply-To: davidmc@netscape.com Newsgroups: netscape.public.mozilla.mail-news Jim Davis wrote: > > Henri Heuze' wrote: > > I'm looking for the format of .snm files > > My understanding is that Netscape used a proprietary > database engine/API to read/write .snm files . Yes, an old version of a third party database engine that has been tweaked and subclassed enough so that the original engine alone could not read the .snm files, even if a developer had the original engine. Only this hybrid code system can read and write the .snm files, and the hybrid cannot be distributed. (I won't attempt any explanation.) > And that .snm files are thus undocumented and obsolete . They are undocumented, but not exactly obsolete since 4.5 uses them. But from the perspective of mozilla, that might not mean much. > Perhaps someone could post a description of .snm fields > and the nature of the message linking ? While that is technically possible, it would be very hard. It's about par for the course for databases to have very involved formats. And not only would it be hard, it might also require describing some third party structures in such minute detail that this might seem to reveal confidential information, so that seems out. (Note that while I understand these data structures exist, I am not familiar with them beyond mere exposure to their names. And I don't want to know anything about them since I don't need any more exposure to confidential information than that, since it can be a burden. This is not humor; I'm not even smiling a little as I write this.) > Or describe the snm API ? It would be feasible to describe an abstract snm API after developing and polishing one, that would not be the actual snm API in use now. Any interfaces that derive closely from third party engine interfaces might contain proprietary information. But the existing APIs could be described solely in terms of the mail/news semantics without any reference to engine specific semantics. But if we had such an abstract interface in hand that we were ready to publish, one would still be unable to access the snm files with them since there would be no connection between interface and code. And the interfaces would be of no help in deciphering snm files. Is the API desired for 5.0 design purposes, or to access snm files? David McCusker, 4.5 mail/news client address book backend and db Values have meaning only against the context of a set of relationships.


archiving
 

From: David McCusker <davidmc@netscape.com> Subject: Re: Archiving Old Emails? Date: 31 Jan 1999 00:00:00 GMT Message-ID: <36B527E5.1FA33BB5@netscape.com> Content-Transfer-Encoding: 7bit To: Greg Anderson <greg.anderson@hurricane-hhl.com> Content-Type: text/plain; charset=us-ascii Organization: Netscape Mime-Version: 1.0 Reply-To: davidmc@netscape.com Newsgroups: netscape.public.mozilla.mail-news Greg Anderson wrote: > Is there an way within Messenger to archive email messages and > folders (I've checked the FAQ and UFAQ and am thoroughly frustrated)? > As well, with any uitility that does archiving, how easy is it to > retrieve messages that have been archived? Please post no usage questions to netscape.public.mozilla.mail-news; It's a developer group for Mozilla 5.0, so you now owe us constructive input on designs for 5.0. :-) I can see how you might refine your request as a proposed feature, so that's what I'll ask about here. Please clarify what you want to happen. As near as I can tell, you wish mail would not occupy so much space, because you cited mail using space around 100MB for some set of users. Do you mean archived on the client, or on some server, or some of both depending on whatever works? I guess you want mail out of inboxes, but still reachable somehow under some search criteria not described. Do you want indexing by content? And are you willing to consume more space for some ambiguous indexing? Is the goal of archiving to save space? Or to get mail out of sight? I can write some archiving tools later on my own time, since this is something suited to the kinds of projects that often interest me. But I don't have a good idea what kind of utility you want right now. David McCusker, speaking only for myself, mozilla mail/news client eng Values have meaning only against the context of a set of relationships.


import
 

From: David McCusker <davidmc@netscape.com> Subject: Re: Import Address Book; Expand Message Threads Date: 16 Feb 1999 00:00:00 GMT Message-ID: <36C9F63C.DE89820E@netscape.com> Content-Transfer-Encoding: 7bit To: Martin Cleaver <mcleaver@altavista.net> Content-Type: text/plain; charset=us-ascii Organization: Another Netscape Collabra Server User Mime-Version: 1.0 Reply-To: davidmc@netscape.com Martin Cleaver wrote: > > I'm slightly cranky today. [ snipped explanation that usage Q's > > are off-topic in netscape.public.mozilla.mail-news, then summarily > > ignored by Mike Cleaver for apparently personal reasons. ] > > That's pretty weird. Not as weird as I hope to be in this posting. I have a wry expression on my face, and you can play twenty questions to guess what it means. Is my smile bigger than a bread box? Animal, vegetable, or amphigory? > I am writing this having eventually managed to get into the Netscape > newsgroup hierarchy (not available through my local ISP). I was pretty > confused to see 35000 headers downloaded before [ snip, snip, snip ] > Doesn't sound like the choice of newsgroups does much > to encourage people to seek out a suitable area :-( I understand this stance very clearly. The door wasn't locked, and no sign said to keep out, so you can walk in any office you please and shout for directions in complete innocence. It's such a bother to note a group's existing tone and direction, and so much easier to take group names as signposts on the interstate highway (food and gas ahead! :-). > In general I am finding it pretty hard to get any help trying to find > out how to convert from MS IE/OE to Netscape (which I would like to do, > as part of a general move to Linux). Are you persuading others not to try? Or is it merely the case that Netscape is obligated to train a cadre of catchers in the rye to keep poor souls from falling over the Microsoft cliff? (I am finding this newsgroup-as-encounter-session approach so very refreshing. :-) > For instance, I'd love to import my address book from Outlook 98, but > Netscape can't find it and when I skip, it just imports my Outlook > Express address book (for the umpteenth time with only e-mail addresses). Technically, Netscape doesn't look for your address book. Either I look for it (or the code I wrote) or Tony Robinson looks for it (or the code he wrote). I didn't write any code to seek Outlook 98. I don't know whether Tony did. I find your idea charming that interoperability with Outlook is a major feature list bullet item. > I'd also like to convert my Outlook Calendar and Tasks, but see no > way of doing that. BTW: are the Netscape online Calendar and off-line > Calendar the same thing? I dread to think what will happen when I want > to import thousands of mail messages... Okay, this seems like the usual program. Nice demand for MS features for calendars. Nice FUD on mail import. Good job. If your OE mail is in Berkeley mbox format, just put them in directories where Netscape looks for your other mail. They don't need to be imported. Naturally nothing whatsoever will happen to damage your existing mail. > I'd just love to order the messages in this newsgroup (with its 8500 > messages) in chronological order, but the last 500 seem to be without > a date and only with a time (or a day of the week)... That's exactly as intended: minimal date format to disinguish now from the date on a message. The closer in time they are, the less detail is shown in the difference. Messages sent today only differ in time. 
Did you notice that part? Or are are you trying to spook the newbies? > this is quite disconcerting to a newbie, who begins to wonder just how > buggy this product You are being very naughty to impute bugginess without citing a bug. > must be to provide such addresses... Can you offer some reassurance to > someone who thought that Netscape might just be able to consiberably > outperform Microsoft, but is now beginning to worry.... :-( What needs reassurance? That your message has time "1:07 PM" in my reader, which lets me assume the date is today (16Feb99)? I don't get how that is frightening. When you use the word "addresses" above, what do you mean by that since your sort-by-date involves no addresses? (If you get hysterical, I can't slap you when you're so far out of reach.) > To recapitulate: Where can I find out how to import addresses and > calendar info from Outlook and messages from OE? To recapitulate: netscape.public.mozilla.mail-news is the wrong place for such questions, unless you like sharp-tongued engineers to write responses with that edgy dramatic quality we find so amusing. However, I will briefly try to be constructive again. Try to follow my reasoning here. I wrote 4.5 address import code at Netscape, but I didn't import OE addresses. So there is no engineer at Netscape that can answer that question directly, unless they know about import code written by someone outside Netscape (like Tony Robinson). The workings of Tony Robinson's code, and the specs for how and why it does anything, are a complete mystery to me. I doubt anyone else has a much better grasp, unless they've found some docs somewhere. It's possible Tony's code imports OE addresses, but I don't care. (I know you think I should care, but that's a weak propaganda straw.) You seem to think that Netscape must have set up a task force to make sure users can push a button to leave Microsoft products and convert to Netscape products with a magical flourish of lights and trumpets. Why do you think that? Did you see that in specs someplace? Or is lack of conversion from MS to 4.5 the best criticism you have for 4.5? Now, I know you can complain that I haven't answered your questions about importing calendar and messages directly, but I have in fact said something useful indirectly, by explaining we did not assemble special engineering resources specifically for snatching Microsoft users. If you want that level of paternalism from a software company, I grasp why that appeals to you emotionally. But it's a misplaced expectation. > And why are there 8500 messages in this area? That would seem consistent with different folks having posted as many as 8500 messages in a group are reading, right? David McCusker, speaking only for myself, mozilla mail/news client eng Values have meaning only against the context of a set of relationships.


annotate
 

From: David McCusker <davidmc@netscape.com> Subject: Re: Messaging Annotation Date: 08 Sep 1999 00:00:00 GMT Message-ID: <37D6E148.FEFC3792@netscape.com> Content-Transfer-Encoding: 7bit References: <37D68914.C587E09C@uumail.de> <37D6D2DB.693CC260@uumail.de> Content-Type: text/plain; charset=us-ascii Organization: Ontology Mechanics Guild Mime-Version: 1.0 Reply-To: davidmc@netscape.com Newsgroups: netscape.public.mozilla.mail-news Ben Bucksch wrote: > + ------- Additional Comments From davidmc@netscape.com 09/07/99 14:08 > +>What do you mean by "schema-less database" any how would it looklike? > +That means the number of attributes for any object is unbounded, so you > +can always add more. > > This is, what we have with folders of RFC822-messages (or annotations), > right? You are saying that headers in RFC822-messages show essentially the kind of attribute value pairs one wants for schema-less annotation. Yes, that's right. If you used headers of similar style for your annotation format, then you'd have an acceptably schema-less system. However, if you wanted to assert that the headers you use in your annotation system have some well-defined relation to RFC822-messages, then you will no longer have a schema-less system, because you will have then adopted some RFC822 schema which will impose restrictions. (I know, folks actively crave restrictions, and that's way the search for the one-true-way is such a strong influence on folks' behavior.) > +Personally I care a lot about formats, but I never seem to meet anyone > +who actually understands any format designs I create, to the extent > +they see why I made certain choices. So I view emphasis on format with > +some suspicion, since I expect someone to then settle for a fixed > +system that holds no interest for me. If you pick a non-extensible > +format, then I would tend to ignore everything after that point. > > Do you think, that's happening here? Yes, if you adopt a schema. If you use RFC822 as the basis of defining what your annotations mean, then you have a schema (though not an very constraining one admittedly). > +>Extensible designs are what I normally vote for, too. If anybody > +>could give ideas how to do create this, I'd like to hear that. Real > +>extensible design are hard to create, since you have to make things > +>possible, which you can't think of. > +You might check out my months stale IronDoc site, which has much > + morematerial on database stuff than you'd probably like to read: > +http://www.best.com/~mccusker/irondoc/irondoc.htm > +http://www.best.com/~mccusker/irondoc/query/whatis/qwfe.htm > > I looked to that pages, but they're *very* huge. Yes, I know it's too much to read, even though it's very incomplete. Isn't that an interesting conundrum? > +One main theme in my IronDoc work was extensibility. So I have a > +notion I know a lot about this, but I lack the verve to play an > +authoritarian on the topic. However, the main techniques involved > +are using attribute value pairs, and designating almost everything > +using names so that the set of denotable things is both very large > +and flexible. > > This is the same idea like RFC822 (see above), right? Yes. The main problems with RFC822 are three: it's linear text, it's verbose without string atomization, and it has some standards which have the effect of imposing schemas. But it is extensible, yes. 
> +I keep the position that actually storing annotations is a much less > +delicate problem than how to control the usage of these with respect > +to the remote content that is being annotated. > > What problems do you see (except those mentioned in the next paragraph)? Wearing my pessimist's hat, I expect all usages to cause problems which have the effect of writing either original message stores or the annotation stores, or interfering with bindings in the file system by doing such things as moving or copying files. A design that doesn't describe what to do about entropy is not complete. An optimist will say, eventually the problems will shake out and the system will stablize, because we live in the best of all possible worlds and it is the basic nature of things that chaos dampens out as we converge on peace and harmony. Wrong: entropy happens. I predict somebody will someday say to me, "Dave, the annotations keep getting out of sync; can you fix that? Here's the bug number." I will say, no, I refuse to fix that bug and assign it elsewhere. Most of the unpleasant work in my professional life has been cleaning up things that were not sufficiently well defined in the first place. > +Annotations should be stored separately. Each message has an identity, > +which distinguishes it from another. Each annotation would also say > +the identity of the message which it annotates, so one can look up > +whether any given message has some annotation. > > Yes, the same is true for replies. This is, why I vote for making > annotations special replies. I'd feel better if you said they were just "like replies", instead of actually being replies, since that implies some schema constraints. > +>The real problems in implementing this are (1) understanding > +>the current code and (2) the danger of forgetting something. > +There is also a danger of not having a definite plan, since some things > +cannot work well by following the algorithm of removing annoyances as > +they arise. Sometimes the process does not converge on a done state > +when one plans to fix conflicts as they come up in practice, instead > +of defining how conflicts cannot happen in the first place. > > Yes, exactly my approach. But the problem is, there's no "plan" > (documentation) apart from the code. The codebase is very large and I > don't know if I miss something. The only other way I know would be to > ask someone who know the code for help. I am familiar with the typical absence of plans. I just make them up. (Note strong connection with "ontology mechanics guild".) You have to impose structure of plans on the chaos through sheer force of will. :-) It's like when you go hiking and try to leave the trail cleaner than when you got there. When you hike through the code, you want to leave more plans than were present before. With regard to the annotation problem, here is what I don't like. Yes, one can specify a system such that it is possible to get bits where they belong, and then interpet them correctly to get desired semantics. But having it be possible to get bits right is not enough; one has to also plan how they will stay that way despite shaking and shifting. When I see a plan that does not address forces of shaking and shifting (yes I'm being metaphorical), then I know the system won't work in practice because the system will go down regularly as components fall out of alignment or synchronization, or some other delicate balance. > + I might not be able to say much more about this problem, but I might > + kibitz at random. 
I'm confused about how the implementation can be > + underway when I don't see that the design has been roughed out. But > + maybe I just don't get your normal coding practice. > > I think I have an impression how to implement this. And, as stated in > the tread with Matty, I have a vage idea how the general scope > annotation could be implemented, at least the parts that are related > to message annotation, and don't see any problems. I keep saying that storing annotation bits is easy. For example, one can easily annotate one Mork file with another one. However, that will not cause the two to stay in sync, unless the system which uses both of them gets a stronger plan than the original one with only a single Mork file involved. Saying "just make them stay in sync" is the same as giving all the real work to someone else. David Mc


rewrote
 

From: David McCusker <davidmc@netscape.com> Subject: Re: High level documentation Date: 06 Oct 1999 00:00:00 GMT Message-ID: <37FBB49F.9E814F27@netscape.com> Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=us-ascii Organization: Ontology Mechanics Guild Mime-Version: 1.0 Reply-To: davidmc@netscape.com Newsgroups: netscape.public.mozilla.mail-news Ben Bucksch wrote: [ snip ] > This means you rewrote mailnews completely in the past 2,5 years? > Respect. Not me personally, "I just do eyes", I mean storage. (Sorry, that's just a silly quote from _Blade Runner_.) Only one year, since 4.5; but with being able to think of 4.5 design and implementation, which must be easier than starting from scratch. Or maybe David Bienvenu does everything from scratch, he's that good, and that familiar with this kind of thing. I suspect he did not have to actually copy the 4.5 code, though he likely imitated his earlier architectures. And lots of mailnews folks have been helping too. I had very little reason to look at any 4.5 code for MDB and Mork, although I did bring forward the file and stream classes from 4.5 (which in turn had been based on similar public domain IronDoc stuff). It would not have helped to have access to 4.5 designs either, since they are not any more extensive than 5.0 designs for backend work. I'd say most of the really complex stuff has centered about effects caused by the use of RDF and data sources; but other infrastructure things have changed in big ways (so keeping folks like mscott busy). This has felt more like porting a complex app to a new platform, than like a top-to-bottom redesign/rewrite, though there was some of that. If you throw a cat off a building, they will usually land on their feet, but I have trouble seeing where comes the respect for that. :-) > I remember having seen one of these whiteboards with a printer on > one picture of a Netscape meeting room. I thought, some printouts > during/after design meetings might have survived... I have never once seen such a device used that prints whiteboards. If we had used one though, we'd not have much more than lots of code layer cakes that look similar to each other, so not a lot of content from design meetings was actually drawn or written. David Mc


nsstring
 

From: David McCusker <davidmc@netscape.com> Subject: Re: why nsString must die... Date: 11 Nov 1998 00:00:00 GMT Message-ID: <364A1154.210ACB07@netscape.com> Content-Transfer-Encoding: 7bit References: <36491D69.EB9CFCE5@netscape.com> To: Rick Gessner <rickg@netscape.com> Content-Type: text/plain; charset=us-ascii Organization: Netscape Mime-Version: 1.0 Reply-To: davidmc@netscape.com Newsgroups: netscape.public.mozilla.layout,netscape.public.mozilla.general Folks might sometimes view my opinions as anti-C++, but it is more the case that I advocate explicit coding styles over implicit ones, and C++ has many implicit features, like operator overloading, where complex things can occur in innocent looking expressions (or in expressions not even visible such as destructors). I originally learned to detest implicit coding styles when working on the Pink Operating System (which later became the moribund Taligent). I reviewed code, presented as good design, which was composed of long sequences of constructor calls which were relatively undecipherable. You'd think folks wanted to use only constructors and destructors as an intellectual excercise in proving these were Turing complete. :-) Anyway the result was very hard to read, although apparently clever, and I assume this generally leads to unmaintainable and buggy code. The reason I even bring this up is to address my fear that we might use a C++ class that is even more damaging than passing around naked char* strings, since it would cloak lurking chaos in the guise of apparently organized objects that cover semantic issues ambiguously, and ambiguity kills software. (I'd thought we could not do worse than naked strings.) But I don't know about the standard string class in C++ specifically, so I only have a suspicion that it involves both templates and a full suite of operators, and that sounds like a death of a thousand cuts. Rick Gessner wrote: > I like nsString. To be fair, I wrote it, so I'm biased. It's served > ngLayout well these many months (without templates by the way). I'm glad it didn't use templates. I hope killing nsString does not force us to use templates all by itself; that seems like a big change. > And now it's time for nsString to die. Here's why: > > 1. Duplicating code that already exists and is readily available is > plain stupid. Yes, duplicating code without bound is bad. Some duplication is good when it forms a firewall that breaks dependencies, though. Actually it is necessary to have a little bit of redundancy in order to have a tendency to converge to order in the face of small chaotic effects. > It's easy to argue that today's compilers are better than we had > a year ago, and they offer better standard libraries. It was convenient that some compilers had poor support for templates and exceptions, etc, since even with good support these are features which cause implicit system behavior, and might be better avoided. > The standard C++ library includes a fine string class. I suppose we could use it inside a more restrained string class for the actual implementation, in order to avoid exposing semantics that require templates and permit use of overloaded operators. > So continuing to support nsString, therefore, is plain stupid. I don't know enough to intelligently compare nsString and the other. > 2. It has been proven (hundreds of times) that it is impossible to > build a string class that everybody loves. 
Stick a dozen programmers > in a room and ask them about their needs regarding a string class, > and you'll get two-dozen answers. I for one don't want to be the > poor soul who has to satisfy that many competing constraints. I agree completely and sympathize with anyone caught in that place. > 3. Mozilla should spend it's time developing technology in areas > that directly impact web developers, and not in places where others > are already sufficiently innovative. Yes, but I'd like to see Mozilla invest in avoiding any unnecessary dependencies, since they can pile up until too deep to shovel easily. Unfortunately innovation is often strongly coupled with dependencies on factors with costs that should be examined before embracing. One can too easily embrace things with mutually incompatible costs. > 4. I have better things to do. Yep. If the Standard C++ library string class does more, you could head in other direction with nsString and do even less. That might also result in avoiding a waste of your time. > Your objective, should you accept it, is to tell me why I'm wrong. I think this choice makes system stability diverge instead of converge. This choice actually affects my estimation of future software utility. > If you can't/don't, nsString will be killed within a few days, and > we shall live with the Standard C++ library string class. Without knowing their relative shortcomings, I'd rather have nsString if the interface is smaller and less ambitious. (Ambition is costly.) > Let the games begin. I don't want to be lost in maze of twisty passages, all different. David McCusker, speaking for myself, trying to avoid system entropy Values have meaning only against the context of a set of relationships.


i18n
 

From: David McCusker <davidmc@netscape.com> Subject: Re: Q: i18n strings in COM APIs? nsString C++ or COM? Date: 04 Jan 1999 00:00:00 GMT Message-ID: <36916A40.8118310B@netscape.com> Content-Transfer-Encoding: 7bit To: Naoki Hotta <nhotta@netscape.com> Content-Type: text/plain; charset=us-ascii Organization: Netscape Mime-Version: 1.0 Reply-To: davidmc@netscape.com Newsgroups: netscape.public.mozilla.mail-news Naoki Hotta wrote: > [ Erik wrote: ] > > [ David McCusker wrote: ] > > > I am cc'ing Naoki Hotta to ask the best way to specify charsets > > > when they appear in text format, since summary files will > > > currently be text. > > > > One way might be to use the so-called "preferred MIME name" of a > > charset. See, for example, iso-8859-1 in the charset registry: > > > > ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets > > > > Cata <cata@netscape.com> is working on a module that deals with > > charset names and aliases. > > In the new summary file, will the data be stored as utf-8 with charset > name information attached or actually stored as the charset indicated > by the charset name? I plan not to use utf-8, but neither mdb nor Mork can dictate what kind of content folks ask it to write. So by convention, I had planned that content would be stored in the actual charset, grouped by charset name. (If content can be in charset XYZ, and yet still be encoded in utf-8 or not in utf-8 while still in the charset XYZ, then perhaps I have an inadequate amount of metainformation to distinguish these in Mork.) I was under the impression that utf-8 was a charset, since we use a constant to indicate utf-8 when converting content written to ldif in address book export. Perhaps when folks use utf-8 encoding, they might also want to indicate what charset they wish it really was when used. If so, then I need to extend the mdbYarn content encoding system. Hypothetically, I might add a mYarn_Wish slot to the mdbYarn struct: struct mdbYarn { // buffer with caller space allocation semantics void* mYarn_Buf; // space for holding any binary content mdb_size mYarn_Size; // physical size of Buf in bytes mdb_fill mYarn_Fill; // logical content in Buf in bytes mdb_more mYarn_More; // logical content in Buf in bytes mdb_cscode mYarn_Form; // charset format encoding mdb_cscode mYarn_Wish; // charset user really wants at runtime mdbYarn_mGrow mYarn_Grow; // optional method to grow mYarn_Buf }; However, I will not do this until and unless we have a thorough talk about such issues in the netscape.public.mozilla.mail-news newsgroup. > Where can I find the specification of the new summary file? So far I've been posting incremental design messages in this newsgroup: news:netscape.public.mozilla.mail-news So you can find messages prefixed by "[MDB]" (for mail db stuff) and by "[MORK]" (for the Mork text file syntax and grammar). Later this week (by Thursday?) I plan to put up html docs near here: http://people.netscape.com/davidmc/mdb/ But since http://people.netscape.com/davidmc/ just appeared, I don't even have an index page there yet. And I haven't even started writing an html document on mdb or Mork yet, so I can't do it much faster. David McCusker, speaking only for myself, mozilla mail/news client eng Values have meaning only against the context of a set of relationships.


scripting
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: Menu Spec Draft 0
Date: 13 Jan 1999 00:00:00 GMT
Message-ID: <369D21B8.2E583A@netscape.com>
Content-Transfer-Encoding: 7bit
To: Brendan Eich <brendan@netscape.com>
Content-Type: text/plain; charset=us-ascii
Organization: Netscape
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.xpfe

Brendan Eich wrote: [ snip ]
> OK, but why do you or I (or a scripter or programmer out there in a
> year) care about high-level vs. low-level? What space, time, or other
> complexity differences among commands and events motivate having
> different syntax or general semantics for commands vs. events?

(The general rule is that lower level means more fragile in one or more senses. For example, assembler is lower level than C, and fragility is manifested as relative unportability.)

I expect the significance of height for scripting events or commands can depend on the context for recording and playback, if this happens in the usage scenarios for scripts involved. There seems a greater chance of feedback loops in lower level event streams, especially when high level commands post one or more events as part of command execution.

In the early 90's I was trying to design a scripting system at Taligent that would permit automated scripting of commands and/or events, and the issue of feedback loops in recorded event streams was a problem when one could not distinguish the user event stream from command posted events.

The problem can end up being moot if record and playback is infeasible for one or more reasons, such as general lack of determinism in the event system, as was the case with Taligent's event system at that time. The Taligent document architecture needed to track command streams well in order to cope with synchronizing participants, but little effort was made to rationalize the event streams in a similar manner.

David McCusker, speaking only for myself, mozilla mail/news client eng
Values have meaning only against the context of a set of relationships.
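One way to avoid the feedback loop described above is to tag events with their origin, so a recorder only captures genuine user input and ignores events posted by command execution. The sketch below is a hypothetical illustration of that idea; the EventOrigin tag and all names are invented, not taken from any actual event system.

  #include <vector>

  // Sketch: events carry an origin tag so a recorder can skip events that
  // commands post while executing, which would otherwise be replayed twice.
  enum EventOrigin { kFromUser, kFromCommand };

  struct Event {
    int mKind;            // whatever identifies the event
    EventOrigin mOrigin;  // who generated this event
  };

  class Recorder {
  public:
    void Observe(const Event& ev) {
      if (ev.mOrigin == kFromUser)   // record only genuine user input;
        mTape.push_back(ev);         // command-posted events come back for
    }                                // free when commands re-execute on playback
  private:
    std::vector<Event> mTape;
  };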


dlls
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: How many dll's?
Date: 08 Feb 1999 00:00:00 GMT
Message-ID: <36BF4641.8051B28C@netscape.com>
Content-Transfer-Encoding: 7bit
To: Phil Peterson <phil@netscape.com>
Content-Type: text/plain; charset=us-ascii
Organization: Another Netscape Collabra Server User
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.mail-news

I tend to espouse the unpopular view that one can hand-roll object-oriented techniques without the help of a specific language, compiler, or runtime, and that doing this helps avoid dependencies. To a degree my suggestion below might tend to sound like this. It always clashes with the urge some folks feel to follow the One True Way if possible, and this follows directly from my attempt to increase degrees of freedom, while using the One True Way always reduces them.

Phil Peterson wrote: [ snip ]
> Seems like the primary (but unstated) issue is how we do
> implementation inheritance across DLL boundaries. I think this is
> probably a requirement, since the mail/news design is full of base
> classes whose purpose is to generalize the things we do in local mail,
> IMAP, news, etc.

[ snip NS_BASE class-oriented technique ]

Delegation of some behavior to a pluggable object handling a kind of task works very well. That way one writes code to handle parts needed, and one need merely plug in pointers to things provided elsewhere. It's hard to mess up, since if one has no suitable plug-in at runtime, you can assert and figure out why one failed to be on hand right there.

This is only rather awkward when one actually intends to subclass the interface whose implementation comes from somewhere else, because then delegation requires that you add another gratuitous layer of function call overhead that has both extra code footprint and dispatch time.

This problem can be partly (or even largely) removed by increasing the granularity of classes, so that each class does more cohesive things in a smaller number of methods. This would tend to have the effect that shared behavior you want across DLLs will less often need subclassing, and that fewer methods need be provided when subclassing is required.

The kind of class design that causes a problem tends to sound like this: "You need to provide subclasses for objects that implement all methods in this entire interface." This gives each subclassing DLL the problem of satisfying all the methods, and thus you wish you could satisfy some of them from a base DLL, and avoid actually overriding some methods.

The kind of class design that causes fewer such problems sounds like: "You need to provide subclasses for all these methods that need to be overridden, and you also need to handle pluggable objects of this type which you might get from a base DLL, or else subclass at real need." This lets each subclassing DLL share objects comprising common code easily across library boundaries, and lowers total method count cost.

> The downside to this, I expect, is that the DLLs which share this
> inheritance relationship are very tightly bound. I'm a little concerned
> about the version safety issue. I'm not sure how each compiler/platform
> expects those functions to be bound: by name, or ordinal, or byte offset,
> or what, so I think we should look into that.

I think factoring it as above will reduce coupling and loosen binding. Relying on a complex compiler or linking system feels more chancy.

David McCusker, speaking only for myself, mozilla mail/news client eng
Values have meaning only against the context of a set of relationships.
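A rough sketch of the delegation style described in the post, with entirely invented names; the point is only that a class in one DLL holds a pointer to a small, cohesive helper interface supplied by a base DLL at runtime, rather than inheriting an implementation across the DLL boundary.

  #include <cassert>

  // Sketch of delegating to a pluggable helper instead of inheriting an
  // implementation across a DLL boundary. All names here are invented.
  class IFolderOps {                 // small, cohesive interface
  public:
    virtual ~IFolderOps() {}
    virtual bool CompactFolder() = 0;
  };

  class LocalFolder {                // lives in one DLL
  public:
    LocalFolder() : mOps(0) {}
    void SetFolderOps(IFolderOps* ops) { mOps = ops; }  // plugged in at runtime
    bool Compact() {
      assert(mOps && "no folder ops plugged in");       // easy to diagnose
      return mOps ? mOps->CompactFolder() : false;
    }
  private:
    IFolderOps* mOps;                // implementation provided by a base DLL
  };

If no suitable plug-in is supplied, the assert fires right where the missing dependency is first needed, which is the "hard to mess up" property the post describes.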


binding
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: Protocol Dispatching Proposal for 5.0
Date: 02 Jun 1999 00:00:00 GMT
Message-ID: <3755A4D4.36F4264B@netscape.com>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
Organization: Ontology Mechanics Guild
Mime-Version: 1.0
Reply-To: davidmc@netscape.com

John Friend wrote: [ snip ]
> Does your design handle the messy problems that 4.x had with
> multi-part/related messsages?

I thought this post was just so cool. It has some of the best examples of hard binding problems I've seen in a long time. Actually I'm inclined to think all hard problems are actually binding problems. :-)

I'll define what I mean by a binding problem, and then apply this way of thinking to all of the things you mention with the hope that solutions appear more readily. Maybe at the end I'll comment on how async message passing can be used to solve binding problems with minimal dependencies.

A binding is whatever infrastructure supports a reference. In order to refer to anything, a binding for the ref must exist (at least before the ref can resolve, which can happen after the ref begins). A binding is whatever mechanism is used to support the mapping from a ref to whatever value is bound to that ref in the system used to resolve a ref action. If this sounds tautological, then you understand it correctly.

A ref is any bit pattern mapped to something else by a binding system. This includes address pointers that map to locations in RAM, names for prefs at runtime, URLs for remote content or embedded objects, path names for files, uids for database rows or tables, etc. (A ref stands for the value to which it binds, and sometimes we obsess on the value and give too little thought to how the act of binding works.)

So a binding problem is basically one in which you must figure out how a reference will work, and it can have a lot of variable parameters, which might be expressed as questions similar to these: What does a ref look like? When can a ref be used? How does it get resolved and when? How are invalid refs handled? Can a ref wait for a binding to be generated before resolution? What happens when a ref is ambiguous (so it binds to more than one value)? Who creates the bindings which attach values to refs? How long does each binding live? When, how, and why are bindings ever revoked? Do performance needs constrain how resolving works? How many layers of intermediate bindings are used to resolve a higher level ref? How many times will a ref be used? Does a ref resolve atomically, or as a sequence of transfers which break the value into parts? And so on until you want to puke.

Typically forward refs are problematic, and they are endemic in text mediums presenting refs and bindings linearly and in unpredictable order. Resolving a forward ref requires waiting until a suitable binding is at hand. If a single thread both parses refs and makes all bindings, then the thread must come back later to resolve forward refs. If other threads can be spun to act once a binding exists, then parsing threads need only notify when bindings exist to wake up waiting refs.

My following comments only intend to recast the original remarks in terms of binding problem elements that seem relevant or interesting.

> A multi-part/related message can end up having text/html parts with
> <IMG> tags in it that have funky URLs cid:<MIME PART REFERENCE> that
> point to named MIME parts later in the message.

Such funky URLs are forward refs which do not have bindings until the later MIME parts are parsed.

> Rich or Terry can probably speak better for the details of how this
> worked in 4.x, but as I recall it was not particularly easy to make
> this work in the old netlib because the named parts which are needed
> to satisfy the IMG tag may have the following constraints:

Resolving the ref from the IMG tag involves waiting for the binding.

> * You may not be able to have random access to the named parts (like
> you might with an http IMG URL). In other words, you might have to
> wait until you've downloaded most of the rest of the message until
> you encounter the right MIME part. This is definiitely the case
> with messages in local folders and is the case with dumb IMAP
> servers too.

An http IMG URL already has a binding at the indicated server. But a URL that forward refs into the same stream containing the ref has no binding until the later portions of the stream are read and interpreted.

> * You may not be able to launch another thread to fetch the IMG URL
> because the only way to get the data might be by continuing to read
> from the current IMAP connection. The first 4.x way of doing this
> launched another IMAP connection to go fetch each image (and it
> downloaded and parsed the entire message to find each image). Not
> only was this very inefficient, but it's also prohibited on some
> IMAP server configurations (you can't make more than one connection
> to the same folder on some servers).

You might be unable to pretend a ref is not forward, by resolving the ref on another thread, because the current thread might hold a connection resource which the new thread would have to wait for, causing deadlock. (Deadlock in similar situations can be avoided by having one thread make most bindings used by other threads. Or more generally, a thread that makes bindings should not wait for any other threads to make a binding, so this will prevent cycles in waiting for bindings.)

If bindings are made by linear cost computations, a single thread might pay the cost most efficiently, and other threads might wait for any bindings created by the parsing thread.

> * You may not be able to resolve any single IMG URL until you've
> built a mapping table for all of them and finished reading the
> entire message.

Bindings might not come into existence incrementally and individually, if the ref system assigns bindings based on entire groups of bindings. So waiting for one binding might involve waiting for all bindings in some larger group.

> * All of this mess is nestable since stand-alone message/rfc822 parts
> can be nested inside of other message/rfc822 parts.

The encoding of bindings can get arbitrarily hairy, which makes it more important to have a single path of binding creation, with multiple users of those bindings waiting in other threads.

In practice, waiting threads might send sync or async requests to resolve refs to a queue serviced by a thread which is parsing and making bindings to be consumed by other threads. Sometimes a binding will already exist and can be resolved immediately; other times a wait is required until the binding is found later. Faster performance results with async requests, when a thread using refs knows it will need a ref soon, but can do more work before needing to block for the ref to resolve.

> * The user would like incremental display (e.g. not blocking until
> everything is downloaded) for these messages since displaying them
> has a lot in common with display any old web page with images.

A UI placeholder might be presented until a ref resolves, and if layout is incremental, then cost to reflow is not great for late bindings. The presentation of broken refs should differ from refs still resolving.

> More food for thought. I apologize in advance if you're way ahead of
> me on this one.

I can't tell at all whether current designs permit anything I have been describing. It still sounds somewhat idiosyncratic like the old netlib, even if it is now more general. Maybe I'd understand necko better if presented in more universal terms, instead of specific classes and methods whose import I can only guess without boatloads of verbose docs.

Maybe I'll say something about async message passing for minimizing dependencies in ref resolution later. I'm all written out this moment.

David McCusker - for computational values, context is a set of bindings
Values have meaning only against the context of a set of relationships.
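To make the forward-ref idea concrete, here is a minimal single-threaded sketch of a binding table: a consumer may register interest in a ref (such as a cid: part name) before its binding exists, and the parsing side wakes the waiters when the named part finally arrives. All names and the callback shape are invented for illustration, not taken from netlib or necko.

  #include <cstddef>
  #include <map>
  #include <string>
  #include <vector>

  // Sketch of a binding table for forward refs. Names are invented.
  typedef void (*RefWaiter)(void* closure, const std::string& value);

  class BindingTable {
  public:
    // Called by a consumer; runs the waiter now if bound, else queues it.
    void Resolve(const std::string& ref, RefWaiter waiter, void* closure) {
      std::map<std::string, std::string>::iterator it = mBound.find(ref);
      if (it != mBound.end())
        waiter(closure, it->second);
      else {
        Pending p = { waiter, closure };
        mPending[ref].push_back(p);        // forward ref: wait for a binding
      }
    }

    // Called by the parsing side when a named part is finally seen.
    void Bind(const std::string& ref, const std::string& value) {
      mBound[ref] = value;
      std::vector<Pending>& ws = mPending[ref];
      for (size_t i = 0; i < ws.size(); ++i)
        ws[i].waiter(ws[i].closure, value); // wake everyone waiting on this ref
      ws.clear();
    }

  private:
    struct Pending { RefWaiter waiter; void* closure; };
    std::map<std::string, std::string> mBound;
    std::map<std::string, std::vector<Pending> > mPending;
  };

In a threaded setting the same shape becomes the queue serviced by the single parsing thread, as the post suggests, with waiters living on other threads.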


roundtrip
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: Round-tripping entity references in RDF/XML (was Re: Text entities in RDF/XUL)
Date: 09 Jun 1999 00:00:00 GMT
Message-ID: <375F1B25.747BD87D@netscape.com>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
Organization: Ontology Mechanics Guild
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.rdf

I feel impelled to meddle even though I know none of the particulars.

Chris Waterson wrote: [ snip ]
> You'd have to remember what literals in the graph got generated from what
> entities; just maintaining a table of entities and then reverse-mapping
> when you want to serialize back is insufficient. Specifically, it breaks
> down in the case that >1 entity maps to the same literal string.

Okay, I'm really ignorant of the actual parsing code flow you have that builds your in-memory graph, so I have no idea whether this is feasible. But if you could record exactly what was in the input as an annotation on what you really want at runtime, then you could use this for writing later provided that the object is not changed. You can treat the captured form from parse time as a cached version scheduled to be written at serialize time, which becomes invalid whenever the object is modified.

You are in trouble if the parsing code is a layer which pre-digests the input for you in such a way that you cannot see the original naked input text. For example, if it was a C language parser giving you two different integers which were originally 0xF and 15, the parser might just give you two binary integers both equal to 0b01111 without telling you whether the originals were different literals. It might not be feasible to find the "0xF" and "15" sequences that were in the input stream.

David Mc
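A tiny sketch of the annotation idea in the post: a literal keeps the exact text it was parsed from, and serialization reuses that cached text unless the value has been modified since parsing. The class and member names are invented and do not describe the actual RDF content model.

  #include <string>

  // Sketch only: cache the original source text and invalidate it on change.
  class Literal {
  public:
    Literal(const std::string& value, const std::string& sourceText)
      : mValue(value), mSourceText(sourceText), mDirty(false) {}

    void SetValue(const std::string& v) { mValue = v; mDirty = true; }

    // Round-trips entity-reference spellings while the value is unchanged.
    std::string Serialize() const { return mDirty ? mValue : mSourceText; }

  private:
    std::string mValue;       // what runtime code actually uses
    std::string mSourceText;  // exactly what appeared in the input
    bool mDirty;              // cached source is invalid once modified
  };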


waste
 

From: David McCusker <davidmc@netscape.com>
Subject: Optimizing with nonzero fragmentation
Date: 14 Jul 1999 00:00:00 GMT
Message-ID: <378D0EC7.4962CE97@netscape.com>
Content-Transfer-Encoding: 7bit
Organization: Ontology Mechanics Guild
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.rdf,netscape.public.mozilla.performance

Here's the executive summary, in case you don't read all this: never optimize either space or time exclusively at all costs, since this wastes your resources in diminishing returns when you could have put a small percentage effort in another area for some large returns.

This material was inspired by the notion of wasting space in btree index nodes (news://news.mozilla.org/378A7682.5987C4A4@geocast.com). This explains how to do space/time tradeoffs for best performance. I do not address the other half of optimization here (which is, do less whenever possible). This half of optimization might be called "spending resources wisely by investing in a mixed portfolio".

I might have used many different equivalent subjects for this post: "Why you simply must 'waste' space to have faster time performance", "Why you must balance space and time fragmentation for optimal code", or maybe, "Some fragmentation is good for you, in small doses".

Space is optimized by using all of it constructively with no waste. You waste no space when there are no stray padding bytes as filler. Wasted filler bytes are sometimes called "internal fragmentation", and "external fragmentation" means wasted surrounding filler space.

Time is optimized by reading or writing all your content in one contiguous shot, boom, without seeking discontinuities elsewhere. You waste no time when you move all your content in one screaming system call that touches your storage medium.

When content is fragmented into pieces requiring multiple access calls, this too is unfortunately called "fragmentation", but with no relation at all to the spatial wastage fragmentation mentioned above.

When content is discontiguous, it fragments locality of reference, which maps directly into fragmentation of time to access disparate locations, so we might as well call this time fragmentation: Ft. When content packs less than perfectly in preallocated records and wastes some space internally or externally, we might consolidate both these notions of space fragmentation and call it this: Fs.

So Ft wastes your time resources and Fs wastes your space resources. And combined resource waste might be called Fc, where Fc = Ft + Fs. So why would you ever want Ft or Fs to be nonzero, deliberately? Obviously, when it much reduces the other and Fc is much smaller.

This situation usually obtains in any db in which you modify parts on a frequent, regular basis, adding some content or cutting some. Such frequent modification introduces fragmentation entropy into a db, and the effect becomes much more pronounced in proportion to how perfectly either Fs or Ft was optimized down to zero.

Having a zero in either Fs or Ft means there is no flexibility at all, so any change incurs a largish amount of work to update a changed db. But deliberately having nonzero Fs and Ft will cause changes to be absorbed with less work, and with less resulting havoc in db form, because the latent flexibility can allow minimalist repacking.

This matters most when you require that the general order of db performance stay about the same over the db lifetime, despite any ongoing thrash from incremental content editing. The flexibility of coping with nonzero Fs and Ft at the very start will increase costs a little at first, but will reduce cumulative costs as the db gets older and is modified either significantly or frequently.

Okay, now I think you get it, and I'll just stop abruptly here.

David Mc
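A toy sketch of the tradeoff described above: a record format that deliberately rounds its reserved size up to a bucket boundary (accepting nonzero Fs) so that small edits are absorbed in place instead of forcing a relocation and another seek (nonzero Ft). The bucket size and names are invented.

  // Sketch: spend a little internal fragmentation to avoid time fragmentation.
  const unsigned kBucket = 64;   // grow capacity in 64 byte steps (invented)

  struct Record {
    unsigned mFill;              // logical bytes in use
    unsigned mSize;              // physical bytes reserved (>= mFill)
  };

  inline unsigned RoundUp(unsigned n) {
    return ((n + kBucket - 1) / kBucket) * kBucket;
  }

  // Returns true if the record must be relocated (a time cost); the slack
  // from rounding up means most small growth is absorbed for free.
  inline bool Grow(Record& r, unsigned extraBytes) {
    unsigned want = r.mFill + extraBytes;
    if (want <= r.mSize) { r.mFill = want; return false; }  // absorbed in place
    r.mSize = RoundUp(want);     // pay some internal fragmentation now
    r.mFill = want;
    return true;                 // caller must move the record this once
  }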


weakref
 

From: David McCusker <davidmc@netscape.com>
Subject: strong vs. weak refs (was Fwd: Why we leak: a prelude)
Date: 26 Apr 1999 00:00:00 GMT
Message-ID: <3724E6AD.5F7482C5@netscape.com>
Content-Transfer-Encoding: 7bit
References: <3724BA9A.9C47AB18@netscape.com>
To: pinkerton@netscape.com
Content-Type: text/plain; charset=us-ascii
Organization: Ontology Mechanics Guild
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.general,netscape.public.mozilla.xpcom,netscape.public.mozilla.layout

scc@netscape.com (Scott Collins) wrote:
> Here is a starter set of simple and obvious guidelines. Our code
> violates these guidelines everywhere.
>
> (1a1) Parents own their children;
>
> (1a2) Children do _not_ own their parents;
>
> (1a3) You don't need to own an object whose lifetime is guaranteed
> to be longer than yours;
>
> (1a4) You don't own an object because _you_ need _it_; you own it
> because _it_ needs _you_
>
> (1a5) You don't want to own an object that (even transitively) owns
> you

As a tool for implementing a robust ownership model, simple refcounting is not sufficient in real (i.e. complex) applications unless one can distinguish between strong and weak references. I can explain these rather well in terms of the 1a* bullets in Scott Collins' message.

A weak ref is for memory management purposes, so a pointer to an object does not go away and leave a dangling pointer. But a weak ref does not force an object to stay open for business and usable. A strong ref is a weak ref plus a use ref, so the weak ref keeps an object from being collected, and the use ref keeps the object from being closed.

The idea of 'use' is a generalization of the idea of ownership, where counting owners permits more than one owner to keep an object open. To support weak and strong refs, one possible implementation model keeps a use count separate from a ref count, where refs must never become any fewer than uses.

When strong refs are released, uses are always dropped first before refs, so an object will always finish closing with a nonzero ref count. This prevents the possibility of self-deletion while closing.

Strong ref graphs must be acyclic, so that no cycles exist that stop objects from closing when owners are finished using them. Closing an object causes it to release ALL references to other objects, both weak and strong. But closing an object does not cause it to be deleted.

Weak ref graphs are allowed to be cyclic, and the cycles do not prevent garbage collection in practice because the closing of objects when strong refs hit zero will cause all refs including weak ones to be released, and this causes weak graphs to come undone at exactly the rate needed when owners of objects finish using them.

Parents typically own children with a strong ref. When a child must know about its parent (or any ancestor at all), such refs must be weak. A weak ref is typically a backpointer to some driving object, and they often create reference cycles, which is okay as long as the strong refs alone create no cycles, since weak refs never stop objects from closing.

Currently all the refcounting in our code amounts to strong references alone, and this is a problem for more than one reason. First, any cycles will create serious leaks. Second, it is hard to remove all cycles from the strong ref graph, because complex apps usually need some backpointers if only for performance reasons. Third, releasing refs from destructors causes a complex dependency between closing and collection.

I expect our code to keep having leaks until we start using weak refs in addition to strong refs. And unless we drop the use counts before ref counts, we will have self-deletion problems when closing objects.

The most annoying thing about the weak and strong ref discipline is that one must create a polymorphic close method that obviates the need for C++ destructors, since by the time an object is destroyed it had better be already closed, in which case there's nothing to do in the destructor. So a destructor can only usefully assert that close has already happened.

The weak and strong ref discipline is not just theory, because I've used it before, and it is currently being used by Mork in the mail/news tree.

David McCusker, making a little difference (iota inside(tm))
Values have meaning only against the context of a set of relationships.
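A compact sketch of the use-count-plus-ref-count discipline described above, with invented names (it does not claim to be the Mork implementation). Uses are dropped before refs, so Close() runs while the ref count is still nonzero, and a weak ref keeps the memory from dangling without keeping the object open.

  #include <cassert>

  // Sketch of the strong/weak discipline: a strong ref is a use plus a ref;
  // a weak ref is a ref alone. Names are invented for illustration.
  class Closable {
  public:
    Closable() : mUses(0), mRefs(0), mClosed(false) {}
    virtual ~Closable() { assert(mClosed); }   // must be closed before delete

    void AddStrongRef() { ++mUses; ++mRefs; }
    void AddWeakRef()   { ++mRefs; }

    void CutStrongRef() {
      if (--mUses == 0)
        Close();               // release everything this object holds
      CutWeakRef();            // refs always outlive uses, so this is safe
    }
    void CutWeakRef() {
      assert(mRefs > mUses);   // refs must never become fewer than uses
      if (--mRefs == 0)
        delete this;           // closing already happened at zero uses
    }

  protected:
    virtual void Close() { mClosed = true; }   // subclasses drop their refs here

  private:
    unsigned mUses;   // owners keeping the object open for business
    unsigned mRefs;   // holders keeping the memory from dangling
    bool mClosed;
  };

Because Close() runs before the final ref is cut, an object never deletes itself in the middle of closing, which is the self-deletion problem the post warns about.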


refs
 

From: David McCusker <davidmc@netscape.com>
Subject: [REFS] exposing ref semantics in mail/news user interfaces
Date: 05 Jan 1999 00:00:00 GMT
Message-ID: <36928514.6352AC33@netscape.com>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
Organization: Netscape
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.mail-news

Let's discuss how users will see data permanence and identity in the various user interface elements for mail and news, etc. This message aims to seed this topic so we gather fuel for internal meetings to talk about this issue.

We briefly considered having meetings first and then talking about it here later. But starting here is just about as good. Right now I'm only trying to find context and ask some of the relevant questions.

This relates to my MDB post before lunch on gc & refcounts, but the definition of MDB should not drive the user model. But it must be possible to map a user model onto a MDB definition, which can change.

First some brief context, which other folks can elaborate if they wish. We want a progressive UI which is able to show content in flexible ways, which will include the ability to show the same object in more than one view. But we need to define exactly what this means to a user, and that is the topic of this thread, for which we want constructive arguments.

Here's the canonical question: when a user deletes an object, what does that mean exactly? Remove this one alias to the object from this view? Or search and destroy all aliases to this object, wherever they are? Or some combination of these two that is either context sensitive, user-controlled, or both? What factors affect answers to such questions?

A natural conflict making such questions hard to answer is the tendency on an engineer's part to use refcounting for good definition, against the tendency on a user's part to not really understand alias semantics, which can be more confusing than a model where delete always means destroy.

I hope this is enough to start a discussion. Please feel free to ask your own questions, or shift the general agenda in a direction you feel is more appropriate to addressing such problems. Don't feel bound by the way I started the presentation of this material.

David McCusker, speaking only for myself, mozilla mail/news client eng
Values have meaning only against the context of a set of relationships.


refcount
 

From: David McCusker <davidmc@netscape.com>
Subject: hard-vs-soft ref scheme (was Re: [xpcom] dealing with cycles)
Date: 28 Jul 1999 00:00:00 GMT
Message-ID: <379F8878.789FA823@netscape.com>
Content-Transfer-Encoding: 7bit
To: John Bandhauer <jband@netscape.com>, Scott Collins <scc@netscape.com>
Content-Type: text/plain; charset=us-ascii
Organization: Ontology Mechanics Guild
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.xpcom

This is a response to Scott's post, though you'd never be able to tell without scrolling down really far, until you see the attributed quote.

I added "hard-vs-soft ref scheme" to the subject to rename the scheme I outlined in my April posting, so our use of weak ref terminology is somewhat less overloaded and confusing. So when you talk about my scheme, you can say "hard ref" to mean strong ref, and "soft ref" to mean weak ref, where all these are never dangling pointers, and the practical effect is the same as split identity, but much less complex. (Yes, the main reason for this choice of terms is that "hard" has only four letters, compared to the six in "strong".)

Incidentally, the soft and hard ref terms are the ones I use in my unpublished Mithril dynamic language (to be public domain), as opposed to the weak and strong ref terms used by IronDoc. The hard/soft ref scheme I use in Mork comes from my other projects.

I figured out this system as the result of analyzing why OpenDoc runtime refs were tepid, as a generalization of weak/strong refs in OpenDoc's storage system. And Mithril is a dynamic language with a generalized foreign object interface for wrapping components defined elsewhere (where IronDoc persistent objects are canonical examples). So my Mithril/OpenDoc design context resembles jband's JavaScript/Mozilla memory management context with regard to ref problems, which accounts for our harmony.

Thanks, jband, I was ecstatic that someone read and understood this:

  [4] news://news.mozilla.org/3724E6AD.5F7482C5%40netscape.com
      "strong vs. weak refs (was Fwd: Why we leak: a prelude)"

So you really made my day when I knew somebody got it. Whenever I try to convey what I know about memory management, gc, and refcounting to other folks, I feel like Clint Eastwood in that one thriller with John Malkovich, where Clint says to Rene Russo, "I know things about pigeons, Lilly." I'm too ego-shy and reserved to intimidate other folks with an overt authoritarian attitude. :-)

A few months back, I explained my hard/soft ref scheme to some folks around here in mailnews, and the reception was noncommittal. I was asked which book folks should read to find the description of my scheme; I told them there was no book since I invented it myself, and this seemed to have the effect of robbing my scheme of any authority.

(I'm not saying the COM designers were idiots. Just that I have no reason to suppose anyone else knows more about practical issues in garbage collection and refcounting than I do; lese majesty again.)

I don't have any argument with Scott Collins, and I reply to his post only because I see the meaning of my own scheme getting totally lost, so I want to find where my hard/soft scheme fits in material below. But first let me recap what my hard/soft ref scheme does generally.

The main point is summarized by jband as follows:

  [state 1] non-existence ->
  [state 2] existing and holding references on other objects ->
  [state 3] existing but holding no references ->
  [state 1] non-existence

The new state is the third one, "existing but holding no references", and is called "closed" in Mork, or "halted" or "stopped" in Mithril.

The addition of the halted state has the effect of uncoupling a need to release refs from a need to memory manage an object's storage. So the word 'uncouple' is very related to 'split' in "split identity".

The very applicable metaphors that should come to mind are "lattices", "potential energy", and "acyclic digraphs". The idea is that cutting refs, either soft or hard, will cause objects to reach lower energy states, and it is unnecessary to go uphill at any point in order to make further progress downhill, so cycles cannot cause a hangup.

This is why graphs come apart cleanly in my hard/soft ref scheme. COM uses split identity to do this, but my hard/soft ref scheme does not, so my scheme is simpler in some ways. (At this point, you can assume I think COM designers are idiots with regard to complexity.)

The point of the word 'split' in "split identity" is that factoring can associate ref-holding with one identity, and existence longevity with another identity. This uncoupling was done by splitting one object's identity (an address pointer) into two object identities (a pointer for each of them). But it's not necessary to do that.

My hard/soft ref scheme achieves exactly the same factoring without splitting the identity, because I just factored it differently, by giving each object two kinds of behavior instead of two kinds of object identity. One behavior is 'closing' (or halting) when uses hit zero, and another behavior is deleting when refs hit zero. Since the delete methods are usually the same, code size does not explode.

Here's how this interface relates to COM methods, with pseudocode:

  addref  -> AddHardRef() { ++u; ++r; }  // increment uses, refs
  release -> CutHardRef() { if (!--u) close(); if (!--r) delete(); }
  (??)    -> AddSoftRef() { ++r; }       // increment refs alone
  (??)    -> CutSoftRef() { if (r <= u) yell(); if (!--r) delete(); }

This protocol is astonishingly simple; only the implications are complex. The pseudocode shown hides some carefully done error handling.

However, when an object is distanced from callers with a wrapper (and this happens in Mork) then the correct use of hard and soft refs can seem complex, and the breaking of relationships involved is non-obvious. But it can be reasoned out directly from plain principles with effort.

Scott Collins wrote: [ big snip ]
> >When you find a place where a weak reference is required then you
> >should look to split identity solutions first and out of band
> >solutions next.
> The three levels of solution: raw pointers, out-of-band signalling,
> and split identity each have their applications.

It seems like you left out my hard/soft ref scheme, unless you replace "split identity" with "factoring uses away from refs" in order to get the same number of finalizing states handled by split identity objects.

> Raw pointers are often appropriate in simple cases as I demonstrated
> above.

In the case where a caller passes in a ref, under the assumption that ownership (i.e. a hard ref use) is not passed down into callees, the caller necessarily holds a ref to parameters across a call. Then callees need only add their own refs at need when an object will be held longer than a method call.

Other than the case where the caller is required to hold a ref for the duration of a call, I don't think raw pointers should ever be used, except for objects which are going to be deliberately leaked by normal convention, and this also happens in Mork. Env and heap objects are used in such a widespread fashion, and there are so few of them, that it makes sense for efficiency to leak them always and not refcount.

> In many real-life situations, particularly as we grow into the
> apartment model or exploit DCOM, split identity will be the only
> workable solution for relationships that cross process or apartment
> boundaries.

The hard/soft ref scheme is also workable, and I think preferable.

> Out-of-band solutions are non-obvious, not automatic, and in general
> fragile, but they have their uses.

I don't have much to say about this, except that I think refcounting rules of order should pretend that refcounting is the only mechanism, and ignore any out-of-band stuff going on. So that would make out-of-band techniques an ad hoc approach to resolving complex issues outside the scope of refcounting.

But out-of-band techniques are usually virtually required for good performance in patterns like collection classes, or minimizing space footprint for large populations. So in those cases one has to either define exactly how refcounting is handled, or specify that refcounting occurs not at all or else in a privately modified local system.

David Mc


tries
 

From: David McCusker <davidmc@netscape.com>
Subject: [trie] Mithril minimal space trie design
Date: 04 Aug 1999 00:00:00 GMT
Message-ID: <37A8BB91.B0CC6933@netscape.com>
Content-Transfer-Encoding: 7bit
To: Chris Waterson <waterson@netscape.com>
Content-Type: text/plain; charset=us-ascii
Organization: Ontology Mechanics Guild
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.mail-news,netscape.public.mozilla.performance

[trie] Mithril minimal space trie design

Working on my public domain Mithril language last weekend, I spent a few hours thinking about minimal space trie designs, in case this might help me shrink the space footprint for string->symbol interning tables. I had what might be a really good idea for trie designs, so I'll describe it here since folks can code something similar for mozilla, for use in RDF, or layout, or the Mork database used by mailnews.

Or maybe after the first beta, if I have already written the public domain trie in Mithril by then, I might port this code into Mork to optimize space footprint. (At some later time I will post some public explanations for why I have written permission to keep working on Mithril and IronDoc on my own time, and to continue dedicating source code for these to the public domain.)

background

A couple weeks ago, Chris Waterson described some weekend research he was doing to examine the use of tries to compress mappings that use strings as keys, since this would be very useful when strings tend to have long common prefixes, as would happen with closely related URLs.

In an offline discussion, I mused over the possibility of using both tries and hash tables, where a trie could represent strings in small amounts of space in order to make a shared atom table, and then many different hash tables could use the interned atoms in that table as keys, so the cost for keys happens in only one place.

If you want to know what a trie is, look in a data structures book; basically you use trees to represent all strings with common prefixes, so you try to represent distinct prefixes only once. (This is a gamble on space footprint, since the node representation of child subtrees has some overhead, and you could lose if the tree was extremely full in population, so nearly every prefix had many children.)

Here is one illustrative trie example in Lisp syntax, which is very effective since Lisp lists basically form tree structures. Suppose you want to put the words {gee, gosh, golly, gold} into a trie, so the common prefixes are shared:

  (g (ee) (o (sh) (l (ly) (d))))

This means the same thing as this:

  (g (ee)
     (o (sh)
        (l (ly)
           (d))))

or this:

  g -- ee
   \
    o -- sh
     \
      l -- ly
       \
        d

Note I have not bothered to illustrate how to deal with the issue of showing that prefixes are themselves member words in their own right. For example, suppose we added "gol" to this tree; how would we show a tree leaf indicating we have a word after the first 'l'? Common practice says to add a special marker symbol at that point in a tree.

This is all you need to know about trie data structures to work on coding or design. There's no reason to read a book, and I didn't, so this background is the basis of the design I did last weekend.

more constraints

I added more constraints to the design problem to satisfy the actual usage scenarios I knew would obtain in practice, when using tries to represent symbol tables or atom tables.

The traditional trie design only intends to answer a query about whether a test word is considered a member of the trie. Such a trie is not adequate to represent either a symbol or atom, where one must also work the mapping backwards to recover the original string from the interned atom. A string atom is not very useful if you cannot show what string the atom represents.

A traditional trie design generally uses only downward pointers, so one can navigate only downward to the leaves. But once at a leaf, one cannot go back up toward the root. But if one uses a trie to build an intern mapping table for atoms, then the leaf atoms must be able to traverse the path backwards to find all the bytes composing a string.

This means all the trie nodes will need backpointers to parent nodes. This dramatically increases space footprint cost, unless you figure out some way to optimize inter-node pointers using some global knowledge about how the trie is organized. (Okay, so I am telegraphing some of the design in advance right here.)

design

I won't present a complete design -- just the interesting parts. I also won't present all the brainstorming approaches I rejected for one or another reason (good design often involves throwing away 95% of the ideas that occur to you while designing).

So this design only uses two main strategies, although there are other interesting strategies I considered. For example, a smallest possible encoding I considered took too much time to build, because it performed O(N^2) mem copies. I settled on a really interesting design that doesn't do an awful lot of copying as the trie gets bigger and bigger. So this design ends up being a complex game on heap representations to shrink the size of address pointers used for inter-node references.

(This design is going to remind you of dynamic language runtime designs if you are familiar with such things. In particular, the way a heap is organized will very strongly resemble a Smalltalk-80 style object table, which uses 16-bit 'pointers' to a max of 65K (2^16) objects, where the objects can physically move without changing identity.)

The two main strategies used in this design to save space footprint are:

  1) allocate fewer blocks of memory to cut per-block management overhead
  2) reduce the size of inter-node pointers to cut per-node overhead

You can factor this design so that all the interesting strategic action is done by a heap class used by the trie, so all the magic happens in the memory allocator(s) used, instead of in the trie itself. Except that this only works, and scales up to lots of content as well, when the way a trie is organized is taken into account.

In order to make inter-node pointers small, it is necessary to have strong constraints on what other nodes are reachable in the graph being built. In the trie we will build, we know we only want pointers in between parent and child nodes, in both directions.

This tells us that the scope of a pointer reference is generally constrained to a subtree within a trie, so we can use 16-bit 'pointers' if we can figure out how to constrain the population of a subtree to only about 65K nodes. This constraint is easy to satisfy if we bust up a subtree into many heaps whenever a single heap overflows its maximum capacity. (This overflow case is the main incidence of copy-related scaling cost.)

If a subtree becomes so large it overflows, this means the cost of turning one heap into many heaps is not very significant in terms of per-heap space overhead, because the overflowing heap was already using memory on a pretty big scale, so the cost of a hundred heaps is not so great.

Remember that we don't have to rebuild the entire trie when a heap overflows its maximum block count. We only have to rebuild a subtree that overflows, since only the heap for that subtree must be changed.

To start with, a trie might have 256 subtrees for each possible leading byte for a string being interned. One might use a heap for each. If one only creates a subtree and associated heap on demand, using lazy coding, then we only incur a heap when we actually start a subtree.

However, since we already have a well-defined behavior for handling the overflow of a heap, we could start with the entire trie in one heap, and then overflow as soon as we exceed the 65K block limit. This would put all trie content under one subtree rooted by the empty string. This is a good idea if a trie ends up being small, but a waste of time if the trie ends up being very large. So maybe this choice should be made when a trie is created, using a caller guess about eventual size.

The strategy to allocate larger blocks to cut per-block overhead leads to tactics in both heap allocations and trie node construction. The specialized heap class we use for the trie should only allocate rather large blocks itself (from a plain vanilla heap like malloc() or new()), and then suballocate from these blocks to satisfy all trie requests.

Our heap class can afford to round up block sizes to, say, multiples of 4K, and then keep (say) 32 different free lists for blocks in sizes from 4K to 128K, as an example. If we knew our trie was always going to hold an awful lot of strings, so that fragmentation due to a single last block being almost empty would not matter much, then we might just allocate very big blocks.

Our trie mapping only adds new strings, and never cuts them. Since content is never removed, the trie never frees a block it allocates from the heap. So free lists are not needed to accommodate the trie; a heap only uses free lists itself when it moves things around to fill requests from the trie.

So why does the heap move things around? The answer is related to a trie tactic to use bigger blocks, by trying to allocate all of a trie node in a single piece of memory. A node has a variable number of children, and this number tends to grow as more strings are interned and the tree splits and branches ever more. But the trie can still put a node's string content, all its metainfo, and all its refs to children nodes in a single block of memory.

What happens when a trie adds a new child to an existing node? The trie asks the heap to make the block bigger. And because the heap is using an object table, which gives stretchy sized blocks without any change in block identity, the heap can always grow a block <emphasis>without ever changing the 'pointer' to the block used by the trie</emphasis>.

When the heap grows blocks this way, it will tend to suballocate new blocks out of its larger stockpile of space, and put the old blocks in free lists of standard block sizes. These free lists are different ones from those mentioned above, since these lists are of suballocations in bigger blocks, while the lists above are all blocks from other heaps.

When a block moves in this fashion, as the heap swaps old content into a new bigger space, the heap updates the pointer to this block in the object table, so the trie's object id now refers to the new block via the indirection of the object table. Because trie nodes can grow in size as child node refs are added, and still keep the same 'pointer', it will never be necessary to update any backpointer refs from child nodes to parents, because parent nodes always have the same identity.

Assuming a 32-bit CPU architecture, each heap has a vector of 32-bit pointers to blocks allocated by that heap. This vector is the 'object table' for that heap. Since this vector will never have more than 2^16 slots in the array, an unsigned integer of 16-bits will always be big enough to identify a block in each heap.

To scale slowly in space footprint used, an object table should start smaller than full-sized, and then grow geometrically in a fashion similar to hash table growth, so no more than O(log(N)) object table copies will occur over time.

The heap need only allocate blocks with two-byte alignment, because the trie will typically never need to use integer values larger than two bytes in size, when block IDs are 16-bits and the trie can arrange that node content never exceeds 65K of bytes per node, and node child refs need never come anywhere near 65K in practice.

The trie will always know how many bytes are in a block because it knows the size of all content put in the block, so often the heap need not know this info for maintaining free lists, if the trie is responsible for providing this info when growing a node block.

If we ignore space footprint associated with blocks in heap free lists, then the per-block space overhead is typically only four bytes, to put the block's address into the object table vector. This compares well with space overhead for conventional memory heap allocators.

Except where we transition over heap boundaries, typically at the roots of subtrees bound to different heaps, the size of a typical pointer to a node will be only 16-bits, both from parent nodes and child nodes. So this controls the space overhead to build trees that branch very frequently every few bytes in the interned strings. (Otherwise the tree structure could dominate the space cost of strings themselves.)

Nodes will typically come in many flavors -- at least as many as are needed to distinguish how a node is internally structured, so a trie can tell when transitions between heaps happen, and to tell leaf nodes apart from inner nodes, when this permits a smaller leaf node encoding.

Typically leaf nodes in the trie will be 'atoms'. The main behavior of a trie is to take a string and intern it in the trie so a new entry is added that maps the string to the uniquely associated atom that denotes that string value. Clients of a trie can use the atom pointers as cheap strings which need never be memory managed, as long as the trie lives longer than all the clients.

Further, all the atoms will have uniquely distinguishable addresses, so they can be used as keys in hash tables, so one can create hash tables with string keys, and pay no cost for strings beyond the copy found in the path within a trie.

So a client has a pointer to an atom. What's in the atom? And how is the string value recovered from the atom? Each atom must include at least 4 bytes of information: a 2-byte node ref and a 2-byte heap ref.

The node ref is a 16-bit 'pointer' to a parent node in the trie, so the string path can be traversed backwards to reconstruct the actual byte content in the string. (Note this is slower than having all the bytes together -- can't have everything.)

The 2-byte heap ref is something I have not described yet. Each trie should have its own object table, with up to 2^16 slots, to map all the heaps that are being used by the trie. Since every leaf atom must ref a heap context to give meaning to the parent node 'pointer', then we are very concerned about making the heap ID 2 bytes in size, and not 4.

Then that begs the question of how the 2-byte heap ID is interpreted, when it does not directly point at the runtime heap object. The answer is that the trie being used is always implied as one of the out-of-band invariants wherever atoms are being used. Any collection that holds atoms should know which trie is being used for the set of atoms. So the trie pointer plus an atom pointer together provide everything.

An atom does not need more info, but you might want to put more info in any atom anyway. A hash value for the string would be very useful, for example. And if a byte code was needed to tell an atom node from other kinds of trie nodes, then one might want to put a 24-bit hash value next to that for best use of the atom space footprint.

That's all I have to say about tries right now. I'll add more later if I remember something important I forgot to mention.

David Mc
Values have meaning only against the context of a set of relationships.
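To make the 16-bit 'pointer' idea concrete, here is a minimal sketch of a per-subtree heap with an object table: nodes refer to each other by 16-bit block ids, and a block can be grown (and physically moved) without changing its id, so parent backpointers never need updating. All class and member names are invented for illustration and do not claim to match Mithril or Mork code.

  #include <cstddef>
  #include <cstring>
  #include <vector>

  typedef unsigned short BlockId;   // 16-bit node 'pointer' within one heap

  class TrieHeap {
  public:
    BlockId Alloc(size_t bytes) {
      mTable.push_back(new char[bytes]);        // real address goes in the table
      return (BlockId)(mTable.size() - 1);      // caller keeps only the 16-bit id
    }
    void* Deref(BlockId id) const { return mTable[id]; }

    // Grow a block: the bytes move to a bigger space, but the id (and every
    // 16-bit ref held by other nodes) stays valid, because only the table
    // entry changes. The trie supplies the old size, as described above.
    void Grow(BlockId id, size_t oldBytes, size_t newBytes) {
      char* bigger = new char[newBytes];
      std::memcpy(bigger, mTable[id], oldBytes);
      delete[] (char*) mTable[id];
      mTable[id] = bigger;
    }

  private:
    std::vector<void*> mTable;      // the 'object table' for this heap
  };

  struct TrieAtom {                 // leaf handed out to clients
    BlockId mParentNode;            // walk backwards to recover the string
    unsigned short mHeap;           // which subtree heap the parent lives in
  };

The sketch omits free lists, geometric table growth, and subtree splitting at the 2^16 block limit; it only shows why growing a node never invalidates the small inter-node references.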


overflow
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: handling outrageous documents
Date: 18 Aug 1999 00:00:00 GMT
Message-ID: <37BB6D47.82975222@netscape.com>
Content-Transfer-Encoding: 7bit
References: <37B205C3.353291D0@netscape.com>
Content-Type: text/plain; charset=us-ascii
Organization: Ontology Mechanics Guild
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.layout

"Kipp E.B. Hickman" wrote:
> In the current source tree, there are no limits on the depth of
> the gecko content model and consequently on the frame tree.

For the last ten years, my standard question to ask interview candidates to judge their technical competence and flexibility has been one about how to control stack overflow conditions. :-) So I can't resist trying to say something constructive here.

(The basic question involves determining when the stack is "too deep". Incidentally, it was a Linda engineer who gave the most comprehensive and fastest response, boom, boom, boom, covering all the basic approaches about as fast as he could speak. Alas, that one got away.)

> Because layout is a recursive algorithm there is a direct relationship
> between a platforms stack depth and the depth of content tree that can
> be reflowed. [A detail: the block frame code is notorious for how much
> stack it consumes; I have fixes for that problem in my tree].

There are several ways to factor, but you make it clear you are time bound in considering code re-org options. I assume you fixed the block frame code by having it allocate on the heap instead; this approach can be generalized many ways, even if you stay recursive and generally stack-based in allocating space. Just try to change the local parameters into heap references.

At the extreme, you can group all your local variables into a struct and allocate that on the heap (maybe a pooled free list for speed, of course), so each of your stack frames only incurs the cost of a pointer to such a struct. Your compiler helps rewrite the methods, since your local variables become undefined in their original form.

That allows you to nest much more deeply before getting into trouble, but you still have to do something when the stack gets exhausted.

> The problem comes in defining exactly what we should do with documents
> that push the limits. One of the options that we do not have is changing
> the fundamental reflow approach to an iterative algorithm.

That's too bad; I'm biting my knuckle to stop talking about that. There's a hybrid approach you could use that makes things still look like function calls, but actually gives you an iterative algorithm. If you want to see a description in CS literature, look for the word 'trampoline' and you'll see the kinds of mechanisms involved.

I'm using a similar technique in my public domain Mithril runtime, so the C/C++ stack does not actually nest when my runtime and dynamic language environment re-enter each other with arbitrary nesting. If you want some theory, or if some will help rationalize this, you need to think about abstract representations of continuations.

But this approach tends to really tear up C++ code in a way that might give hives to somebody dedicated to pure C++ style. Here's a short primer, which you might only apply in the serious bottlenecks that cause the most nesting, in case you want to isolate this kind of thing to as small a place as possible.

If you want to call another function without winding the stack, you instead return with a code saying "call this other function with these parameters, and then return to me afterwards, and pass me this value to tell me where I should resume, which happens to be at the point after my code where I wanted to make the call".

In other words, you multiplex your return code to mean a lot of things, so that only a special return code really means 'return'. Then you have the leeway to have many other return codes do things in a lot of different ways: call with later return, or do a tail call, etc. Basically, all the ways you can code for dispatch in a virtual machine can be expressed in your encodings of C++ method return values. If this sounds like gibberish, then you really don't want to do this.

The real advantage to this system is that you can do VM style multi-threading across your C++ code, because you don't actually use a C++ runtime stack, so there is no stack management to handle for threads (except for the one you are actually using to track dispatching). But it also ruins your C++ debugger which won't understand this.

> Proposal #1: Change the various content-sink's (the objects responsible
> for mapping parser output into content objects) to limit the maximum
> depth of a content model. The sink's will discard any new
> "container" content objects that are added to the content tree. The
> result will be a mangled document, but at least the CPU stack will not
> be over-written. Major flaw: DOM access to the tree can create a tree
> that's too deep.

If you want to strangle the recursion when it gets too deep, counting the depth is not as effective as measuring it directly, by comparing the address of a local variable to some other address that is taken to mean someplace-near-the-top-of-my-stack. The address of a local inside main() works well. If you have threads with their own stacks, then you want the thread model to provide such an address.

All your functions which nest, or maybe those which are the target of nesting calls, should perform a depth check and error return. If you have dangerous sudden increases of depth near your limit (and I can't tell if this happens with the DOM access you mention), then you should reserve a large buffer zone of safety in your check, which means you underestimate the size of your stack in checks to avoid coming close.

> Proposal #2: Change layout (e.g. the block/inline algorithms) so that it
> keeps track of how deep the frame tree is nested and once a threshold is
> reached, refuse to go deeper. The block/inline frame classes are an
> opportune place to check for outrageous documents because they end up
> being used in all deeply nested frame trees. Minor flaw: while the logic
> to avoid reflow based on a depth can be factored out, its would require
> sprinkling the call sites around into the appropriate container classes.
> New container classes might not be implemented correctly.

It's a good idea to have a method your plugins can call to answer the question about whether the stack is too deep, so that correctly done container classes can also be safe. (I have an abstraction similar to this in my public domain IronDoc system.) Or you can have all methods called by the container classes do such checks, so your code will throttle, provided it is called ubiquitously enough in containers.

> Proposal #3: Change the frame construction code to keep track of how
> deep the tree is and make it refuse to create frames beyond a certain
> depth. Major flaw: complexity would be high because the frame
> construction code has to guarantee a well-formed frame tree; in
> addition, the incremental pathways would require care to either create
> frames that were not created last time around (because the tree just
> became shallower), or to avoid creating new frames because the depth is
> too great.

Isn't the frame construction code written so it will fail gracefully if errors occur in the methods called by this code? Also checking for the stack being too deep might fit into the code flow for existing paths that handle failures currently.

Maybe a too-deep condition can cause you to add a frame that will act as a 'bottom' that does nothing, will nest no more deeply, and can act as a black hole to consume content that gets added to it. Then you can have a node in your data structure which actively prevents further nesting when code tries to use it. That might work better than avoiding a frame, and having to keep working to stop a frame from being created across all possible paths that hit that point.

I hope this makes sense even though I'm reasoning a bit blindly.

David Mc
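A small sketch of the depth check by address comparison described above. It assumes a downward-growing stack with linearly comparable addresses (true on the usual platforms but not guaranteed by the C++ standard), and the base address and safety margin values are invented.

  #include <cstddef>

  // Sketch: measure stack depth directly by address instead of counting.
  static char* gStackBase = 0;            // set near the top of the stack

  void InitStackBase() {
    char localVar;
    gStackBase = &localVar;               // e.g. call this early in main()
  }

  bool StackIsTooDeep() {
    char localVar;
    // Keep a generous buffer zone so a sudden deep burst between checks
    // cannot blow past the real limit; the budget here is hypothetical.
    const ptrdiff_t kBudget = 512 * 1024; // bytes we allow ourselves to use
    return (gStackBase - &localVar) > kBudget;
  }

  // Functions that nest check before descending:
  //   if (StackIsTooDeep()) return false;  // error return instead of recursing

With threads, each thread would record its own base address when it starts, exactly as the post suggests the thread model should provide.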


iterators
 

From: David McCusker <davidmc@netscape.com>
Subject: [performance] iterators vs random access
Date: 02 Sep 1999 00:00:00 GMT
Message-ID: <37CEEC3C.A867D763@netscape.com>
Content-Transfer-Encoding: 7bit
Organization: Ontology Mechanics Guild
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.mail-news,netscape.public.mozilla.rdf

Chris is a busy man, so I'll describe one more of the list of items we wrote on his whiteboard. This one concerns the use of hedging to cope with the lack of good time order random access iterators.

I asked Chris about random access, and it seems unlikely to go in now (in the short term), and further it would require some layout recoding in order to take full advantage. So we would not win outright just by their existence. However, we might get performance similar to what we want from good random access, based on what we know about usage.

Suppose layout is in a loop that looks like the following pseudo code:

    for (int i = 0; i < 10000; i++)
        this->display(this->elementAt(i));

What's wrong with this? Well, the layout code is performing a linear iteration *but it's not telling RDF it intends to do this*. In order for code to perform very well, typically you must provide info about all your intentions, so algorithms can target expected usage. In this case a random access API is being used to perform a linear scan, and it never tells the callee this is the intention. However, if we know this loop exists, we can infer the intention and just code according to our knowledge that linear progress is likely.

So the hedging strategy here involves coding the iterator you wish that layout was using instead, and quietly using this linear scan iterator as long as it happens that actual usage is in fact a linear scan. So you write an iterator class that behaves best when you start at the first member and keep calling Next() to access the following member. And for each member returned, you keep track of the position of this member from the perspective of a random access linear scan. Then as long as the caller of elementAt() keeps passing an integer denoting the very next member the iterator will return, you call Next() instead of performing a more complex random access for an arbitrary integer.

For example, suppose the data structure was the kind of btree I talked about yesterday, where you can do random access in O(log n) time. It would be faster to use an iterator that walked the tree, so getting the next leaf member happened in O(1) time instead. You can use this O(1) walking iterator successfully as long as the caller continues to pass in an integer position denoting the next leaf in the walk. You need only perform an O(log n) seek when the integer is nowhere near.

David Mc
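
A minimal sketch of the hedging iterator described above, not taken from the layout or RDF sources: a hypothetical OrderedStore offers a cheap forward walk plus a more expensive positional seek, and the accessor quietly serves elementAt-style calls with the walk while the caller's scan stays linear.

#include <cstddef>

struct OrderedStore { // hypothetical backing structure (e.g. a btree)
  virtual void* SeekAt(std::size_t pos) = 0; // random access, O(log n)
  virtual void* Next() = 0;                  // step to the following member, O(1)
  virtual ~OrderedStore() {}
};

class HedgedAccessor {
public:
  explicit HedgedAccessor(OrderedStore* store)
    : mStore(store), mNextPos(0), mPrimed(false) {}

  // Keeps the random access signature the caller expects, but uses the
  // walking iterator whenever the requested index is the very next member.
  void* ElementAt(std::size_t pos)
  {
    if (mPrimed && pos == mNextPos) {
      ++mNextPos;
      return mStore->Next(); // linear scan detected, so walk instead of seek
    }
    void* member = mStore->SeekAt(pos); // assume this also positions the walk
    mNextPos = pos + 1;
    mPrimed = true;
    return member;
  }

private:
  OrderedStore* mStore;
  std::size_t mNextPos; // index the walking iterator would return next
  bool mPrimed;
};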


pooling
 

From: David McCusker <davidmc@netscape.com>
Subject: pools and fragmentation
Date: 07 Sep 1999 00:00:00 GMT
Message-ID: <37D5959F.A13DD35D@netscape.com>
Content-Transfer-Encoding: 7bit
Organization: Ontology Mechanics Guild
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.performance

I just wrote an offline email about whether fragmentation is a concern when pooling memory allocations, as I suggested in order to address the time sinks observed in Chris Waterson's posted performance data.

Pooling should be the exception and not the rule, and is best driven by performance data like the kind that was gathered. Then once you have a candidate for pooling, you must evaluate whether you might suffer from the ill effects of any expected fragmentation.

The two criteria to consider are total population size and expected lifetime of the objects with pooled allocation. Both of these affect how large an impact fragmentation might have. Basically, the less that becomes logically free, the less fragmentation cost.

If the population size is expected to be very small, such as with RDF enumerators, then fragmentation is no concern because there is just not very much memory involved anyway.

If most of your pooled content stays in use until the entire pool is collected en masse, then there is little space actually free that will cause a fragmentation cost. So when displaying readonly documents, or your average writeable document which is seldom trimmed back in terms of contained content, fragmentation concerns while the document is open are not very great.

Pooling should be done in a scope that comes and goes, and does not stay alive for the entire life of an application session. So we would not want RDF to pool globally, but rather within the scope of a source in use, so that moving on to other sources would tend to free pools. If one cannot identify when a pool will go away, then maybe a pool should not be used, since one cannot describe how fragmentation is avoided.

David Mc
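
A minimal sketch, not from the post, of pooling confined to a scope that comes and goes: everything allocated from the pool is released en masse when the owning scope (say, one data source) is torn down, so logically free space never lingers to fragment the heap. The chunk size and names are invented.

#include <cstddef>
#include <new>
#include <vector>

class ScopedPool { // bump allocator tied to one scope; no individual frees
public:
  explicit ScopedPool(std::size_t chunkSize = 64 * 1024)
    : mChunkSize(chunkSize), mCursor(0), mRemaining(0) {}

  ~ScopedPool()
  {
    for (std::size_t i = 0; i < mChunks.size(); ++i)
      ::operator delete(mChunks[i]); // en masse release when the scope ends
  }

  void* Allocate(std::size_t size)
  {
    size = (size + 7) & ~std::size_t(7); // keep 8-byte alignment
    if (size > mRemaining) {
      std::size_t chunk = size > mChunkSize ? size : mChunkSize;
      mCursor = static_cast<char*>(::operator new(chunk));
      mChunks.push_back(mCursor);
      mRemaining = chunk;
    }
    void* result = mCursor;
    mCursor += size;
    mRemaining -= size;
    return result;
  }

private:
  std::size_t mChunkSize;
  std::vector<char*> mChunks;
  char* mCursor;
  std::size_t mRemaining;
};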


arrays
 

From: David McCusker <davidmc@netscape.com>
Subject: DB array access for UIs?
Date: 13 Mar 1998 00:00:00 GMT
Message-ID: <35099922.7555A89A@netscape.com>
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Organization: Netscape
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.general

The context for this post is previous discussions about replacing DBs used in Mozilla with other (i.e. free) alternatives. I have a question about Berkeley DB which also applies to other db candidates. I want to know whether array style access (with an ordinal integer position index) is supported directly and efficiently (logB(N) disk seeks) in indexes.

IronDoc has the array style access feature designed into its btree dicts. This is supported by having every pointer to a child node accompanied by a count of the number of entries reachable after going through a node. It is easy from this information to infer how to navigate to any entry at a given array position, or to efficiently compute the array index of any entry found by searching a btree index using a search key.

The significance of this feature is that it enables user interfaces which rely on scrolling views to very efficiently display content in a database with minimal time latency, regardless of the size of a given database. (The performance cost of maintaining leaf entry counts in nodes is quite small, since an index under construction typically has nodes near the index root both dirty and cached in memory anyway.)

A lot of my time working on address books has been consumed by retrofitting this feature into -????- by hacking the sources to support array access. It would be a shame if we end up regressing to a database that has access time proportional to the size of the database content in order to have a UI that also correctly supports scrolling views in the frontend interface.

Unless a replacement database system supports fast native array access in btree dictionaries, it will be hard to build, open, or show content in near instant time using user interfaces with scrolling views on large databases. This is essentially the main scalability issue involved in db replacements.

David McCusker, wanting near zero seek latency in large address book dbs
Values have meaning only against the context of a set of relationships.
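
A minimal sketch, assuming a simplified in-memory node layout rather than IronDoc's actual on-disk format, of the counted-children idea in the post: each child pointer carries the number of entries reachable through it, so finding the entry at a given ordinal position visits one node per level.

#include <cstddef>

struct Node {
  bool isLeaf;
  std::size_t childCount;    // children in an internal node, entries in a leaf
  Node* child[16];           // internal nodes only
  std::size_t reachable[16]; // entries reachable by descending child[i]
  int entry[16];             // leaf payload, an int just for the sketch
};

// Return the entry at array position pos within the subtree rooted at n,
// skipping whole subtrees by their counts instead of scanning entries.
// Assumes pos is less than the subtree's total entry count.
int EntryAt(const Node* n, std::size_t pos)
{
  while (!n->isLeaf) {
    std::size_t i = 0;
    while (pos >= n->reachable[i]) {
      pos -= n->reachable[i];
      ++i;
    }
    n = n->child[i];
  }
  return n->entry[pos];
}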


transactions
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: Sleepycat DB 2.0
Date: 27 Mar 1998 00:00:00 GMT
Message-ID: <351C2977.D7F676AE@netscape.com>
Content-Transfer-Encoding: 7bit
References: <351B88D7.72A9@ibm.net> <351BFEBA.799022DA@netscape.com>
X-Priority: 3 (Normal)
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
To: "Frederick G.M. Roeber" <roeber@netscape.com>
Content-Type: text/plain; charset=us-ascii
Organization: Netscape
Newsgroups: netscape.public.mozilla.general

Frederick G.M. Roeber wrote:
> The DB 2.0 code, from Sleepycat, is pretty cool. Our mail+news
> guys pooh-poohed it for some reason, but I like it.

I think David Bienvenu was the main one to voice a negative comment. SleepyCat DB was criticized for not supporting multiple btree indexes in the same file. Of course I also had a long laundry list of concerns, but I'm not the thumbs up-or-down guy. (But I'm terribly biased, since I want to use my own unfinished public domain IronDoc database.)

> It certainly would be great for the "other" databases -- cache,
> history, address book, etc.

I don't agree on address books, unless you want the format to be static.

> One nice advantage is that it is multi-process safe:

It's not hard to add multi-process safety. (I hope someone attacks me on this. I enjoy dialogues so much more than monologues. That's why I chronically strike weaker stances than I might otherwise. :-)

> if Mozilla used it, finally you'd be able to run more than one copy
> without your databases getting corrupted. It also has (optional)
> transactions, etc. for any complicated stuff. It has logging, for
> db recovery.

Transactions are not optional for address books. We'd get really angry enterprise clients if they lost big address books.

IronDoc does not use logging for db recovery. Instead IronDoc uses the simpler and equally safe technique of block shadowing, to avoid modifying old content until new content is atomically committed. The reasoning is easier.

> It exists on unix and win32, someone (ahem) is doing a win16 port, and
> there's a mac port (with some software rot setting in). It's also a lot
> more stable and reliable than 1.85.

IronDoc should run as-is on all platforms, modulo file subclassing. And the file format will be portable across platforms. The portability of format is something I'm concerned about. For reliability, my goal is to handle any number or pattern of random byte clobbers in the db format (assuming malicious file damage occurs) without crashing.

> There was some concern here about their licencing terms. I asked
> Sleepycat to say something about this here, so hopefully we'll get
> an answer soon.

IronDoc is public domain. Now all you have to worry about is whether I'm a flake who never finishes the database. (Or maybe Real Programmers don't write databases from scratch, they just upgrade existing ones. :-)

David McCusker, without time to respond to the many db thread posts
Values have meaning only against the context of a set of relationships.
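
A minimal sketch, not IronDoc's actual code, of the block shadowing idea mentioned above: changed blocks are written to fresh locations while the old blocks stay untouched, and the commit is a single switch of the root mapping from block ids to positions. Aborting just discards the shadow copies.

#include <cstddef>
#include <map>

typedef unsigned BlockId;
typedef std::size_t FilePos;

class ShadowStore {
public:
  // Record that blockId's new contents live at newPos; the previously
  // committed copy of the block is never modified in place.
  void ShadowWrite(BlockId blockId, FilePos newPos) { mPending[blockId] = newPos; }

  void Commit()
  {
    // In a real store this is one atomic rewrite of a root block; the map
    // update below just stands in for that single commit point.
    for (std::map<BlockId, FilePos>::const_iterator it = mPending.begin();
         it != mPending.end(); ++it)
      mRoot[it->first] = it->second;
    mPending.clear();
  }

  void Abort() { mPending.clear(); } // shadow blocks simply become free space

  FilePos Lookup(BlockId blockId) const
  {
    std::map<BlockId, FilePos>::const_iterator it = mRoot.find(blockId);
    return it == mRoot.end() ? 0 : it->second;
  }

private:
  std::map<BlockId, FilePos> mRoot;    // committed block positions
  std::map<BlockId, FilePos> mPending; // shadowed, uncommitted positions
};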


database
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: Sleepycat DB 2.0
Date: 27 Mar 1998 00:00:00 GMT
Message-ID: <351C6981.26482A4C@netscape.com>
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Content-Type: text/plain; charset=us-ascii
Organization: Netscape
Newsgroups: netscape.public.mozilla.general

nospam@non.uce wrote:
> It appears that some people are confused by the term database or DB
> when it comes to discussion of providing a replacement to -????-.

No. :-) Database is a generic term which can be used for software that does not conform to the top feature list of ACID properties used by upscale vendors to compete on the basis of high end features. We are talking about "low end" databases, but they are still databases. Being more specific ("persistence engine") is nice but unnecessary.

> The API -????- provides is very different from a SQL or ODBC
> enviroment that some people associate with being a "database."

The contexts folks associate with the term "database" are subject to change, just like all terms. Words are given meaning by the corpus of stories told which use the words in specific roles to designate various ideas. When the stories change, so do the meanings. One can claim some particular story is more appropriate for some context.

> To avoid further confusion I suggest that we refer to -????-,
> IronDoc, Berkeley DB, etc. as "Structured Storage APIs."

I suggest we refer to them as "low end" databases, or databases for short, or db's for shorter. (This is the point at which an anonymous pseudonym becomes awkward, since one reveals little about personal agendas related to high end databases.) Or maybe storage engine, with engine for short, but folks will stay more on track with "db".

> David McCusker wrote: [ snip ]
> > It's not hard to add multi-process safety. (I hope someone attacks
> > me on this. I enjoy dialogues so much more than monologues. That's
> > why I chronically strike weaker stances that I might otherwise. :-)
>
> Ok. As long as your looking for a neck to chop, I'll take the
> offensive here...

Thank you. Of course I'm not looking to hurt anyone, and it's not possible to hurt my feelings by pointing out things I've missed. It's just that attacks provide so much free energy to be applied. :-) And often folks give really interesting ideas about application contexts.

> It can hard to write multi-process safe structured storage API's
> depending on your constrants.

Yes. It's easy if the unspoken qualifier in front of multi-process is "some" or "adequate", but harder if the qualifier is taken to mean "best possible" or "safe under the worst possible scaling conditions".

> If the multi-processes are running on the same machine then it makes
> things significantly easier (you have a master process controlling
> the file and the rest of the processes access the structured storage
> files indirectly through IPC).

The hard part in shared access is mediating shared writes, since simultaneous read access is a piece of cake. One approach is to use a "single writer" which serves up write access to either local or remote clients. But this doesn't help resolve conflicts, which can be as open-ended in nature as the number of concept systems that are represented in the storage and runtime system accessing it. (There's no one correct approach for resolving write conflicts, since one can construct difficult idiosyncratic semantic requirements.)
Obviously if only one writer is on the same machine, and others are elsewhere, this is automatically a single writer setup. The question is whether this writer bottlenecks all access for other local processes, or whether they coordinate and allow each to write.

> However, "multi-process safe" could also be extended to mean multi-
> workstation safe where the processes do not exist on the same
> machine. Also, it may be possible that the workstations can not
> talk directly with each other.

But I wouldn't use that meaning for the initial implementation of low end database shared access. There is always going to be a context where the most low end solution is not going to make sense. But that doesn't mean low end solutions are worthless, especially when they are free and suit the usage profile you are planning.

Basically, in distributed object contexts I'd rather use a solution designed for that context, such as maybe Linda-style tuple spaces. But that doesn't mean I want to always use tuple spaces everywhere.

> Take the following situation: You have server X which provides the
> personal directories via SMB or NFS or whatever. Both workstations
> A and B are using the same personal profile directory (and hence same
> structured storage files) for UseNet reading.

In this case I'd first try a single-writer server.

> Also, while both workstation A and B have access to server X, the
> firewall policies do not allow them to talk directly to each other.

But if they both have access to server X, then they can use MVC to act as views on the model at server X. They don't need to talk to each other any more than all clients of a model need to talk.

> Another issue is that the disk quota would be exceeded if workstation
> A and B where to attempt to keep transaction logs of all their
> modifications to the structured storage files.

They don't need to if server X shadows blocks local to server X. If server X runs out of space, it aborts changes. If A or B overruns mod space on server X, then server X aborts changes. Why is this hard? (Note my draconian solutions might not fit your performance profile.)

You might be interested in the problem of keeping A and B well informed of changes in status with regard to server X. This might lead to a slippery slope of wishing that a perfectly consistent view of the world was possible at all times with distributed objects. That's expensive.

> As is usually expected by an end user, the files should be in a
> useable state regardless of if a crash of workstation A or B or even
> of server X where to occur.

Ignoring the difficulties of dealing with liveness, if both A and B do not sign off on a commit, then server X should abort the xaction. If any of A, B, or server X crash, then the store is still fine.

> Also, as usually expected, all race conditions need to be taken into
> account and dead locks are unacceptable. So, do you have a solution?

Operating Systems 101 pretty much presents all the options for dealing with deadlocks optimistically, pessimistically, or not at all. I see no reason to expect a one-size-fits-all solution is a good goal.

The last time I did seat-of-the-pants design for flexible shared write access, I suggested that the writing server allow clients to connect according to one of many possible sharing contracts, which clients could choose among (especially to cooperate with each other). This design was hated by folks who believe in "the one perfect way".

> Great... how well does it scale?

That's the perfect question.
In keeping with a low end database, I would start first with a low end scaling profile, and then crawl/walk/run until I get what is satisfactory for my particular app context.

> Can I throw workstation C, D, E, and F into the picture also using
> the same personal profile directory?

Sure, why not? Why should server X care how many clients connect? There's performance degradation, but nothing is free unless there was colossal wastage somewhere that one stops indulging in.

> If your able to provide an elegantly "trivial" solution to this then
> I might have a long list of powerful companies that should recieve a
> seminar from you on doing structured storage programming!

I prefer writing code to giving seminars. I gave free seminars on structured storage to powerful companies when I was last interviewing.

David McCusker, staying late for fun discussion (hotdog! brainstorming)
Values have meaning only against the context of a set of relationships.
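
A minimal sketch, only to make the single-writer discussion above concrete and not a proposal for any real protocol: server X queues write requests from workstations A and B, and commits a transaction only when every participant has signed off, otherwise aborting so the committed store is untouched. All names are invented.

#include <cstddef>
#include <string>
#include <vector>

struct WriteRequest { std::string client; std::string change; };

class SingleWriter {
public:
  void Enqueue(const WriteRequest& req) { mPending.push_back(req); }

  // Commit only if every participant appears among the sign-offs; otherwise
  // abort the xaction and leave the committed state exactly as it was.
  bool CommitIfAllSignedOff(const std::vector<std::string>& participants,
                            const std::vector<std::string>& signOffs)
  {
    for (std::size_t i = 0; i < participants.size(); ++i) {
      bool found = false;
      for (std::size_t j = 0; j < signOffs.size(); ++j)
        if (signOffs[j] == participants[i]) { found = true; break; }
      if (!found) { mPending.clear(); return false; } // abort
    }
    mCommitted.insert(mCommitted.end(), mPending.begin(), mPending.end());
    mPending.clear();
    return true;
  }

private:
  std::vector<WriteRequest> mPending;   // queued, uncommitted writes
  std::vector<WriteRequest> mCommitted; // serialized committed history
};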


large
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: Sleepycat DB 2.0
Date: 30 Mar 1998 00:00:00 GMT
Message-ID: <352006AF.CBB1605C@netscape.com>
Content-Transfer-Encoding: 7bit
References: <351B88D7.72A9@ibm.net> <351BFEBA.799022DA@netscape.com> <351C2977.D7F676AE@netscape.com> <351C620D.F3354253@non.uce> <351F2EBC.9C8A1B32@texas.net>
X-Priority: 3 (Normal)
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
To: Scott Boland <sboland@texas.net>
Content-Type: text/plain; charset=us-ascii
Organization: Netscape
Newsgroups: netscape.public.mozilla.general

Scott Boland wrote: [ responding to nospam@non.uce ] [ snip ]
> Or we could borrow terminology from one of the 'low end' databases.
> I finally got used to referring to single table/update databases as
> 'navigational'. [ big snip ]

Incidentally, IronDoc is not single table. It supports arbitrary forests with either arbitrary nesting or no nesting working equally well. (This would have helped OpenDoc quite a bit, since embedded drafts would then have had equal performance to top level documents.)

> I would think that the Netscape database needs to support such a monster
> to allow the sharing of address directories or bookmarks across an
> enterprise, using the same mailbox simultaneously from multiple
> workstations, or similar things. [ snip ]

I neglected to mention that LDAP is used for large scale addressing needs, and that Netscape considers local personal address book databases a kind of private caching or offline mechanism. So the address book role is mainly small scale in contrast with LDAP directories.

(Even so, we want address books with 100K+ entries to have very snappy performance, since folks want to take large corporate directories local for offline use. Big address books should open in the same time taken by small address books, and typedown searches should be near instantaneous.)

So the issue of really intense data sharing is handled by LDAP use, and address books need only have sharing support adequate for smaller scale requirements. (We must have *some* sharing support to not be lame.)

David McCusker, playing structured storage doctor with client AB db's
Values have meaning only against the context of a set of relationships.


concerns
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: DB array access for UIs?
Date: 31 Mar 1998 00:00:00 GMT
Message-ID: <352141AB.59811B94@netscape.com>
Content-Transfer-Encoding: 7bit
References: <199803310423.XAA09898@mongoose.bostic.com>
X-Priority: 3 (Normal)
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
To: Sleepycat Software <db@sleepycat.com>
Content-Type: text/plain; charset=us-ascii
Organization: Netscape
Newsgroups: netscape.public.mozilla.general

Sleepycat Software wrote:
> > From: David McCusker <davidmc@netscape.com>
> > I suspect Berklely DB is near the top of the list, mainly from
> > familiarity to people (without regard for any technical problems
> > its use might cause). I'm not sure, but it might be the school of
> > "ignore those sharp edges, just keep pushing and it will go through,
> > and then we'll hammer to fit". While this approach gives fast 90%
> > fit, bugs and perf tweaking go on *forever*. (This is how most
> > software is written and explains prevalent quality.)
>
> I'm a little concerned that you're clearly seeing problems with
> Berkeley DB of which I'm not aware.

No, I'm not seeing problems with Berkeley DB, since I'm not using it. But it's not necessary to have direct experience with something to reason about facts, plans, and goals. (And folks often approve of reasoning about the results of a course of action in advance. :-)

> Can you further describe what "sharp edges"
> you've experienced that needed to be hammered into place?

No, I can't, since I don't have any experiences. But I can show you an excerpt of the concerns I mailed to some other XP mail/news engineers last January, in response to the call for concerns to be directed toward Sleepycat Software's desire to do good deeds. I don't know if you got it.

Sorry if the list is abrupt or unclear -- it was something I whipped up off the top of my head. I'll take the time to come up with a more complete list of concerns later (assuming that this initial list seems to have any useful cooperative effect).

As long as you're taking requests, please don't hammer things into place, since it is easier to figure out how things were designed to work (so they can be changed with full understanding) when they are not mashed into incomprehensibly flat, hammered-out shapes. :-)

On 30 Jan 1998, David McCusker wrote: [ snip ]
| So if Sleepycat does not have a specific Mac product, then their
| code will only be usable if they assume nothing about files, memory,
| threads, locks, or anything else outside the DB source code.
|
| Assuming their code works on the Mac, the file system only really
| causes a concern for how many files are needed for a database, since
| the aging Mac HFS file system has minimum sizes for each file which
| is greater whenever the volume has more space. In these days when
| disks are large, minimum file size can be 65K for a 4GB disk. (If
| the file has a resource fork of any size, this is doubled to 129K.)
|
| In addition, I also have the following concerns that only relate to
| whether the database is a good match for my desires. My tendency
| is to assume these desires would be frustrated by Berkeley DB.
|
| - The Sleepycat file format should be cross platform, so a file
| moved to another platform can be used without changes. This
| mainly means they should handle endianess and byte swapping.
|
| - A database should be able to hold more than one index.
|
| - A database should also have a variable length blob system inside
| the same file as the indexes.
|
| - The number of indexes in a file should not be fixed, and both
| indexes and blobs should have type metainformation so their
| formats can be inspected at runtime.
|
| - Content in a database should be looked up by name, so that many
| indexes can live in one database with different names (because
| this is the most flexible way to support many indexes).
|
| - A database should have a memory footprint policy such that apps
| can completely control the memory usage of the database.
|
| - The database should be robust so that all changes made to a
| database are atomically committed. This condition is hardest
| to satisfy when multiple files compose a database, because
| atomicity must apply to relationships spread across all the
| files. (This is complex and hard to do, and some vendors seem
| sorely tempted to obfuscate this issue to avoid analysis.)
|
| - The database license should allow third parties to access our
| content with as near zero hassle and cost as can be managed.
|
| - The database license should not require that we ask for the owner's
| help in making changes, or require that we provide any changes
| back to the license owner.
|
| - The source code of the database should not be so complex that
| it cannot be understood when read, or cannot be modified when
| changes or bug fixes are necessary.
|
| I'm sure I'll think of other things, but this is it right now.
[ snip ]

And you've already seen the major concern about array style access that started this thread. I can also try to remember some other things I was thinking about as requirements or general worries.

David McCusker, cross platform Communicator mail/news client engineering
Values have meaning only against the context of a set of relationships.


shrinking
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: smn files grow without bound & other questions
Date: 08 Sep 1998 00:00:00 GMT
Message-ID: <35F592DF.FF9502EF@netscape.com>
Content-Transfer-Encoding: 7bit
To: Raymond Steven Kutai <rkutai@utk.edu>
Content-Type: text/plain; charset=us-ascii
Organization: Netscape
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.mail-news

Raymond Steven Kutai wrote:
> Deleting .snm files every so often is annoying. .snm files hogging
> disk space is annoying. I think it is about time to switch to
> a new database system that keeps .snm file relatively small.

Keeping a database small is a hard problem, partly because this does not usually take precedence over other priorities in database design and application that focus on speed and efficient space reuse. Even a new database would tend to encounter the same problems.

Though it's not too hard to get a database to stop growing after it reaches a highwater mark that corresponds to the most content that was ever put into it, it's a bit hard to get it to retreat from such a highwater mark without rewriting the database to leave out holes.

Rewriting a database to make it smaller is a form of compression, and there are lots of ways to do it, from garbage collection in-place to just making a new file that copies all the reachable current content. One might find a "compress" menu item that does this for some databases.

Because the task of shrinking a database tends to be a monolithic act that might be time consuming, one hesitates to do it frequently as an automatic act without user intervention, since this would consume the user's time without their permission.

David McCusker, 4.5 mail/news client address book backend and db
Values have meaning only against the context of a set of relationships.
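
A minimal sketch, not the 4.x code, of the copy style of compression mentioned above: walk only the reachable records, write them to a fresh file, then swap files, leaving all the holes behind. The record type and liveness test are stand-ins for whatever a real database's index structures provide.

#include <cstdio>
#include <string>
#include <vector>

struct Record { bool reachable; std::string bytes; };

// Shrink the database below its old highwater mark by copying live content
// into a new file and renaming it over the original.
bool CompressByCopy(const std::vector<Record>& records, const std::string& dbPath)
{
  std::string tmpPath = dbPath + ".compress";
  std::FILE* out = std::fopen(tmpPath.c_str(), "wb");
  if (!out)
    return false;
  for (std::size_t i = 0; i < records.size(); ++i) {
    if (!records[i].reachable)
      continue; // holes are simply not copied
    std::fwrite(records[i].bytes.data(), 1, records[i].bytes.size(), out);
  }
  std::fclose(out);
  std::remove(dbPath.c_str()); // a real db would do this swap more carefully
  return std::rename(tmpPath.c_str(), dbPath.c_str()) == 0;
}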


smartmail
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: Smartmail or Berkely folders?
Date: 02 Nov 1998 00:00:00 GMT
Message-ID: <363E69CC.1BBFC466@netscape.com>
Content-Transfer-Encoding: 7bit
References: <363E400C.C7F2E7B3@woudt.nl>
To: Edwin Woudt <edwin@woudt.nl>
Content-Type: text/plain; charset=us-ascii
Organization: Netscape
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.mail-news

Edwin Woudt wrote:
> I read that recently some mailbox store source code was found.

I don't know what this means, so I must guess. Either you mean a thing I've never heard of, or you might mean Brendan Eich's comments about the mozilla folks' balloon to prefer Berkeley DB as a storage system engine. But neither of these things can have a large impact.

We want to make the storage mechanisms used in mail/news pluggable, with some well-defined requirements for interfaces and semantics. But we also want to permit continued use of our current storage mechanisms by using the same plugin interface. This would let mozilla folks use something else, including Berkeley DB if desired. And it would allow folks like myself to use yet another storage mechanism if we wish.

(I don't have any intention of using Berkeley DB, but that won't stop folks from doing whatever they please for their own purposes. Still I won't do anything aimed more at Berkeley DB than at anything else.)

As near as I can tell at the moment, I'm going to make up an interface that will work by modifying some existing abstract interface work that was done by David Bienvenu during earlier efforts. The hard part in this will be that I don't understand much of the semantics in the code that I'll try to alter. I think the code is the spec in this case.

> This way it is possible to bypass the copyright restrictions making
> it impossible to 'open source' the current mail/news client.

In order to open source the mail/news client, any code released as source can only contain materials we have the right to disclose. We can package any problematic aspects of the current implementation to go behind an abstract interface, where the interface that specifies the needs of mail/news can go into open source. This is possible.

Then in order to have an open source implementation that actually worked, one would need to either link against a closed binary that provides the missing portion in our current code base, or else one would need to implement the interface using some open technology. Of course, mozilla developers could prefer and choose the latter option.

> Will this mean that we will see a 'communicator 4.5'-like mail/news
> client in version 5.0?

That is a definite option, and that fits with what I am currently doing, but that does not guarantee a definite future plan of action.

> Or will smartmail (which I still do not completely understand, so
> please can somebody give me a very thorough explanation) be the
> client for version 5.0?

I don't understand the status of smartmail at the moment. I have not heard anything about it for a while. I'm not sure what to read into the fact that I don't hear about it currently. I'd hate to mislead you out of ignorance, if only because I don't care to get smacked. :-)

> Or do I have everything wrong and are those two the same?

I think this has been a question at times. Right now they differ.

David McCusker, speaking for myself, but knowing mail/news tidbits
Values have meaning only against the context of a set of relationships.


xml
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: XML db priorities
Date: 07 Dec 1998 00:00:00 GMT
Message-ID: <366C8229.782B30EF@netscape.com>
Content-Transfer-Encoding: 7bit
References: <366C593D.62A0F6A5@netscape.com> <366C60E0.D6EBB32E@netscape.com>
To: Chris Waterson <waterson@netscape.com>
Content-Type: text/plain; charset=us-ascii
Organization: Netscape
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.mail-news

Chris, I feel funny writing this response to you since we were just in a meeting discussing these topics. But it's a good idea to cover more details and make this all as clear as possible.

Chris Waterson wrote:
> But "parsers", by definition, don't write: so I'm not sure what
> the point is here. You are going to have to invent the writing
> part yourself, no matter what.

Yes, that's exactly my point. My main db problem is writing and not reading. When I choose an XML format for storing persistent content, the issues I have to solve are when I will write content, how much I will write, and how it can be both fast and safe. Using someone else's parser solves none of these problems, while giving me the extra work of using someone else's code to read my format.

The reason why I make this point explicitly is because folks are likely to get the idea that using an XML parser solves a problem for me, when it actually buys me nothing in particular. I might approach these problems backwards compared to some folks, since I ask first how I am going to write something, before reading it.

[ architecture diagram: FE and RDF at the top, above the MSG layer; MSG sits on the XP DB interface; yadb and XMLdb are implementations plugged in below XP DB, with the 4.5db imported into XMLdb ]

This architecture diagram gives the context for the problem to be solved. One of the unstated constraints is "code it quickly". I need to put something in the box labeled "XMLdb", but whatever this is will be completely transparent to mail/news because it will be totally hidden under the XP DB interface. That means it would be almost as good if I use some homebrew binary format which is of no use to anyone else for any other purpose, as long as I finish fast.

The idea of using XML for the file format was to play nice with other folks, because it would be pleasant and constructive to use an open format other folks can read and write, and it would also make testing simpler when using standard generating tools. But the XML format is not required, and we would be forced to drop it if the engineering time gets too high for some reason, like thrashing.

Because the XMLdb is behind the abstract db interface, it does not matter what it is, except that it's nice to use good architectural style even behind the scenes, provided one has the luxury of doing so. I have pointed out in other contexts that we can plug in any other db that fits the abstract interface, including one based on RDF.

> The de facto "generated data structure" with pure XML is a content
> model.

Because pure XML can be used to encode anything, since all content in computing is composed of graphs, this does not distinguish the kind of application data model used. Folks associate XML usage with DOMs and formal content models. But there's no reason to associate XML with a specific computational style, except that the human-readable format is especially suited to end-user environments.
Note that in this context, the layer labeled "MSG" is the one that has a content model, and the db layer is used as a persistence mechanism. This roughly means that any general purpose content model under the abstract db layer could be slower than something less general. Usually generality has space and/or time costs.

> I'm sure that you can decorate this in any way that is convenient
> to accelerate the collection and flushing of dirty content.

Yes, but since using a specific content model under the abstract db interface is not required, we add a new hoop with this constraint.

> A major problem with XML is that the serialization syntax and the
> content model are tightly coupled.

Yes, that's another way of saying XML can encode anything, and apps can have arbitrary content and semantics. I agree with that.

> The implication is that a change to a deeply nested node
> "in the middle" of the content model requires you to re-write
> "the middle" of the document.

Yes, and when the document is a single byte stream, writing a new number of bytes as a replacement might require moving all the bytes that are downstream, and this is a main cost in doing such a thing in files with Unix style semantics. If the same number of bytes is written, then one can blithely update in place. Most binary formats intended to be efficient arrange to permit update in place as much as possible, to reduce total i/o cost.

This is why some binary databases, like my public domain IronDoc, use a storage representation for streams (or files, or blobs, if one prefers those terms) that permits insertion or deletion of bytes in the middle while constraining cost to space close to the change. IronDoc has blobs modeled after files in the Exodus storage system. This permits efficient update in place, in the middle of a stream, even if the stream and the changes are variable length.

(I mentioned in the meeting that public domain IronDoc was designed to be the most efficient way in which to both read and write graphs of heterogeneous content in a schema-less format. So I tend to make performance judgments with regard to strategies like using XML in terms of the manner in which they fall short compared to IronDoc.)

> For example, say you want to change the value of "bar" from 10 to 20:
> Before: <foo> <bar>10</bar> </foo>
> To do this, you'd need to re-serialize all of the XML around "bar":
> After: <foo> <bar>20</bar> </foo>

One needs to record the change somehow, where the change will shadow the original value and take precedence the next time someone wants to read the value, and one might lazily make the change persistent. In practice it is easier math to choose some subset of the file that contains the change and update that portion, either logically in place or logically elsewhere, in a way understood to take precedence.

> Were you to use RDF, your generated data structure would be a graph
> (which you can also decorate however you want to make writes fast).
> But because the RDF "content model" (graph) is not tightly coupled
> to the serialization syntax of the document, it is much more
> amenable to incremental modification:
>
> Before:
> <RDF:Description about="foo" bar="10"/>
>
> After:
> <RDF:Description about="foo" bar="10"/>
> <RDF:Description about="foo">
>   <bar tv="false">10</bar>
>   <bar>20</bar>
> </RDF:Description>
>
> In other words, your "log" is built in.

That's cool. We can use RDF as a db under the abstract db interface. This is one of the feasible choices I have mentioned before.
But it seems a shame to hide all that generality below the abstract db interface.

If writing RDF already had the high performance I was targeting, then I would say that was the best immediate choice. But since you wanted to use the efficient text updating mechanism I was going to write, it seems our requirements have a circular relationship right now. That seems to make it even more clear that I should show an abstract db interface soon, so folks can see that particular hoop we plan to jump through.

Of course, strictly speaking this is a choice that could be undone, since one could move a DOM up into the MSG layer, and this DOM could do its own persistence. But that makes for no shorter schedule, and doesn't let other folks plug in every storage mechanism of choice.

David McCusker, speaking only for myself, mozilla mail/news client eng
Values have meaning only against the context of a set of relationships.
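
A minimal sketch, not IronDoc's actual blob representation, of the difference discussed above: in a flat byte stream a variable-length change must shift every downstream byte, while a stream kept as a list of segments (loosely in the spirit of the Exodus-style blobs mentioned in the post) confines the cost of a mid-stream insert to the one segment it touches.

#include <cstddef>
#include <list>
#include <string>

class SegmentedStream {
public:
  void Append(const std::string& bytes) { mSegments.push_back(bytes); }

  // Insert bytes at a logical offset; only the segment containing the offset
  // is split, so cost stays close to the change instead of the stream size.
  void InsertAt(std::size_t offset, const std::string& bytes)
  {
    std::list<std::string>::iterator it = mSegments.begin();
    while (it != mSegments.end() && offset > it->size()) {
      offset -= it->size();
      ++it;
    }
    if (it == mSegments.end()) {
      mSegments.push_back(bytes);
      return;
    }
    std::string tail = it->substr(offset); // split the touched segment
    it->resize(offset);
    ++it;
    it = mSegments.insert(it, tail);   // reinsert the split-off remainder
    mSegments.insert(it, bytes);       // new bytes land between head and tail
  }

private:
  std::list<std::string> mSegments;
};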


mbox
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: mail/news db 5.0 offline design
Date: 15 Dec 1998 00:00:00 GMT
Message-ID: <36770698.90A42F20@netscape.com>
Content-Transfer-Encoding: 7bit
To: John Gardiner Myers <jgmyers@netscape.com>
Content-Type: text/plain; charset=us-ascii
Organization: Netscape
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.mail-news

John Gardiner Myers wrote: [ snip ]
> > Is it necessary to handle this in a way different than how POP does?
>
> An argument of "we're handling POP poorly, so it's OK to handle IMAP
> equally poorly" isn't very compelling.

I agree such an argument is not compelling; I'm just fact gathering. (Maybe my casual play-it-by-ear style makes it seem I have committed sooner to courses of action than I actually have. I prefer provoking folks into telling me I'm wrong, to wasting time on safe positions.)

> > One of the often unspoken canonical design patterns in database
> > systems is that content may require tranformation to and from
> > when pickled in a persistent store.
>
> The problem with "From " lines is that the transformation is not
> reversible.

I was thinking we could do the mangling smarter, with a new header that signals that mangling is done a special way in a message, so that the mangling could be reversible. So an x-moz-fancy-from-mangle header could state that "xyzzy-From " really means "From ". Just an idea. Basically headers are an escape hatch to declare more escaping rules. This is meant to suggest ideas and not to propose a specific plan.

> There's also the line separator transformation issue. Messages going
> into a Berkeley mail folder on platforms other than Windows have to
> convert from CRLF to the local line separator going in, and convert
> back going out.

That sounds like it implies that CRLF is the standard on the wire for line separation; that seems surprising, but I admit ignorance there. However, I get the idea that one must transform to the local line separator when the format uses lines significantly in the grammar.

> It is possible to implement this correctly, but there's a
> performance impact.

If there's a disk or network access involved in the total transaction that includes line transformation, the time cost must be small, since a disk access is millions of times slower than a processor clock cycle. But I know memory to memory transformations take a performance hit.

> > My original intent was only to specify design for db behavior in the
> > offline scenario, without specifying other aspects of offline usage.
>
> The problem is that this approach designs the product into a corner.

Not if I establish whether there are problems, like we are doing now. The approach was to establish whether the db could be designed in a context-free fashion. When this is not true, then one either gives up the approach or paints oneself into a corner as you suggest.

> If you accept damaging "From " lines, only use the database for offline
> use, and accept a policy of downloading all attachments, then it is
> possible for the database to use the mbox format. Then, use of the mbox
> format becomes a design requirement and you are from then on prevented
> from preserving "From " lines, using the database for caching online
> use, or implementing partial downloading of messages for offline use.

Okay, now I know we have problems with mbox format, and since I am very partial to safe, fast, and lossless transformations, I'd like to consider alternatives.
But now I'm afraid we'll get sucked into that special purgatory reserved for folks who try to establish format standards for significant content very close to the hearts of users, especially if we are thinking of altering the persistent format for client POP mail.

I don't have a vested interest in the particular text format we use. I had assumed we could not choose to drop the Berkeley mbox format, and I already predicted earlier we would end up recanting such a decision if we tried to change the local POP format. Let's take that one slowly.

We can store offline IMAP and news in some other format with rigorous behavior and less transformation overhead. We can probably use the same abstract db API for message databases as for summary files, which allows them to be the same format or different formats.

Here's a notion for brainstorming purposes only. Suppose messages and summary files are the same format, to be based on some SSF++ format we discuss ad infinitum (or until next week, whichever is sooner). Then summary files might only differ by having rows with 1 or more attributes pointing into the message, while omitting some larger attributes.

Since each table has a scope for its rows, tables could individually have an attribute declaring the base table medium is located in another file. This would also let one summary file index many other files. You can see how to generalize this with wild abandon to get different effects.

Yes, I know how this resembles the purpose of RDF for similar reasons. But folks are shying away from RDF serialization in XML for reasons of verbosity that might have some performance implications.

David McCusker, speaking only for myself, mozilla mail/news client eng
Values have meaning only against the context of a set of relationships.
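
A minimal sketch of the reversible mangling idea floated above, using the header and prefix strings suggested in the post purely for illustration: the x-moz-fancy-from-mangle header would announce the convention, and because any line already starting with the escape prefix also gets escaped, the stored text maps back to the original exactly.

#include <string>

static const std::string kPrefix = "xyzzy-"; // so "xyzzy-From " means "From "

std::string MangleLine(const std::string& line)
{
  // Escape "From " lines and anything already carrying the prefix, so that
  // unmangling can strip exactly one prefix and always recover the original.
  if (line.compare(0, 5, "From ") == 0 ||
      line.compare(0, kPrefix.size(), kPrefix) == 0)
    return kPrefix + line;
  return line;
}

std::string UnmangleLine(const std::string& line)
{
  if (line.compare(0, kPrefix.size(), kPrefix) == 0)
    return line.substr(kPrefix.size());
  return line;
}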


roles
 

From: David McCusker <davidmc@netscape.com>
Subject: roles and db help (Re: Help: Confused by new Mail/News info)
Date: 20 Jan 1999 00:00:00 GMT
Message-ID: <36A639BB.72EBD222@netscape.com>
Content-Transfer-Encoding: 7bit
References: <36A373F6.F9BA2540@darmstadt.gmd.de> <36A626E2.B1608EB9@netscape.com>
Content-Type: text/plain; charset=us-ascii
Organization: Netscape
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.mail-news

Phil's comments on Grendel all sound right to me. It seems like a good idea to have some folks working on Grendel, especially if this keeps all the best ideas in circulation. But the current in-house mail/news team is not engaged at all with Grendel, so there's not much leverage there.

Phil Peterson wrote: [ snip ]
> Oh, BTW, I'm on the team working on mail/news inside the building.
> I'm a part time engineer and part time manager, and my little group
> is responsible for the "back-end" part of Messenger.

I like clarifying our roles to encourage folks outside the building to get involved and know who's doing what. I'm a full time engineer doing the database work on Mork and MDB, from design to coding and testing.

I could use some help on some parts to pull in the schedule a bit. I'll post separately the critical path stuff other folks might develop. Right here I'll mention two things not on the critical path that could still benefit from someone else's effort: 1) a standard table sorting object, and 2) a standard ldif import object.

The first cut MorkDb implementation does not need to support sorting, but when it does, it should be through using a standard in-memory sort object that is bound to a specific table and sorts the rows for that table by a specific column. Typically the same table being sorted will privately use this sort object to implement the table's sort methods.

If someone makes up an API for this, and outlines an implementation without my input, then this might be faster than if I spend some hours doing that myself without being primed by someone else's good ideas. The API should use the mdbCompare interface for comparing columns.

The first cut MorkDb also need not support import, but when it does, it should use a standard object to parse LDIF records from ldif text files. At the moment, I think the API for such an object should generate a sequence of row instances to be added to some other db table.

It would be useful to have someone think about sharp edges in this context, but less design work seems necessary. An implementation needs little design since it can be cribbed almost directly from the 4.5 address book code.

David McCusker, speaking only for myself, mozilla mail/news client eng
Values have meaning only against the context of a set of relationships.
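
A rough sketch, not the real MDB headers, of what the standalone table sorting object mentioned above might look like: it binds to one table, sorts that table's rows by one column, and leans on an mdbCompare-style comparison interface. Only mdbCompare is named in the post; every other name here is a guess for illustration.

class nsIMdbEnv;   // assumed MDB environment, row, and table types
class nsIMdbRow;
class nsIMdbTable;

class nsIMdbCompare { // compares one column's cells in two rows
public:
  virtual int Order(nsIMdbEnv* ev, nsIMdbRow* inLeft, nsIMdbRow* inRight,
    int inColumn) = 0; // negative, zero, or positive, like strcmp
  virtual ~nsIMdbCompare() {}
};

class nsIMdbTableSorter { // in-memory sort object bound to a specific table
public:
  virtual int BindTable(nsIMdbEnv* ev, nsIMdbTable* ioTable) = 0;
  virtual int SortByColumn(nsIMdbEnv* ev, int inColumn,
    nsIMdbCompare* inCompare) = 0;
  virtual int RowPosAfterSort(nsIMdbEnv* ev, nsIMdbRow* inRow, int* outPos) = 0;
  virtual ~nsIMdbTableSorter() {}
};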


interface
 

From: David McCusker <davidmc@netscape.com>
Subject: [MDB] new abstract file interface to replace file paths
Date: 25 Mar 1999 00:00:00 GMT
Message-ID: <36FAA93C.5AED915F@netscape.com>
Content-Transfer-Encoding: 7bit
To: David Bienvenu <bienvenu@netscape.com>
Content-Type: text/plain; charset=us-ascii
Organization: Ontology Mechanics Guild
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.mail-news

David Bienvenu suggests I add an abstract file class to the MDB API, so that files can be passed as objects to factory and store objects, and not just as a file path. The nsIMdbFile class below is my first shot at a design, so folks can give me their feedback. I'll explain relevant issues so folks can keep them in mind when expressing their opinions.

The main reason for nsIMdbFile is to support the Mac, since a file path does not uniquely identify a file under every circumstance, so the old interface using file paths to name files was potentially ambiguous. (The Mac lets different volumes have the same names, so paths to files on those volumes can be the same path.) So we want file specs instead.

David Bienvenu also wanted a file object to avoid the need to open and close a file to provide the first 512 bytes passed to MDB factories to ask whether a file can be opened by a specific MDB suite. We pass the first 512 bytes to help prevent the typical need for each factory to open and close the file in order to answer this question; the proposed change only saves the first extra file open and close. (We avoid extra file opens because they're often very time expensive in file systems.)

But there's a problem with specifying an abstract MDB file interface that must be used by any MDB database implementation. The file API might not be enough to satisfy a specific DB; some DB's have rather platform specific file support that is not necessarily similar on each platform, so that a file abstraction will not fit very well when it does not exactly match what a DB expected on a given platform.

So I can specify an abstract file API that satisfies Mork (and public domain IronDoc for that matter), but I know it will not satisfy the needs of every DB out there. So I need to add a way to let a DB wrest control of the file away from a nsIMdbFile instance, so the file can be reopened by a DB in the manner to which they are accustomed.

You might find the proposed interface to wrest away such control a little strange. But I'm much less concerned about whether it seems strange than I'm concerned about what is missing in order for some DB's to wrest away control successfully. Tell me what's missing.

This nsIMdbFile will tend to replace the morkFile class currently in use within Mork, and the morkStream class will buffer i/o to and from a nsIMdbFile just as easily. There will be no performance impact from layers of abstract API dispatching for nsIMdbFile, since dispatches are amortized over many individual character i/o accesses when buffering.

The only methods really needed are Tell, Seek, Eof, Path, Read, Write, and Flush. These are all the morkFile virtual methods, and this is sufficient for the Mork text-based file i/o. The Get() and Put() methods are provided in case a nsIMdbFile can do this more cheaply than just by calling the expected Seek()/Read() or Seek()/Write(). Some DB's like IronDoc will always specify a file offset for each i/o, so the Get() and Put() methods might work better, but the default for each of these is expected to be calls to Seek, Read, and Write.
I am aware of at least one database implementation that will not be satisfied by the nsIMdbFile interface, but I think they can be happy through stealing control of the file away from nsIMdbFile. But even when control is stolen, I want the caller who created and passed the original nsIMdbFile to have some idea that such a thing happened, and that's why the thief is remembered by the nsIMdbFile interface.

When control of the file is stolen, a nsIMdbFile subclass should then pass through future file API calls to the thief as pure delegation. (Note this means the thief cannot then use the original nsIMdbFile as part of its own implementation, since the result would always be recursive callbacks to the thief without accomplishing any work.)

I will also add a nsIMdbFile-creating factory method similar to the ones for providing default environments and heaps, so that clients need not roll their own nsIMdbFile class if they are willing to use the one that comes with an MDB suite. But it seems like it would need to use file paths. (Adding a file spec dependency would need to be done in such a way that I could still build standalone without them.)

/*| nsIMdbFile: abstract file interface resembling the original morkFile
**| abstract interface (which was in turn modeled on the file interface
**| from public domain IronDoc). The design of this file interface is
**| complicated by the fact that some DB's will not find this interface
**| adequate for all runtime requirements (even though this file API is
**| enough to implement text-based DB's like Mork). For this reason,
**| more methods have been added to let a DB library force the file to
**| become closed so the DB can reopen the file in some other manner.
**| Folks are encouraged to suggest ways to tune this interface to suit
**| DB's that cannot manage to pull their maneuvers even given this API.
**|
**|| Tell: get the current i/o position in file
**|
**|| Seek: change the current i/o position in file
**|
**|| Eof: return file's total length in bytes
**|
**|| Read: input inSize bytes into outBuf, returning actual transfer size
**|
**|| Get: read starting at specific file offset (e.g. Seek(); Read();)
**|
**|| Write: output inSize bytes from inBuf, returning actual transfer size
**|
**|| Put: write starting at specific file offset (e.g. Seek(); Write();)
**|
**|| Flush: if written bytes are buffered, push them to final destination
**|
**|| Path: get file path in some string representation. This is intended
**| either to support the display of file name in a user presentation, or
**| to support the closing and reopening of the file when the DB needs more
**| exotic file access than is presented by the nsIMdbFile interface.
**|
**|| Steal: tell this file to close any associated i/o stream in the file
**| system, because the file ioThief intends to reopen the file in order
**| to provide the MDB implementation with more exotic file access than is
**| offered by the nsIMdbFile alone. Presumably the thief knows enough
**| from Path() in order to know which file to reopen. If Steal() is
**| successful, this file should probably delegate all future calls to
**| the nsIMdbFile interface down to the thief files, so that even after
**| the file has been stolen, it can still be read, written, or forcibly
**| closed (by a call to CloseMdbObject()).
**|
**|| Thief: acquire and return thief passed to an earlier call to Steal().
|*/
class nsIMdbFile : public nsIMdbObject { // minimal file interface
public:
  // { ===== begin nsIMdbFile methods =====

  // { ----- begin pos methods -----
  virtual mdb_err Tell(nsIMdbEnv* ev, mdb_pos* outPos) const = 0;
  virtual mdb_err Seek(nsIMdbEnv* ev, mdb_pos inPos) = 0;
  virtual mdb_err Eof(nsIMdbEnv* ev, mdb_pos* outPos) const = 0;
  // } ----- end pos methods -----

  // { ----- begin read methods -----
  virtual mdb_err Read(nsIMdbEnv* ev, void* outBuf, mdb_size inSize,
    mdb_size* outActualSize) = 0;
  virtual mdb_err Get(nsIMdbEnv* ev, void* outBuf, mdb_size inSize,
    mdb_pos inPos, mdb_size* outActualSize) = 0;
  // } ----- end read methods -----

  // { ----- begin write methods -----
  virtual mdb_err Write(nsIMdbEnv* ev, const void* inBuf, mdb_size inSize,
    mdb_size* outActualSize) = 0;
  virtual mdb_err Put(nsIMdbEnv* ev, const void* inBuf, mdb_size inSize,
    mdb_pos inPos, mdb_size* outActualSize) = 0;
  virtual mdb_err Flush(nsIMdbEnv* ev) = 0;
  // } ----- end write methods -----

  // { ----- begin path methods -----
  virtual mdb_err Path(nsIMdbEnv* ev, mdbYarn* outFilePath) = 0;
  // } ----- end path methods -----

  // { ----- begin replacement methods -----
  virtual mdb_err Steal(nsIMdbEnv* ev, nsIMdbFile* ioThief) = 0;
  virtual mdb_err Thief(nsIMdbEnv* ev, nsIMdbFile** acqThief) = 0;
  // } ----- end replacement methods -----

  // } ===== end nsIMdbFile methods =====
};

David McCusker, speaking only for myself, mozilla mail/news client eng
Values have meaning only against the context of a set of relationships.
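
As a usage note, here is a minimal sketch (not from the Mork sources) of the default behavior the post describes for Get() and Put(): a concrete subclass simply composes Seek() with Read() or Write(). It assumes a zero mdb_err means success, and it omits the other pure virtual methods a real subclass would also have to supply.

class mdbBasicFile : public nsIMdbFile { // hypothetical partial subclass
public:
  virtual mdb_err Get(nsIMdbEnv* ev, void* outBuf, mdb_size inSize,
    mdb_pos inPos, mdb_size* outActualSize)
  {
    mdb_err err = this->Seek(ev, inPos); // position first...
    if (!err)
      err = this->Read(ev, outBuf, inSize, outActualSize); // ...then transfer
    return err;
  }

  virtual mdb_err Put(nsIMdbEnv* ev, const void* inBuf, mdb_size inSize,
    mdb_pos inPos, mdb_size* outActualSize)
  {
    mdb_err err = this->Seek(ev, inPos);
    if (!err)
      err = this->Write(ev, inBuf, inSize, outActualSize);
    return err;
  }
};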


entropy
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: Address book crashes W98
Date: 26 Mar 1999 00:00:00 GMT
Message-ID: <36FBF42F.C417FAE4@netscape.com>
Content-Transfer-Encoding: 7bit
References: <36FBC6D0.BA71A484@billpetro.com>
Content-Type: text/plain; charset=us-ascii
Organization: Ontology Mechanics Guild
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.mail-news

Since this is a development group, I'll use your question as an excuse to lecture developers on data loss issues I want them to grasp better. Then I'll actually give you a useful answer at the very end; you can jump to that part directly if you don't want the theoretical stuff.

Bill Petro wrote:
> I have a moderately sized Address book, and I've found that recently
> that when I open it and scroll through it, it not only freezes, but
> also freezes Win98. Is there a way to "repair" an Address book?

This is related to my earlier statement in this group that all DB's get corrupt eventually, since entropy always increases. (A robust DB seems never to become corrupt because its mean time to failure exceeds that of the file system and/or data backup procedures used by DB users.)

A DB is safest when it runs in a context (process or machine) where code never crashes or steps on memory. Failures in non-DB code will increase the risk of corruption in DB loci of control through no fault of the DB. A DB can use fairly effective transaction technology to avoid loss, but there are still often some windows of risk when a host machine crashes.

The most insidious form of corruption occurs in systems that support plugins, because this creates a risk that plugin code will clobber a small enough bit of memory in buffered DB content that data loss is caused without being severe enough to crash the process. So the bits go to disk even though they were previously vetted just fine. A crash would be preferable, since it would kick in any transaction control.

Since Netscape can run plugins using an open API, it's not possible to run a DB in perfect safety, no matter how good the DB software is, as long as plugin code can step on a buffer that's outbound for persistence. The mean time to failure depends on the quality of all the code running during the time that buffered DB content is in RAM.

So it's not surprising when corrupt address books eventually occur. But it would be nice if a corrupt address book would not crash process and machine when bad bits are seen in DB content. When I write my own DB code, it is a big concern of mine to not crash when seeing bad content pulled from disk. I can't speak for 4.5's DB since I didn't write it.

There's only one way I know to attempt recovery of a bad address book. You can try exporting it to ldif, since this traverses a minimal part of internal DB data structures, and you could get lucky and miss the portion that's giving you trouble. But it's in bad shape if it can't be exported. Then I can only recommend dredging through a hex dump to glean any content you want to save that can't otherwise be replaced.

Hmm, but as an experiment, you might try importing the address book in an MS mail client, because I think they reverse engineered a formulaic way to dredge parts of the file format for content, without actually using any DB index structures in the file.

David McCusker, speaking only for myself, mozilla mail/news client eng
Values have meaning only against the context of a set of relationships.


address
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: Address book specs
Date: 11 May 1999 00:00:00 GMT
Message-ID: <3738B592.2812B50F@netscape.com>
Content-Transfer-Encoding: 7bit
References: <37389111.66546AD6@heimdallr.u-net.com>
Content-Type: text/plain; charset=us-ascii
Organization: Ontology Mechanics Guild
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.mail-news

Al Vining wrote: [ snip ]
> I know there's nothing you can do, because you're constrained by
> databases and the need to export to other formats, etc., but it would
> be really good if I got to decide, rather than being told, how my
> addresses were organized.

Mozilla 5.0 is not constrained by the database at the bottom, but there are a few layers in between the db and the user interface. MDB and Mork can encode just about anything since they're schema-less, even metainfo about attributes when useful (see an example below). But although the db can encode something, the layers above still need code to mediate the presentation to users and to apply interpretations in meaningful ways at runtime, and that's fairly constraining.

Neither is export to other formats constraining, since we can always have db information that cannot be exported easily. That only means that the export format constrains what it can represent, and not what we can represent. But a round trip export-then-import will lose data if the export format cannot hold it. (Ldif and vcard will lose data.)

(Actually we might be able to trick ldif and vcard into holding more information by stuffing out-of-band metainfo into dummy entries which use their content slots for non-standard "wrong" purposes.)

> On a simple level, this has to do with localization. I do not have
> a zip code. Almost nobody in my address book has a zip code.

Localization is another way of saying flexibility, and in development of db's this means fewer schema restrictions, with preferably none. If the UI is made as flexible as the backend db, or perhaps is driven by metainfo in the db, then the UI need not have such a schema either.

> In much of Europe, the postal code would come before the town (12345
> Villeneuve). Here in the UK it comes after. So either you have to
> go out and work out the address scheme of every country in the world
> and reconfigure the UI when I select the country (which I'd have to
> do before entering the rest of the address...) or you let me type
> the address the way I want to.

It would be feasible to define the layout of address UI elements with some kind of text-based template enumerating attributes of interest. (For example, XUL could be bent to this purpose.) These attributes would bind dynamically to associated db attributes at runtime. If the db has no schema, then you could define attributes in the dynamic UI that never existed before in the db.

But putting such flexibility in the UI is serious work since we never did it before in the UI for address books. It's a new non-trivial feature with some significant development resource cost. If such a flexible UI was text-based, this would be a kind of end-user programming used by very few members of the user base, so the feature would be more marginal. Or if it was instead based on some fancy UI editing at runtime to be very user-friendly, it'd be more expensive to develop.

> Again, I'd like to be able to categorize addresses the way I want to,
> not just as 'Work' and 'Home' - fine so long as people a) are at work
> b) only work in one place c) only live in one place.
> Not the case, again, for many people I know.

We could categorize any way you'd like, since the db can hold anything. But there must be some particular method of presenting this in the UI. There are a number of ways you could go about associating a "category" string annotation with either individual attributes, or with entire AB entries as a whole, and a number of ways of presenting such annotations.

Such features are usually not implemented because folks can't think of an appealing way to present such semantics clearly to users. Such a feature would be most useful if easily discoverable (as opposed to requiring that users notice that new menu items appear, say, when certain address book elements are selected in the UI). Can you describe any ideas about doing this in a way that appeals to you personally?

> You see, you're never going to get the address book even half right.

That's too pessimistic. :-)

> So you have 'Work', 'Home', 'Fax', 'Pager', 'Cellular' (or 'Mobile',
> as I'd probably say, another l10n issue). But most people are going
> to leave 3 or 4 of these empty, and curse you for not including the
> one they want ('ISDN', 'Switchboard', even 'Iridium').

It doesn't have to be that way. But habit makes folks continue to design the UI like that. To a certain extent, the fact that LDAP likes to define specifically named attributes for such things will tend to make us have counterparts just so we can import ldif in a well-defined way. But we don't need to push that schema in the user's face.

I could see letting users make up any column names they like, and letting them annotate the column with what "kind" of attribute is involved. So they could create an "Iridium" column and say it's a kind of "phone" attribute, so it might appear when displaying phones. This is related to, but slightly different from, just letting users rename how an attribute displays.

Here's where I finally present an example of encoding attribute metainformation in a Mork db. (I use Mork below for encoding, but all this information is available through the MDB interface at runtime.) Suppose you wanted to put a table into a db, containing one row for every column name you wanted to annotate with associated attributes.

For such a table, you must invent at least a couple of namespace terms to distinguish your kind of table from any other kind, just so you don't step on the toes of every other db user. So this is arbitrary, but let's say we put address book column notes in a name space called "ab:col:notes", and describe the table format as "ab:col:meta". Both these choices are arbitrary, but it explains their appearance in the Mork example below.

Also, let's use a table ID of 1 for this table, expecting we'll only have one of these. The row IDs for each row in the table are also arbitrary (but unique). (Instead of random, a better choice of row IDs would be the integer token values for column names when tokenized in the MDB interface.)

{1:ab:col:notes {(k=ab:col:meta)}
  [2 (self=iridium)(display=Iridium Phone)(kind=phone)]
  [3 (self=fax)(display=Facsimile)(kind=phone)]}

Note the columns "self", "display", and "kind" are arbitrarily chosen, but must make sense to whatever code is going to use such a table to find annotations about column attributes. The "self" column names the attribute being annotated. The "display" column shows how the attribute should be titled in a UI presentation. And "kind" says both these are considered phones. (If more than one kind applied, perhaps they would be comma-separated.)
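
To make the shape of that lookup concrete, here is a tiny standalone C++ sketch of code consuming such a meta table. This is not the MDB API; the ColumnNote struct and DisplayTitle function are hypothetical stand-ins for rows fetched through the real interface, populated with the two sample rows above.

// Toy in-memory stand-in for the "ab:col:notes" table (not the MDB API).
#include <iostream>
#include <string>
#include <vector>

// Each row annotates one column with a display title and a kind.
struct ColumnNote { std::string self, display, kind; };

// Find the display title for a column, falling back to the raw column name.
std::string DisplayTitle(const std::vector<ColumnNote>& notes,
                         const std::string& column) {
  for (const ColumnNote& note : notes)
    if (note.self == column) return note.display;
  return column;
}

int main() {
  std::vector<ColumnNote> notes = {
    {"iridium", "Iridium Phone", "phone"},   // row 2 in the Mork sample
    {"fax", "Facsimile", "phone"},           // row 3 in the Mork sample
  };
  std::cout << DisplayTitle(notes, "iridium") << "\n";   // Iridium Phone
  std::cout << DisplayTitle(notes, "nickname") << "\n";  // falls back
}
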
Depending on what kind of UI we were willing to implement for the Mozilla address book, we could invent appropriate meta-attributes for columns when we wanted to annotate them in a similar fashion. We have an easy mechanism for encoding; the hard part is specifying what should happen, and then writing the code to do it.

But however we provide this kind of UI flexibility, this can create a problem in exporting content to fixed-schema systems like vcard, which has specific ideas about how to encode, say, phone attributes. Giving power to users can expose them to inter-standard conflicts that we normally try to mercifully hide from them under the covers.

> If you poke around in Outlook you'll see it offers well over 100
> contact fields.

If you could poke around in 4.5's backend code, you'd see it offers slightly over a hundred contact fields. But the majority cannot be accessed through the UI. (They only affect ldif import/export.) Actually the better solution is to support an open-ended and arbitrary set of address attribute columns. (But we could not provide such a feature in 4.5 for reasons I can't describe without incurring a great risk of being struck by legal lightning.)

> 'Company yomi' which, if I understand it, lets you enter the
> pronunciation of an ideographic japanese company name. Sure, you
> could add this to the japanese version of Netscape, but there are
> presumably people in the rest of the world who could use such a
> feature. And so on. You'll always miss something.

So if we make the set of attributes open-ended and easily configured by users, does that satisfy your requirements? Is there more to it?

> So what I'd like is a bare minimum of fields:
>
> Display name, primary email, primary telephone -- these would show
> up in the 'Popup selector dialog'. I wouldn't have to call them
> 'primary', I could call them what I liked, but I would have to mark
> one email, phone, address(?) as the default.

What does primary mean when there's no associated secondary? What does default mean -- that they display by default?

> After that, pretty much free text.

We usually have a notes field for free text.

> It would be nice if I could assign categories to fields, so that it
> could tell between an address, a phone number, a date and a gender,
> but maybe I'd give up this ability for the flexibility.

If the UI folks want to design this into the address book interface, then the db has a way of encoding this. But it makes some work for the address book implementation to actually use it meaningfully.

> Then you just display it as XML, and let me create my own stylesheet
> to control the formatting.

That's feasible; in fact, I've recommended to folks that we export to XML as one output format that lets folks bind to any choice of style sheet for printing purposes. But since RDF is being used as the bottleneck, I don't know how easy it is to have RDF present address book content as XML.

> It's not like I'm asking for much. OK, it is, but I had to say it
> anyway.

Seemed like reasonable requests to me.

David McCusker, making a little difference (iota inside(tm))
Values have meaning only against the context of a set of relationships.


perf
 

From: davidmc@netscape.com (David McCusker)
Subject: Re: Global History performance
Date: 09 Jul 1999 00:00:00 GMT
Message-ID: <37867947.BEE1B4C7@netscape.com>
Content-Transfer-Encoding: 7bit
To: Warren Harris <warren@netscape.com>
Content-Type: text/plain; charset=us-ascii
Organization: Ontology Mechanics Guild
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.netlib

> Chris Waterson wrote: [ snip ]
> > The current implementation was inherited from Guha and blows donkey. One
> > of the things that I've got on my list of Things To Do is to replace the
> > history back-end with the mail/news DB, "mork". I was going to do this
> > primarily for feature enhancement (e.g., you can't "delete" items from
> > your history with the current implementation).

I think I can dig up and post the analysis I wrote and gave Chris on the topic of using Mork for this history purpose. I'll review that earlier private communication and sanitize if necessary before publication, with Chris's permission I hope and expect since it's nothing at all sensitive.

> > David Mc is pretty uptight about performance, so I'm sure he'll enjoy
> > being the Man On The Seat with respect to answering history queries
> > quickly.

Okay, I'm a bit of a performance freak. :-) Sure that would be fun. It's also easy if we don't worry about using memory proportional to all the content being indexed. But if we require a random access disk-based db for this purpose in the near term, then I'll sigh a bit and explain more.

> > This is disappointing; however, it will be out of the equation with
> > the Mork-based implementation.

The material I post later should make this clear. But in summary, one can look up Mork rows by some attribute value, and this will lazily build a hash table that maps all such attributes in rows within some space to the containing rows, so that future queries are O(1) hash table lookups.

The abstract MDB interface hides this, and it just happens that Mork uses a hash table since it is memory based. A disk-based db under the MDB interface would presumably use a disk-based btree dictionary for that.

> > No matter _what_ we do, doing lookups for thousands of URLs (links on a
> > page) amongs potentially 100s of thousands (history) is always going to
> > be costly. Is there any possibilty that we can do this lazily; e.g.,
> > only when a frame is realized to the screen?

That's a really good idea, to avoid doing the URL lookup to see what color the text should be in the display, until such time that a user will really see the URL in the UI. Is it possible to use "delayed expressions" or some lazy equivalent in text markup languages or the DOM involved, so the final tuning of the text for presentation is done as late as possible? It might not be feasible if you can't tell when it will be displayed.

A lot of performance tuning involves doing nothing at all when you can get away with it. In throughput bottleneck problems, lazy computation will usually give best time results, by avoiding a computation until needed, so it never gets done at all sometimes. Just about the only time that lazy techniques lose is when you have a time latency to optimize, so you suffer if a lazy computation got delayed until some sensitive time window.

Warren Harris wrote:
> 1. Necko's MakeAbsoluteURI does map from string x baseURI -> string, and
> short-circuits when the input string has a protocol designation, but must
> delegate to the baseURI's protocol handler to do the real work.
> So far, most of them end up constructing a new nsIURI object (because
> that's the easiest way to do the relative to absolute conversion) and
> then calling GetSpec to produce a string to return. Do you have hard
> data that MakeAbsolute is too slow, or is it just that you know it's
> called when the mouse moves? (i.e. can it keep up with the mouse
> movement or is it really a bottleneck? -- I would expect the former.)

Just for clarity's sake, I'll admit I don't grok any of this at all, but I assume it doesn't have anything to do with something I need to know.

> 2. Brendan take note: we're going to use mork for global history. Can/should
> we use it for the network cache too? David: Do you have any performance
> statistics relative to dbm? Who are the current clients of dbm?

Technically I think you'd be using the abstract MDB interface, with the Mork implementation under that currently, which could be replaced with any other db that conforms to the MDB interface. So if I ever finish IronDoc, or if someone sponsors, say, a dbm tiger team, then they could slip in under the MDB interface and have different time/space/footprint etc. character.

I don't know dbm, but I can characterize Mork performance in general terms against any other disk-based db, based on Mork's in-memory db nature. Once Mork is loaded in memory it will go very fast since it won't touch the disk. (Note I'm ignoring virtual memory disk paging for really big dbs.)

So the only two relatively bad performance statistics in Mork will be time latency to open, and the RAM footprint from loading all content in memory. (Time latency to close/write is more complex to describe, but assuming the incremental writing code is used appropriately, this is not a big cost.)

Mork suits the ideal case very well when one can really afford to keep all content in memory, and when large db opening does not happen during a very time sensitive moment, or with any high frequency.

One main purpose of disk-based db's is in fact to ameliorate the bad effects of time latency to open, and large memory footprint associated with in-memory dbs. When these issues are not a great concern, then a disk-based db has a hard time showing up an in-memory db. However, these issues are usually a concern.

If you want, I can describe how a disk-based db using btrees could code the lookups we want to perform, and I can do it in so much detail you could write code from my description as the specification. Using a good disk-based db, we would get runtime lookup performance very nearly as fast as Mork, but using less RAM footprint and having good db open time.

I don't know about current clients of dbm, and don't even know where it is in the tree, or whether it does live in the tree.

David Mc
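
The lazily built attribute lookup described above can be sketched in a few lines. This is not the Mork or MDB code; the AttrIndex class, its Row type, and the FindRow signature are hypothetical, and the sketch only shows the general shape: the first lookup on a column scans every row once to build a hash table, and later lookups on that column become O(1).

// Hypothetical sketch of a lazily built attribute index (not Mork/MDB code).
#include <iostream>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Schema-less row: a bag of column -> value cells.
using Row = std::unordered_map<std::string, std::string>;

class AttrIndex {
 public:
  explicit AttrIndex(const std::vector<Row>* rows) : mRows(rows) {}

  // Find the first row whose `column` cell equals `value`. The first call
  // for a given column builds a hash index lazily; later calls are O(1).
  const Row* FindRow(const std::string& column, const std::string& value) {
    if (mBuilt.insert(column).second) {          // not indexed yet: build it
      auto& built = mIndexes[column];
      for (std::size_t i = 0; i < mRows->size(); ++i) {
        auto cell = (*mRows)[i].find(column);
        if (cell != (*mRows)[i].end())
          built.emplace(cell->second, i);        // value -> row position
      }
    }
    const auto& index = mIndexes[column];
    auto hit = index.find(value);
    return hit == index.end() ? nullptr : &(*mRows)[hit->second];
  }

 private:
  const std::vector<Row>* mRows;
  std::unordered_set<std::string> mBuilt;
  std::unordered_map<std::string,
      std::unordered_map<std::string, std::size_t>> mIndexes;
};

int main() {
  std::vector<Row> rows = {
    {{"url", "http://www.mozilla.org/"}, {"visits", "12"}},
    {{"url", "http://www.w3.org/"}, {"visits", "3"}},
  };
  AttrIndex index(&rows);
  if (const Row* row = index.FindRow("url", "http://www.w3.org/"))
    std::cout << "visits: " << row->at("visits") << "\n";
}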


history
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: mork for global history?
Date: 09 Jul 1999 00:00:00 GMT
Message-ID: <37868740.316D35AB@netscape.com>
Content-Transfer-Encoding: 7bit
References: <378685C5.5224E428@netscape.com>
Content-Type: text/plain; charset=us-ascii
Organization: Ontology Mechanics Guild
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.netlib

[ Actually there was something I needed to cut from this, where I say things about IronDoc and AOL that should stay private. ]

Chris Waterson wrote:
> Chris Waterson wrote:
> > How appropriate do you think it would be to use Mork for the
> > browser's global history?

I'll preface my other comments below by describing Mork as a text format, with typically linear behavior, that usually must be read entirely into memory for usage (the exceptions are special cases). It can be incrementally updated, so updates are linear with respect to the size of changes rather than the whole file. But this makes the file bigger each time, so load time takes progressively longer. (I'll put in update-in-place sometime, which will ameliorate this effect somewhat, but not really solve it altogether.)

Mork is very similar to XML in terms of general expressibility, but it is more concise and can be incrementally updated, and both of these are latency performance optimizations to load or write faster. (Mork has no schema, which means new attributes can be added at any time without violating any internal sense of correct format.)

However, the linear character of Mork is hidden beneath the abstract MDB interface, which does not imply the implementation is linear. So one could implement MDB with a fast binary format (or some hybrid of text and fast binary, using multiple files) in order to get better than linear performance without any change in interfaces.

[ Snipped paragraph explaining why I'm not currently working on my public domain IronDoc database, but which I'll resume later. ]

> Let me elaborate on this a little bit. In MozillaClassic and earlier,
> we were using Berkeley DB (I think) to store global history. Berkeley
> DB went away, and so Guha coded up something from scratch to get
> browser history rolling. I am starting to feel that this
> implementation is deficient in a couple of ways.

Okay, I understand that. (Too bad about the Berkeley DB problems.)

> 1. It requires the entire global history to be read into memory at
> startup. This both slows down startup, and may become prohibitive wrt.
> runtime footprint. This isn't horrible, because we can expire old
> history entries to keep size under control.

This will also be true with Mork, but another implementation of MDB (like IronDoc) would open near instantly using a binary btree format. So it could be better in the future. Mork code is reentrant, but individual Mork db stores are not. You could put a Mork history db store on a separate thread, perhaps only during the initial db opening phase.

> 2. It doesn't allow history to be incrementally modified; e.g., "I
> want to remove the porn sites from my history so that my wife doesn't
> find out!" You could selectively remove entries from your history in
> 4.x, and I presume that privacy zealots will expect us to be able to
> do the same in 5.0.

The MDB interface supports deletion, which requires a commit to save. Under Mork, a "compress" commit rewrites the db, and lesser levels of commit cause incremental updates to be appended (but this feature is still in progress today).
Another implementation of MDB would permit re-use of file space after deletion, but Mork cannot reuse the space without a compress commit.

> 3. It isn't queryable in one very important way: by substring. This
> is critical for efficient "partial URL matching"; e.g., for the
> auto-complete feature of the URL bar.

MDB and Mork will support this because we need it for address book autocomplete, which finds prefix string matches on one or more entry attributes. So Mork will support in-memory sorted indexes, which will not be terribly large in size because all the strings are atomized. Later MDB implementations will likely use btree indexes.

This feature is not yet coded in Mork, but it needs to be done in a few weeks. In principle it's not very hard, except if we want to let callers supply their own comparison methods for sort order.

> 4. It isn't queryable by range. This makes it difficult to implement
> "dynamic" folders that contain, for example, "all the sites that I've
> visited in the last hour". (The current implementation computes this
> statically.)

The MDB interfaces present rows in tables at specific table positions, so a search for the range start and a search for the range end would both yield integer row positions that could be used to visit all the rows in the range consisting of that section of the table's sequence of rows. (I might add another MDB method to do both at once, since this could be done faster in some db's than two successive searches.)

> Rather than try to patch these up in the custom implementation, I'd
> like to leverage the work you've done in Mork, if it makes sense. I
> haven't looked at your code (or even your interfaces), and just wanted
> to get a rough approximation if there might be a good match. The
> characteristics that I'm looking for are:

Since you came by and talked, it seems Mork might make sense.

> 1. Ability to delete records, so that I can satisfy the "hide the
> porn, honey!" requirement.

We can delete, made persistent by a commit.

> 2. Extremely fast random access on a primary key, so that we can
> efficiently do link-coloring (the style system needs to ask global
> history whether or not we've visited each link on the page).

This will be extremely fast in Mork, and likely in other MDB versions as well. Mork defines a FindRow() method that locates a row by an attribute value which acts like a primary key. Mork lazily creates a hash table mapping string atoms in that attribute to the containing rows. So it's an in-memory hash table lookup after the first search. This is coded already and David Bienvenu uses it to thread messages.

> 3. Reasonably fast search by substring on a primary key, so that we
> can do URL auto-completion in the URL bar.

We need this for address book autocomplete, so the performance should be rather good. (As we discussed when you visited, the indexes support prefix string searches, as opposed to arbitrary substrings.)

> 4. Ability to search by range on a secondary field (link visit date),
> so that we can compute the contents of the "all the sites I've visited
> in the last hour" folder. (This needn't be particularly fast,
> especially if the query can be done on another thread.)

This should work well, too, since we can dynamically sort to make new indexes. Mork will index in memory. Public domain IronDoc can add indexes dynamically because it is schema-less (but IronDoc's not done). Some MDB implementations will not be able to index on the fly.

> 5. Reasonably fast startup time.

This is hardest for Mork, since it must usually parse the whole file. Still the latency should not be bad if history files are not huge. Later MDB implementations can open much faster.

> How well does Mork fit? If you think these can be done, let's get
> together and figure out how to proceed.

Let me know what you need so you can get going.

David Mc
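
Since prefix matching and range-by-position come up in both the autocomplete and history threads above, here is a small standalone C++ sketch of the two-search approach over a sorted in-memory index. It is a hypothetical illustration, not Mork code, and it assumes a non-empty prefix whose last byte is not 0xff.

// Two binary searches yield integer positions bounding all prefix matches.
#include <algorithm>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Return [start, end) positions in a sorted index covering all keys that
// begin with `prefix`; the whole range can then be visited by counting.
std::pair<std::size_t, std::size_t> PrefixRange(
    const std::vector<std::string>& sorted, const std::string& prefix) {
  auto lo = std::lower_bound(sorted.begin(), sorted.end(), prefix);
  // Smallest string greater than every string starting with `prefix` is
  // `prefix` with its last byte bumped by one (good enough for a sketch).
  std::string stop = prefix;
  stop.back() = static_cast<char>(stop.back() + 1);
  auto hi = std::lower_bound(sorted.begin(), sorted.end(), stop);
  return {static_cast<std::size_t>(lo - sorted.begin()),
          static_cast<std::size_t>(hi - sorted.begin())};
}

int main() {
  std::vector<std::string> urls = {
    "http://home.netscape.com/", "http://www.mozilla.org/",
    "http://www.mozilla.org/newlayout/", "http://www.w3.org/" };
  std::sort(urls.begin(), urls.end());
  auto range = PrefixRange(urls, "http://www.mozilla.org/");
  for (std::size_t i = range.first; i < range.second; ++i)
    std::cout << urls[i] << "\n";   // all candidates for auto-complete
}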


blocks
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: Announcing "TripleDB" -- database code that might fit well with RDF.
Date: 12 Jul 1999 00:00:00 GMT
Message-ID: <378A6A09.334FA91C@netscape.com>
Content-Transfer-Encoding: 7bit
To: Terry Weissman <terry@geocast.com>
Content-Type: text/plain; charset=us-ascii
Organization: Ontology Mechanics Guild
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.rdf

First I'm going to write the response I planned Friday, and thought a bit about over the weekend. Later I will cope with the fact I was asked to make some kind of informed analysis, to help judge the suitability of applying TripleDB in some mozilla contexts. It was more fun to respond in near complete freedom from consequences.

Now I have to worry about being biased because what I say might affect the choices of database I will live with in the future. This means I'll later have to include some negative risks in my assessment, when normally I would not say anything negative because I like to be encouraging.

I feel creepy discussing the potential downside of whatever you are doing since obviously my own self interest in my public domain IronDoc database will, if nothing else, reduce the credibility of my remarks when folks think they might be motivated by a desire to marginalize your efforts. But I'll do my best to be unbiased and objective; still I will err on the side of understating any negative reactions, and this could confuse some folks who interpret my remarks as overstatement of such reactions.

Note I have here concatenated your two related postings as if just one.

Terry Weissman wrote:
> I am not a database expert. I'm just pretending here.

I can't seem to escape that label myself, even though I don't like it. Almost everything I know about databases comes from reasoning through problems myself from first principles. My exposure to academic material gives me standard names for common patterns, but not too much else.

I'd much rather deal with anyone who reasons about data and problems in context, than someone else with expertise via formal training. You have impressed me as being strikingly good at impartial problem reasoning, so that makes you the man as far as I'm concerned.

Nothing would please me more than to teach a bunch of other folks all I know about basic database engines as the result of thinking about them, coding them, and writing about them for some thousands of hours. I've likely spent hundreds of hours just thinking about how to organize btree index nodes. So I have a really big conceptual semantic network about this stuff that's hard to pull out in short snips of linear exposition.

I'll summarize ways to do btree nodes briefly, assuming an organization similar to the design I used in IronDoc. There are other ways to do things, like always, but I like my way the best. You could figure out some of this by reading the public domain IronDoc sources I published, but that would be much harder than reading a prose description.

(I have this terrible feeling someone is going to ask me to read sources for TripleDB so I can make judgments about it, when I would much rather read prose descriptions about it here if you have time to write them.)

I will also include some pointers to my other mozilla newsgroup postings when they are relevant. Here's some context for the first such pointer.

I actually took a class in databases when I went back to college for the second time (after a long self-teaching episode); however, in retrospect I'd say the class content was actually theory of transactions, and not theory of databases, since it had almost nothing to do with storage tech. Instead of addressing how to store things, the class dealt with theory of coping with failures, using transactions to roll forward/backward and reason about serializability under interleaved modifications. It should have been called a "coping with entropy" class and not a database class.

Anyway, a first question you have to answer when you design or write a new database system is how are you going to recover from failures? (Can you please talk about this somehow in the context of TripleDB?) If your database is block-structured, then you can always use this same standard answer:

news://news.mozilla.org/37585CE3.B689DA1F@netscape.com (see "[mork] comparing with the ideal writing cost case" from 04jun99) explains how you can protect a db from corruption due to interruption.

There are other reasons to make a db block-structured, but they all tend to complement each other synergistically. Uniformity of structure is a blessing when trying to cope with caching, transactions, space re-use, and algorithms for finessing internal and external fragmentation, even before factoring in how RAM usage interacts with virtual memory paging.

In other words, if your db is not block-structured, then you'll have an explosion of complexity and go down in flames before you are finished, or else you will hide performance inefficiencies in nooks and crannies much too deeply ensconced to be flushed out later on when there's time. (So if TripleDB cannot be block-structured, then I have a big flinch.)

This tends to explain the block-structured form factor in btree designs as a primary feature, and it motivates the secondary and more complex designs for mapping variable sized objects onto blocks when necessary. The desire for block structure provides a context as strong as the force of gravity throughout most of my remarks; all designs must touch base by remarking on how they map onto blocks, with some performance analysis.

This is essentially all locality of reference engineering that aims to apply 80/20 rules onto problems of caching content in a memory hierarchy that ascends from slow disk blocks to fast RAM blocks, with hitches along the way for virtual memory and every other memory caching kink. "How does this affect locality of reference?" is the constant question.

I don't question the need for blocks; I just assume blocks. That could hang you up if you don't understand where this is coming from.

(An idea for a comedy sketch: Dr. Frankenstein creates life, and then the monster sits up and starts talking about databases ad nauseam until the doctor realizes he might have made some kind of mistake. "Blocks good!" says the monster, and then continues, "Entropy bad!" Etc.)

In the rest of this material, I will try to stick to btree node issues.

> The reason I chose to use AVL trees rather than btrees is because I
> wanted to store the indexing information with each node, rather than
> store them separately.

You can do this in btree nodes, and I do so in the IronDoc design which I'll tend to assume as I describe more about btrees below. But you can mix them, especially if the btree system does not understand how refs work in the indexing content of any btree dictionary being stored.

Each btree node is a set of P tuples where each tuple is size T, and P is the most times that T could be packed into a block of size B. The average fanout F of each node is something less than P when a btree node is not completely full. Occupancy of 85% is a typical figure for a thrashed (random adds and cuts) btree that guarantees each node is at least half full.

|<----- B ----->|
| T1 T2 .... TP |

The tuples in leaf nodes need not be the same size as tuples of inner nodes higher in the tree, and typically are not in the IronDoc scheme. The leaf nodes tend to contain tuples which comprise the associations being stored in the btree, so that when a btree maps keys to values, a tuple will contain both key and value, sized K and V respectively. So a leaf tuple tends to be a pair <key,val> as follows:

|<- T ->|
|key,val|

Values can be zero sized, since only the key is really required, and either keys or values can refer to content elsewhere, or else be self contained without need for any other external content. Since leaf and inner node tuples have different structure, typically the number P of tuples which can pack into one block of size B will be different.

The inner nodes (which I call 'limb' nodes) tend to contain tuples which are needed to find tuples lower down in the tree. An IronDoc limb node tuple tends to be a triple <key,count,pos> as follows:

|<---- T ---->|
|key,count,pos|

Where the key has the same form as appears in leaf tuples, but where count and pos describe the subtree beneath this tuple. The position of the block containing the root of the subtree is pos, and count is the total number of leaf tuples that can be reached in the subtree.

(Note that the count slot permits navigation to the Ith association in the btree by descending on a direct path from root to leaf node, while skipping I associations to the left of the path traversed, and this is what permits array indexing in O(logN) time; it is just as easy to keep count slots current when inserting associations as the path keys.)

Now the purpose of storing keys in the limb nodes is so that seeking a leaf tuple record can be done by descending directly along the path from root to leaf, so it is unnecessary to perform a binary search on all the tuples in all the leaf nodes, because that would be horrendously worse performance with O(log(2,N)) time instead of O(log(F,N)), where 2 is the log base for binary search and average fanout F is the log base for a search in a btree. For large trees, the difference is a sizeable factor.

(Here's how you can perform efficient autocomplete using IronDoc btrees. Note the instrumental role of association counts for incremental access. For a given pattern string, perform two searches, one for the smallest matching association and another for the greatest matching association. Each search also generates the array position of a hit at no cost, and this gives the range of matching associations which can be iterated by simply counting integers, without need of a complex iterator object. All this code is done and debugged in IronDoc; the part of the current IronDoc incarnation which is not complete involves the blob file system and not the btree dictionary system, which is all finished.)

(For example, if keys are four byte integers, then each limb tuple is only twelve bytes in size, which pack in the hundreds into a large block size of 4K or 8K. For large N, Log(600,N) block hits is much smaller than (Log(2,N)-Log(2,600)). Exercise for the reader: why -Log(2,600)?)
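
Here is a small standalone C++ sketch of the leaf and limb tuple shapes just described, and of how the count slot yields the Ith association by skipping whole subtrees during descent. The struct names and in-memory pointers are hypothetical stand-ins for real block positions; it illustrates the idea and is not IronDoc code.

// Hypothetical in-memory stand-ins for the on-disk tuples described above.
#include <cstdint>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

struct LeafTuple { std::uint32_t key; std::string val; };   // <key,val>

struct Node;
struct LimbTuple {                                          // <key,count,pos>
  std::uint32_t key;            // smallest key reachable in the subtree
  std::uint32_t count;          // number of leaf tuples in the subtree
  std::unique_ptr<Node> child;  // stands in for the block position `pos`
};

struct Node {
  std::vector<LeafTuple> leaves;  // used when this is a leaf node
  std::vector<LimbTuple> limbs;   // used when this is a limb node
  bool IsLeaf() const { return limbs.empty(); }
};

// Return the Ith association (0-based) by descending from root to leaf,
// skipping whole subtrees whose counts lie to the left of the target.
const LeafTuple* FindIth(const Node* node, std::uint32_t i) {
  while (!node->IsLeaf()) {
    const Node* next = nullptr;
    for (const LimbTuple& t : node->limbs) {
      if (i < t.count) { next = t.child.get(); break; }
      i -= t.count;               // skip this whole subtree
    }
    if (!next) return nullptr;    // i was past the end
    node = next;
  }
  return i < node->leaves.size() ? &node->leaves[i] : nullptr;
}

int main() {
  auto left = std::make_unique<Node>();
  left->leaves = {{1, "a"}, {2, "b"}, {3, "c"}};
  auto right = std::make_unique<Node>();
  right->leaves = {{7, "d"}, {9, "e"}};

  Node root;
  root.limbs.push_back({1, 3, std::move(left)});
  root.limbs.push_back({7, 2, std::move(right)});

  if (const LeafTuple* t = FindIth(&root, 4))       // fifth association overall
    std::cout << t->key << " -> " << t->val << "\n";  // prints 9 -> e
}
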
This last part explains that a key can be structured in more than one way, and this will address the problem of packing variable sized keys into fixed sized elements that fit inside btree node tuple key slots.

The purpose of a limb node tuple is to associate a key with <count,pos>, and this can be done in more than one way, as I explain at some length in abstract terms in this posting not long ago in the RDF newsgroup:

news://news.mozilla.org/377BC54E.C32DF133@netscape.com, on 01jul99, (see subject "Re: getting rid of resource factories" in rdf).

The key can be placed within the tuple, or outside the tuple via some reference, or some combination of the two to finesse the performance. When the key is stored outside the tuple, and the tuple only contains a ref to the key, then this older posting on bindings is also relevant:

news://news.mozilla.org/3755A4D4.36F4264B@netscape.com, on 02jun99, (see subject "Re: Protocol Dispatching Proposal for 5.0" in mail-news).

For example, suppose you want to use arbitrary length strings as keys, but you know that median size is 48 bytes, and that strings tend to be ordered relative to each other after say the twelfth byte. This would let you decide to put up to 48 bytes in a tuple, with a ref to overflow elsewhere. Or you could decide to put up to 16 bytes (more than 12) in a tuple, with overflow elsewhere, knowing you will typically not need to access the 'elsewhere' portion while comparing key strings.

> It seemed to me that with a btree, sure, you can get pointers to lots
> of nodes all at once. But to navigate the tree, you still have to go
> chase all of those pointers and read the data so that you can make
> comparisons.

Yes, the good part of btrees is that one gets lots of tuples for comparison all at once after loading a single node in one block access. The cost of comparisons is marginally zero compared to actual disk i/o. The strategy takes advantage of chunking disk i/o performance, so that each block touched yields max gain on a goal of finding search targets.

Once you have a btree node block in memory (or maybe after you just get it from the db's page cache), then you need not chase any pointers to read keys for comparison, when the nodes already have key bytes inside the tuples within the node. It is highly desirable to have enough key bytes in a node so that one typically does no additional i/o per key in order to perform a comparison. More 80/20 locality of reference.

> An AVL tree that was stored as a separate index would have
> the same problem. But my AVL trees are stored right along with the node
> data. So, when you've read an AVL node, you've also got the data it
> represents, and you can therefore make comparisons right away.

This is true of the IronDoc style btrees. Yes, I have seen databases which were organized so that limb nodes did not contain key bytes, but I considered them to have a very big design flaw in those cases. You can do what you want with IronDoc style btrees, and they have an additional advantage in being agreeable to block granularity locality of reference.

> I couldn't figure out a way to get that behavior with btrees. (Later,
> I remembered red-black trees, which are a variant of 2-3 btrees that
> would have worked. That might have been easier to code, but i don't
> think it would have performed any differently than AVL trees.)
>
> But, as I say, I really don't know what I'm talking about here...
> Please tell me what I'm overlooking.
>
> Um, I should have said, I couldn't figure out a way to make that work
> that didn't also involve either wasting lots of space on every node, or
> would require me to constantly be resizing nodes (which I really don't
> know how to do well).

Space packing problems are very aggravating. Have you ever gotten stuck in a thrashing design mode, where you kept changing your mind back and forth, while being unable to converge on correct answers in some context? Let's call this malady "design syncopation", and note it shows one is stuck in some space that is not worth a big cost in mental effort.

Deciding exactly how big to make certain kinds of objects is the number one cause of design syncopation in my past experience with db design. I eventually settled on a sweeping general rule to resolve most of such unpleasant design problems.

Either analyze the known size frequency distribution, or assume something plausible like random continuous distribution across the whole range of permitted sizes. Then design so that nothing horrible happens for any given size, or range of sizes. You'll still have your hands busy even after this simplification, but at least you'll know nothing horrible can happen after you're done.

Putting variable sized keys into btree indexes definitely falls into this family of hard problems that can make one chase one's tail. It's a mistake to think of making the size of nodes variable, because this ruins all the nice things that happen when everything is uniformly sized.

If the keys are variable sized, then make the space for key bytes in a node have a fixed size that is somewhat more than needed to handle the most common cases, so that accessing more overflow bytes happens rarely. But not so much larger that one averages much wasted space.

However, a little wasted space is very little cause for any concern in limb nodes for the following reason. Suppose the average node fanout F is a modestly small 25. This means there are about 25 leaf nodes for every limb node, so limb nodes occupy only about 4% of all the space used by a btree index. Optimizing the fractional space usage of only 4% of a data store is a total waste of time.

You should only worry about space wastage in the leaf nodes. However, they can be related when key form factors in limb and leaf nodes are exactly the same, as they are in the IronDoc design. But this analysis explains why putting key bytes into limb nodes in the first place is a no-brainer, since it tends to affect only a trivial percentage of space used by a btree index.

David Mc
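
The fixed-size key slot with rare overflow described above can be sketched as follows. This is a hypothetical illustration, not IronDoc's actual layout: the KeySlot struct, the 16-byte inline prefix, and the side table of overflow tails are all assumptions chosen to mirror the 48-byte/12-byte example in the posting.

// Hypothetical fixed-size key slot: inline prefix plus rare overflow ref.
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

constexpr std::size_t kPrefixSize = 16;

struct KeySlot {
  char prefix[kPrefixSize];   // leading key bytes, zero-padded if short
  std::uint32_t fullSize;     // total key length in bytes
  std::uint32_t overflow;     // index of the tail in gTails, or 0 if none
};

std::vector<std::string> gTails = {""};  // slot 0 reserved to mean "no tail"

// Compare a stored key against a probe string; most comparisons finish on
// the inline prefix and never touch the overflow table.
int CompareKey(const KeySlot& slot, const std::string& probe) {
  std::size_t inl = slot.fullSize < kPrefixSize ? slot.fullSize : kPrefixSize;
  std::size_t n = inl < probe.size() ? inl : probe.size();
  if (int c = std::memcmp(slot.prefix, probe.data(), n)) return c;
  if (slot.fullSize <= kPrefixSize || probe.size() <= kPrefixSize)
    return static_cast<int>(slot.fullSize) - static_cast<int>(probe.size());
  // Rare case: inline prefixes tie and both keys are long, so chase the tail.
  return gTails[slot.overflow].compare(probe.substr(kPrefixSize));
}

int main() {
  gTails.push_back("rg/projects/seamonkey/");           // tail beyond 16 bytes
  KeySlot slot = {};
  std::memcpy(slot.prefix, "http://mozilla.o", kPrefixSize);
  slot.overflow = 1;
  slot.fullSize = static_cast<std::uint32_t>(kPrefixSize + gTails[1].size());
  return CompareKey(slot, "http://mozilla.org/projects/seamonkey/") == 0 ? 0 : 1;
}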


tradeoffs
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: Announcing "TripleDB" -- database code that might fit well with RDF.
Date: 12 Jul 1999 00:00:00 GMT
Message-ID: <378A86CC.6ACAECE8@netscape.com>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
Organization: Ontology Mechanics Guild
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.rdf

[ Oops, I responded privately to the private email first; but then I see we want to continue the thread here. I might tweak something if it seems a word must change for public standards. Maybe I would have used a slightly different tone if I thought it was going here. ]

Terry Weissman wrote:
> davidmc@netscape.com wrote:
> > Anyway, a first question you have to answer when you design or write
> > a new database system, is how are you going to recover from failures?
> > (Can you please talk about this somehow in the context of TripleDB?)

Thanks for taking the time to respond, and so quickly.

> So, TripleDB was designed for a problem where the data is not quite as
> vital as a generic database. I don't ever ever want to have a failure,
> of course, but if a real disaster happens, throwing out everything and
> starting over is *not* out of the question. This level of reliability
> is exactly what you would want for things like mozilla's cache.

I was recently going to post in a newsgroup that this seemed to be the standard recovery plan for Netscape clients, to throw all the content away and start over when a problem is encountered. That's a valid approach when the data can be rebuilt, or is not needed.

I only ask because the approach to transactions is part of the general character of a db, so one wonders what it is, especially when some more proactive approach can have a complex interaction with the design.

> If I have a failure, I'll do one of two things:
>
> (1) Throw out everything and restart with an empty database.
> (2) Attempt to rebuild the database, in a long, slow, inefficient
> manner. I have four different indices through the database. To
> rebuild, you go through each index and try to get all the valid
> records. Every record you find that seems valid, you write into a new
> database. That means you'll find almost every record four times. I
> hope you will tend not to find garbage, and maybe I'll put it some
> stuff to help protect from that (hand-wave).

That's not too bad. Handling a failure has two parts: do you try to abort partial recent changes if things go sour? And, when a db is actually corrupt, how do you recover? Recovery is very hard.

If you can grovel your format and extract content from undamaged portions of the db, then that's better than par. It is good for a db to have enough context independent structure that you can actually pull info from random disjoint pieces. I like that.

> The main issue in my head here is how to detect a failure, and how
> much I want to spend to ensure accurate failure-detection.

It becomes less important when you don't try to abort after an error, since you are not really trying to separate good state from bad.

The kind of failure-detection I most admire, and hope you implement, is resistance to random bytes to avoid runtime crashes. What happens if you replace some portions of your db with random bytes? Can this cause your runtime to crash? If so, then that's bad. However, many db's suffer from crashes when they contain some bad content. I don't think that's a very good standard of care.
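
As a concrete picture of that kind of random-bytes check, here is a minimal standalone sketch of a corruption fuzz test. It is hypothetical: OpenAndScanDb is a stand-in for whatever call opens and fully walks a given database, and the only point is that it should fail cleanly on the damaged copy rather than crash.

// Clobber a few random bytes in a copy of a db file, then check that opening
// the damaged copy fails gracefully instead of crashing.
#include <cstdio>
#include <fstream>
#include <iterator>
#include <random>
#include <string>
#include <vector>

// Placeholder so this sketch links; a real test would call into the db code.
bool OpenAndScanDb(const std::string&) { return false; }

int main(int argc, char** argv) {
  if (argc < 2) { std::fprintf(stderr, "usage: fuzz <dbfile>\n"); return 2; }

  std::ifstream in(argv[1], std::ios::binary);
  std::vector<char> bytes((std::istreambuf_iterator<char>(in)),
                          std::istreambuf_iterator<char>());
  if (bytes.empty()) { std::fprintf(stderr, "empty or missing file\n"); return 2; }

  std::mt19937 rng(12345);  // fixed seed so failures are reproducible
  std::uniform_int_distribution<std::size_t> pos(0, bytes.size() - 1);
  std::uniform_int_distribution<int> byte(0, 255);
  for (int i = 0; i < 64; ++i)              // clobber 64 random bytes
    bytes[pos(rng)] = static_cast<char>(byte(rng));

  const std::string damaged = std::string(argv[1]) + ".damaged";
  std::ofstream(damaged, std::ios::binary)
      .write(bytes.data(), static_cast<std::streamsize>(bytes.size()));

  bool ok = OpenAndScanDb(damaged);         // should return false, not crash
  std::printf("open of damaged copy %s\n",
              ok ? "unexpectedly succeeded" : "failed cleanly");
  return 0;
}
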
> I could add a bit at the beginning of the database file. Whenever
> I start writing to the db file, I first check the bit; if it's set,
> I know the database is corrupt. Otherwise, I set it, do my writes, and
> then reset the bit. But I'm not yet sure the gain here is worth those
> two extra writes and a read, every time I touch the file.

That makes it sound like you write the file the instant any content is modified, as opposed to buffering up some changes for a commit. This is okay on Windows, which has a file system that behaves as if a page-buffered file cache is kept in memory between apps and disk. (This is why the IronDoc style page cache has much less effect on MS.) But if you did that on a Mac, the effect would be really horrible.

> > In other words, if your db is not block-structured, then you'll have an
> > explosion of complexity and go down in flames before you are finished,
> > or else you will hide performance inefficiencies in nooks and crannies
> > much too deeply ensconced to be flushed out later on when there's time.
> > (So if TripleDB cannot be block-structured, then I have a big flinch.)
>
> Well, I'm still not sure what you mean by "block-structured", but from
> what I can tell, no, TripleDB is not. But I don't feel like I have any
> of the problems you describe.

By "block-structured", I mean a file is partitioned into a collection of uniformly sized disjoint blocks that completely tile the file content, where blocks are read and written in their entirety, and not in pieces.

You do have the problems I described if you perform any significant number of file writes to many disjoint byte ranges in the file as the result of some relatively minor user operation. But those problems would not be very apparent on Windows; this is one of the reasons why Windows has been taking market share, because they have optimized the file system so that non-optimal db algorithms will work fine on Windows but abysmally on other platforms, like the Mac. (This is not your fault.)

Let's say you have 100,000 triples in a db, and then add a new one which gets indexed by all your AVL trees. How many reads and writes will that cause to occur? Keep in mind that on a Mac, you will probably cause an actual 15 millisecond disk seek for each discontiguous read and write, so that at 60 seeks per second, you will spend some seconds waiting for each such individual operation to complete if you must read a hundred or two hundred nodes in order to operate on all indexes.

> Here goes a description of TripleDB's file structure:
>
> The first 1024 bytes are reserved for header info. The most important
> thing here is the pointers to the four different AVL trees.

A leading file header is a good and common approach.

> Each record in the DB is something like this:
>
> record length (4 bytes)
> Tree data for the four trees. This consists of a left pointer,
> right pointer, and a byte to store the "balance" (used by AVL balancing
> algorithms). So, 9 bytes per tree, 36 bytes total here.
> The data for the triple stored here. Each member of the triple
> consists of the type (1 byte), and then type-dependent info. (If it's
> an integer, then this is just the data; if it's a string, then this is
> a length (2 bytes) and the data.)
>
> So, total overhead for each record is 43 bytes: length + tree data +
> type info for each member of the triple.

That is not very much overhead, as those things go. A few tens of bytes is decent for managing something like a blob in a file system.
The actual disk space consumed is not as great a concern as how many disk i/o's are caused by specific file format organizations.

> There is actually a fifth AVL tree: free'd nodes. When you delete an
> entry from the database, its record is removed from the four trees, and
> then it is put into the free tree, sorted by record length. When we
> allocate a record, we search the free tree for a record of exactly the
> length we need. If we don't find one, we grow the file. This is
> potentially very wasteful of disk space, depending on how varied a
> record size we need. I may tune this (i.e., learn how to break up big
> records into smaller ones when needed.)

For contrast, IronDoc uses a block bitmap cached in memory, with some blocksum checks since clobbering a bit causes a small catastrophe. All the blocks are the same size, so any one will do. Actually IronDoc starts searching for the block nearest the one at an ideal position you request, so that locality of reference will not degrade past some point achieved in a balance of adds and cuts.

Your approach sounds very reasonable. However, I'd warn you to worry about what happens when you use a system as its own free space manager, when freeing a piece of space causes the free space to grow in a way that must allocate the just freed space, which then becomes no longer free, which then must be removed from free space, and so thrashes.

> Your description of btrees strikes me as wasteful of disk space.

Ouch. You get to choose which time/space tradeoff you prefer. You either put a ref to keys kept elsewhere, and save space while using more time to perform key access to compare. Or else you put keys in the nodes directly, which uses more space to go faster.

You can't have it both ways simultaneously, except for hybrid approaches which just pick a point somewhere in between. You can go fast and waste space, or you can go slow and save space, or you can pick a compromise. I don't see why that wastes disk space, when you pick exactly the form factor that suits the performance you need.

> I don't want to duplicate the keys into every btree. In my case, the
> keys *are* the data; there is no other data in the database. If I put
> the full key into the limb nodes, the DB would be four or five times as
> big as it is now. You talk about:

You don't have to duplicate all the keys in every index. It is desirable for speed, but you don't have to. The reason why values can be zero sized is because sometimes keys are the data, just as you describe.

A faster db is sometimes larger on disk. That's often okay when the result is many fewer disk i/o's. Maybe that's counterintuitive, but it happens. Usually one optimizes for speed, because disk is dirt cheap, but user time waiting for UI response is very dear. But sometimes folks say, hey don't use more than this max amount of disk, which imposes a limit on space that causes slower performance.

> > For example, suppose you want to use arbitrary length strings as keys,
> > but you know that median size is 48 bytes, and that strings tend to be
> > ordered relative to each other after say the twelfth byte. This would
>
> but I don't pretend to know any such thing. I am making no assumptions
> about what kinds of data will end up in the DB. I'm not in a position
> where I can analyze the data that will go in; it turns out the data
> I'll be storing doesn't yet exist.

You have to make assumptions.
Making "no assumptions" is a kind of assumption, which means you have to design under low information about usage, which requires you to handle larger ranges of sizes, etc.

Your db should be open enough to permit an application developer like Chris Waterson to specialize the way trees work so the information he has about RDF usage can be applied to tune for performance.

For example, IronDoc lets you decide how big keys and values are in a btree dict, and you have to provide code to compare keys and copy them in and out of node tuples. This is how IronDoc manifests handling of the "no assumptions" assumption, by permitting end developers to apply the basic code as appropriate in more specific contexts.

When I said that above regarding assumptions about string sizes, I was not describing assumptions that you must make in writing TripleDB; rather I was describing assumptions a developer must make when applying btrees, in order to choose key size. If a developer makes a "no assumptions" choice identical to the TripleDB choice, then a key would be a minimal four bytes in size to ref a remote shared location.

David Mc
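
As a rough back-of-the-envelope illustration of the earlier seek question (one insert into four indexes over 100,000 records, at roughly 15 ms per uncached seek), here is a tiny C++ calculation comparing a fanout-2 structure against a wide-fanout btree. The numbers are assumptions taken from the exchange above, not measurements.

// Illustrative arithmetic only: node touches and worst-case seek time for
// one insert into four indexes, fanout 2 vs. a fanout-600 btree.
#include <cmath>
#include <cstdio>

int main() {
  const double records = 100000.0;
  const double indexes = 4.0;
  const double seekMs = 15.0;

  double binaryTouches = indexes * std::ceil(std::log2(records));      // ~68
  double btreeTouches  = indexes * std::ceil(std::log(records) /
                                             std::log(600.0));         // ~8

  std::printf("fanout 2:   %.0f touches, ~%.1f s if every touch seeks\n",
              binaryTouches, binaryTouches * seekMs / 1000.0);
  std::printf("fanout 600: %.0f touches, ~%.2f s if every touch seeks\n",
              btreeTouches, btreeTouches * seekMs / 1000.0);
}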


tripledb
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: Announcing "TripleDB" -- database code that might fit well with RDF.
Date: 13 Jul 1999 00:00:00 GMT
Message-ID: <378BE894.1FC20583@netscape.com>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
Organization: Ontology Mechanics Guild
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.rdf

Terry Weissman wrote:
> Yeah, well. I have put in checking every place I can. I have done my
> best to always check for return results and abort out. I am absolutely
> positive I have missed something somewhere. I should write a test
> program that writes random stuff in and see how well I do.

Just trying to protect against random disk bytes is a big improvement, because it makes it possible to fix bugs of this nature in the future, instead of being nearly impossible to retrofit into old uncaring code. The random write test seems the right approach, which I plan to do on my own stuff later, but have not done yet to date.

> > That makes it sound like you write the file the instant any content
> > is modified, as opposed to buffering up some changes for a commit.
> > This is okay on Windows, which has a file system that behaves as if
> > a page-buffered file cache is kept in memory between apps and disk.
> > (This is why the IronDoc style page cache has much less effect on MS.)
> > But if you did that on a Mac, the effect would be really horrible.
>
> Yes, that is essentially correct. I do a little bit of buffering in
> memory, but I'm in a world where memory is precious, and so I try to
> sync up to disk as much as I can.

I hate to keep repeating the same song, but btrees are also meant to optimize memory usage when RAM is precious, by clumping touched parts of files as much as possible, so that buffering is most effective when a buffer cache turns over as little as possible. The more a db is touched in shotgun fashion, the less effective any cache will become. This will be true whatever your RAM budget.

So the btree design in IronDoc is meant to hew closely to the optimal time/space tradeoff curve with minimum hysteresis. So choosing less RAM for buffering will yield best possible performance in a context. As far as I know, there is not a better way than btrees for db caches.

> I will go ahead and make this extremely non-politically-correct
> statement: I don't care about the Mac. At all. For my purposes,
> TripleDB will run on Unix, and maybe on Windows, and not at all on
> Macs. If that ends up with a TripleDB that is completely unsuitable for
> Netscape and mozilla.org, well, so be it. I'm not interested in doing
> anything that hurts the Mac, but I refuse to let the Mac get in my way.

I don't mind non-PC stances in general, and take many myself. I just don't want folks to accidentally make a technical choice without knowing all the ramifications, and the effect of AVL trees on Macs seemed big. One would not want to drop a platform, or incur large catch up costs, just because one did not see it coming as a result of storage choices.

Technically, I think Unix also has trouble with massive file seeking, and I think Simon Fraser has much more experience investigating this than myself, so he should be the one to say what effect he expects. Network file systems should also be heavily affected by seeks without a good page buffering system under tightly clustered btree block usage.

I'm highly ambivalent about the Mac myself, so I sympathize.
I thought it was several years overdue for an overhaul in 1990 when I started working on the Pink operating system. I like using a Mac, but positively hate developing for one without the robust features of modern OS offerings. (I think Linux will be my next operating system of choice.)

> It does strike me that if the MacOS does this poorly with files, then
> it ought to be fixable by NSPR.

I don't think NSPR has anything like the page buffered file i/o I put into 4.5 mailnews code (which was a port from pd IronDoc page caches). We could use that code to fix the file buffering, but that would only fix the performance of shotgun file access so it stops being horrible and becomes merely moderately bad on Macs and some Unix boxes.

The binary search access of AVL trees is good in the sense that nothing very bad happens algorithmically as you scale upward. But it does not scale especially well, so a client on Macs would tend to have only losing performance compared to other clients, as opposed to intolerably bad.

The log base B in O(log(B,N)) is best when much larger than 2, since it can make a difference in empirical performance by a consistent factor of log(2,B), which can be the difference between imperceptible pauses and annoyingly long waits when this is near an order of magnitude.

> > Your approach sounds very reasonable. However, I'd warn you to worry
> > about what happens when you use a system as its own free space manager,
> > when freeing a piece of space causes the free space to grow in a way
> > that must allocate the just freed space, which then becomes no longer
> > free, which then must be removed from free space, and so thrashes.
>
> I don't quite understand. Freeing a node doesn't require any further
> allocations; it just twiddles pointers on the existing records.
> "freeing a piece of space causes the free space to grow in a way that
> must allocate the just freed space" does not make any sense to me.

Okay, I see why there's no problem, since you use the space itself to record its own status, as opposed to having a side metasystem that annotates the status. If you want to understand the pathology I meant, just imagine using some nodes to describe the free status of others.

David Mc
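
Since the free-space discussion above contrasts TripleDB's free tree with the block bitmap mentioned in the previous posting, here is a toy standalone sketch of the bitmap idea: one bit per uniformly sized block, with allocation searching outward from a requested ideal position so related blocks stay near each other. The BlockBitmap class is hypothetical and leaves out the blocksum checks; it is not IronDoc code.

// Toy block bitmap allocator (illustrative only; assumes ideal < block count).
#include <cstddef>
#include <iostream>
#include <vector>

class BlockBitmap {
 public:
  explicit BlockBitmap(std::size_t blockCount) : mUsed(blockCount, false) {}

  // Allocate the free block nearest `ideal`, or return -1 if the file is full.
  long Allocate(std::size_t ideal) {
    for (std::size_t radius = 0; radius < mUsed.size(); ++radius) {
      if (ideal >= radius && TryClaim(ideal - radius))
        return static_cast<long>(ideal - radius);
      if (ideal + radius < mUsed.size() && TryClaim(ideal + radius))
        return static_cast<long>(ideal + radius);
    }
    return -1;
  }

  void Free(std::size_t pos) { mUsed[pos] = false; }

 private:
  bool TryClaim(std::size_t pos) {
    if (mUsed[pos]) return false;
    mUsed[pos] = true;
    return true;
  }
  std::vector<bool> mUsed;
};

int main() {
  BlockBitmap map(8);
  long a = map.Allocate(3);   // gets block 3
  long b = map.Allocate(3);   // block 3 is taken, gets a nearby block (2)
  std::cout << a << " " << b << "\n";
}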


triples
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: [rdf/db] nature of RDF triples to store?
Date: 11 Aug 1999 00:00:00 GMT
Message-ID: <37B22B4B.4EBBB646@netscape.com>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
Organization: Ontology Mechanics Guild
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.rdf

Chris Waterson wrote:
> David McCusker wrote:
> > If I wanted to write a db API for storing RDF triples,
> > what kinds of data would actually be involved?
>
> You should start by reading this document:
> http://www.w3.org/TR/REC-rdf-syntax/

I've seen that spec before, and think it's badly written, though I'm sure no one cares about my opinion in that regard (and this also constitutes yet another case of lese majesty on my part :-).

How well it is written is irrelevant to my basic problem, which is that most of the spec addresses serialization syntax and/or schemas, when neither of these has any bearing at all upon any alternative format that is schema-less, such as I'm considering.  For example, the semantics of RDF containers is a high level schema policy effect, and a low level engine does not need to understand anything about containers.

> In particular, Section 2.1 "Basic RDF Model".

Yes, it seems that only section 2.1 in the entire spec is useful.  Also, your answers below are helpful, and are stated in other terms than appear in the spec, so you give me better perspective.

Unfortunately, section 2.1 does not go into definitive detail since it is just an intro, and leaves concreteness and full completeness of the semantic model under the care of sections that describe syntax and schemas, when these are not necessary aspects of the basic structure for the semantic model.

The examples make clear that the model graph is composed of elements well illustrated by node and arc diagrams such as "http://www.w3.org/Home/Lassila" --creator--> "Ora Lassila".  But these examples do not explain the nature of the nodes very clearly, other than to repeat the self-referential gibberish about resources, without proposing any mathematical model put in plain English that says "a resource R is an N-tuple R = {blah blah blah}, where the first blah is foo, etc. etc."

I wish the spec had a model intro aimed at folks who have read a book on discrete math, and know how to understand simple models expressed in unambiguous terms using conventional terminology that borrows from the math community.  Instead, almost all the intro seems aimed at programmers, web developers, or AI knowledge representation folks.  All the rigor appears only in the grammar for serialization, which is irrelevant for my concerns.  I would have to infer a rigorous mathematical spec for the model without reference to the serialization spec.

Boy, I'm such a whiner. :-) (Maybe I'm just in a small minority who thinks the spec is not technical enough, as opposed to too technical.)

> > When do these have persistent identity, and when do they not?
> > Is identity ever an integer?  Or is identity always a URI?
>
> A "resource" has identity, expressed as a URI.  A "property" is a
> resource.  A literal can be "a simple string or other primitive
> datatype defined by XML".

I gather that a triple equals a statement, and a statement equals <subject, pred, object> (also subj-pred->obj), and that all these s, p, and o are references to potentially complex composite objects.
The language used in the spec intro strongly suggests the authors assume an object-oriented coding interface, which allows them to say something is a "resource" without specifying what that means, since you can just go look at the API and see what it means.  But that is backwards, since a spec should drive the code and not the other way around.

This assumption is awkward, because I would propose a non-oo API to make the same thing happen, where I will use a cookie for a resource instead of using an object for a resource.  That lets a db implementation use a persistent ref for a resource, and only lazily unpickle a runtime interface for a resource when it is actually required in response to caller demand.

Anyway, it looks like a db can probably use dynamic typing in the implementation, and then every triple need only be stored as the three persistent refs to subj, pred, and obj.  Then actual runtime usage would stimulate looking at what those actually are inside.

> > When are x, y, and z integers, when are they ID refs, and when are
> > they variable length content blobs such as strings?
>
> I'll assume <x y z> to be <subject predicate object>.  "x" must always
> be an ID ref (i.e., a resource).  "y" must always be an ID ref (i.e., a
> property, which is a kind of resource).  "z" may be an ID ref (a
> resource) or a content blob (a literal).

(By <x y z> I meant that a triple in math parlance has three elements.)  It sounds like an implementation can make all of them refs, and then use dynamic typing (i.e. explicit persistent type annotation) to distinguish resources from literals as necessary.

> > When might things be conveniently ordered or sorted by a db?  How
> > many indexes for sorting might be required?  (Actual sort order
> > does not matter, since an API can always be set up to factor key
> > compares to be performed by clients, without db comprehension.)
>
> Certain resources can be marked as ordered "containers".  See Section
> 3.1 "Container Model".  These are likely candidates for indexing.

The container material is too schema specific.  It would be easier to provide a db API like the IronDoc or MDB interfaces which permit ordered collections of heterogeneous objects, and then leave it up to an RDF implementation to map containers onto these collections.

The mechanics of indexing collection memberships by attributes found in the members is an old staple of db semantics, so a generic way to do this could appear in a db API and let an RDF implementation apply this as needed.  Most of the container model section of the spec seems devoted completely to coping with imposing schemas and keeping control, but a db would not be in charge of enforcing such policies.  A much better db design would present only mechanisms, and let clients use these to enforce schema policies.

> > I know if I studied RDF docs and code interfaces, I'd eventually
> > be able to infer most of this myself.  So I'm asking to save the
> > time to do that, since I probably will not spend the time to do
> > that on either Netscape's time or my own personal time.
>
> The spec is short and accessable.  Perusing it would be well worth
> your time if you are really interested.

I really appreciate your taking time to help, and I'm sorry for my bad attitude about the spec's quality.  While I like XML, I have no interest in the serialization of RDF in XML, since that would not be my problem when proving an alternative persistent db store.  Anyway, I think I have a pretty good handle on it now.
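
Here is a minimal sketch of the cookie-style, non-oo storage idea above, where a triple is just three persistent refs plus a type tag that says whether the object slot is a resource or a literal.  The names are hypothetical illustrations, not the actual MDB or RDF interfaces.

  #include <cstdint>

  // A persistent ref is an opaque cookie (say an integer id assigned by
  // the db), not a pointer to an unpickled runtime object.
  typedef std::uint32_t PersistRef;

  // Dynamic typing for the object slot: resource ref or interned literal?
  enum SlotKind { kSlotResource, kSlotLiteral };

  // One stored statement: subject and predicate are always resource refs;
  // the object carries a kind tag so either a resource or a literal works.
  struct Triple {
    PersistRef subject;    // always a resource
    PersistRef predicate;  // always a property, itself a kind of resource
    PersistRef object;     // resource id, or id of an interned literal blob
    SlotKind   objectKind; // distinguishes the two cases above
  };

  // A caller would only unpickle a runtime view of a resource on demand,
  // e.g. via something like lookupResource(db, triple.subject).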
I just think RDF suffers from some awkward terminology choices that tend to hide meaning instead of exposing it.  Here I'll give a brief explanation.

For example, when writing docs about object-oriented software, it is a very bad idea to actually use the word "object" to describe any element of a system or framework, since it conveys no information greater than "thing", so it wastes the opportunity to use a more specific word for greater clarity.  Docs should use words that emphasize roles in contrast with other related roles.

If I tell you a story about a doctor and a patient, you have a good idea what I'm talking about.  But if I tell a story where I insistently call every character a person without helping you distinguish one person from another, then I just confuse you.

In the case of RDF, where everything can be a "resource", this causes the word "resource" to convey no information whenever it occurs, since it applies to just about everything.  So docs about RDF should avoid the word like the plague when it is feasible to do so, just for clarity.  (This is a basic information theory thing, that info is useful only in proportion to how unexpected it is in context.)

David Mc


models
 

From: David McCusker <davidmc@netscape.com>
Subject: [formal models] spatial visualization of RDF?
Date: 07 Sep 1999 00:00:00 GMT
Message-ID: <37D5B8EC.64F2F931@netscape.com>
Content-Transfer-Encoding: 7bit
Organization: Ontology Mechanics Guild
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.rdf

I started drafting a formal db model for RDF on weekends, which includes both an ad hoc formal syntax and an example C++ interface, but it's not quite done yet.  This last weekend I got hung up on the question of whether a resource has any other attributes besides the set of triples which have that resource as the subject.  A resource has a URI identity, but all the content in a resource is just the set of properties appearing in triples.  Is that right?

I thought of a spatial representation of RDF content for the purpose of visualization.  It's different from the graph representation, but obviously also related.  I haven't decided how useful this is yet.  But it helps me to think of theorems in geometrical terms instead of always in graph theoretical terms.  So I think an RDF graph can be visualized like the following.  Please tell me about any aspect I omit or mutilate in this model.

I see a resource as a sparse line in space.  Instead of a continuous sequence of adjacent points, a resource is a sparse cloud of points.  Each point is associated with a triple.  A triple with a literal as an object is just a naked point on the line.  But a triple with another resource as the object is like a vector from one line to another, except the vector points at the entire line and not just one point.

None of the points for a line are ordered, since the properties for a resource are not ordered.  So this line is a rather conceptual cloud of related points, but I see them all on a line in 3d space.  I think of the URI as a 3d equation for a line which defines where a point might exist on the line, but most of them are absent.  Only the triples actually present on a line are represented.

Navigation between lines is via the triples pointing at other lines.  Each such vector has a well-defined point on the source end, but no particular destination at the other end, since a vector points at the entire cloud of points.

I think you can get this representation by performing a transformation on the topology of the conventional graph used for RDF triples.  If you cut the closed loop representing a resource node, and stretch this into a straight line, and erase the loop (except where arcs originally left the node), then you get the point cloud I am describing.  When you cut a loop and stretch it out like that, it becomes unclear just where an incoming arc arrow should point, and that's why a vector points at the entire point cloud.

David Mc


rdf
 

From: David McCusker <davidmc@netscape.com> Subject: [triple stores] some triple model analysis Date: 23 Sep 1999 00:00:00 GMT Message-ID: <37EADA01.3C501000@netscape.com> Content-Transfer-Encoding: 7bit Organization: Ontology Mechanics Guild Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 Reply-To: davidmc@netscape.com Newsgroups: netscape.public.mozilla.rdf [ I wrote the following about a week ago; I'm reposting it here so folks interested can see where my rdf triple store thinking has been recently. ] Below I talk about the essential nature of RDF graph modeling, to help you think about ways to pick languages to describe graphs. By the way, I find this very interesting given my desire to advocate more binding oriented programming practices. Why do I advocate that? I can answer that in the context of an old conversation I had with an acquaintance named Bernie Weiser (hope I spelled his name right). He pointed out that I like to solve problems in parallel, and that my points are sometimes hard to understand by folks who think serially. Exactly so. In analogy to electrical circuits, a network with many parallel paths from point A to point B has less resistance between A and B, so flow is better accomodated. Similarly, data flow apps are best served by having more paths and more flexible binderies along the networks of communicating graphs, since this reduces cost along some dimension or another (real cost, opportunity cost, or whatever). David Orchard wrote: > Additionally, for generic binderies to be created, there must be > the ability to specify graphs of objects/relationships to be > bound/transformed into the different schemata. I have been unable > to find relevent graph specification languages other than RDF for > use in either EJB finders or XML Link graphs. Some of the CORBA > externalization work looks interesting. I know how to capture RDF graph info in the least complex form, and I describe this below, since you can use this to design your choice of syntax to capture the same thing, and know you are just as good an information channel as whatever RDF can convey. This came about because I started investigating a formal model to describe RDF content, because I intend to specify a formal db model (and an example C++ interface which is able to capture this model). It turns out that all RDF content can be expressed by the set of triples that must be stored in the persistent RDF model. Once I describe precisely how the triples are structured and what they can mean, you'll know you can encode all RDF content by choosing your favorite way to encode the triples. Please don't feel too bound by convention, and use your favorite RDF bindery instead. :-) Here's an example of a typical text triple, on the following line: davidmc@netscape.com --firstname--> David Here's a pseudo grammar for that text line format: Triple ::= S " --"P"--> " O Where S, P, and O denote 'subject', 'predicate', and 'object' and are all basically just strings. Let's ignore the fact that some kind of quoting convention might be required to permit arbitrary bytes to appear in S, P, or O without confusion with the markup. Assume every piece of a triple is just an arbitrary byte sequence. In pseudo math jargon, we can say any triple T = <S, P, O>, where S, P, and O are all members of the set of arbitrary blobs. 
Since I think the (subject, predicate, object) terminology stinks because it draws in too much formal baggage from various academic disciplines, I will give them slightly different names to clarify:

Using new names, we can say any triple T = <R, C, V>, where R, C, and V are called 'row', 'column', and 'value'.  In other words, any triple is identical in meaning to a single cell in a table that is composed of rows along one dimension and columns along another.

The expressive depth and complexity of RDF derives from the fact that a column C can also be a row R, and that a value V can be just about anything, including rows of course, to make wild graphs.

In plain language, each triple states a single fact about an object (which we call row R or subject S), which has a single property (which we call col C or predicate P), with some attribute value (which we call val V or object O).  That is the semantic intent of each triple.

Now I have just two more things left to describe: 1) formal constraints on what can go in each of the three different triple slots, and 2) a way to encode this information in a minimalist fashion using only strings and integers with well defined mappings and relationships.

Technically, all three slots are just strings (byte sequences), so you can put anything at all into them, as long as you do not feel bound by a particular schema.  So the formal constraints are only upon what it *means* when you choose certain strings in R, C, and V.

Both R and C are always considered names identifying a row, but V can be either of two things: V can be a row ID just like R, or V can be a literal string.  Presumably this means that syntax for V must distinguish between "Foo" as the name of row Foo, and "Foo" as the string composed of bytes 'F', 'o', 'o'; so V has two modes.

An RDF universe is composed of a set of triples T, which is a subset of R x C x V, so every t in T is some triple (r, c, v), such that r in R, c in C, and v in V.  The three sets R, C, and V are all logically subsets of set B, the set of all byte sequences.

The memberships of R, C, and V are defined by actual memberships in the set of triples T, so for every triple t = (r, c, v), R is defined to be all the r's, C is all the c's, and V is all the v's.  Other than schema imposed constraints, any byte sequence can be a member of R, C, or V just by adding a suitable triple to the model.  So the only magic lies in how these memberships are interpreted.

RDF calls every r in the set of rows R a 'resource', so the terms row and resource are interchangeable.  Actually, r is the identity of the row, as opposed to the content of the row.  RDF prescribes that every name r for a row (resource) must be a URI; however, this is only meaningful as a schema constraint on legal row names.

The content of a resource/row r is the subset of all triples in T which have the value of r in the first position.  A row (resource) has no other state besides the URI ID r and the member triples.  So if the data universe is a big matrix or table, then a resource is just the set of non-empty cells in the row r naming the resource.

By convention, every property/column c is also the name of a row r, so that metainfo about the nature of that column can be determined by examining the properties found in the column's set of row cells.  But possibly a column c is nothing more than the URI name, since there need not be any triples in T where c appears in slot r, unless a schema is used to force this condition to be true.
In principle, however, there's no reason why c cannot be a URI without properties.

A value v can be anything at all when no schemas are enforced.  One must be able to distinguish between byte sequence as string and byte sequence as URI (i.e. row/resource identity), so one extra bit is needed besides the byte sequence itself.  RDF uses the term 'literal' when referring to a value which is not a row URI name.

That concludes the section on RDF semantics.  The following is a much shorter description of how one might encode all triples in terms of strings and integers, in an efficient fashion.  There are two obvious encodings for rows, and both might be used at once.

First, assume that all strings will be atomized by interning in a string table.  So a runtime model provides a bidirectional mapping from strings to associated integers, and vice versa.  This means any byte sequence in R, C, or V can be given a unique integer name, so that any triple t = (r, c, v) can be represented as three ints.

This approach has two nice effects.  Every URI r or c has a fast integer representation for speed in runtime operations, and every literal v has a shared representation with every other literal containing exactly the same byte sequence.

A row can be represented by a sequence of (c, v) int pairs where each c is distinct, or else a row can be represented as a range subset of an index (such as a btree index) that sorts (r, c, v) triples first by r and then by c.  Both of these might be done.

For use as a graph specification language, any encoding of a set of (r, c, v) triples will be isomorphic to the RDF system in terms of expressive capacity.  Many reasonable text representations come immediately to mind, and several kinds of efficient binary runtime systems can be derived from the string atomizing model above.

David Mc
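
A small sketch of the string atomizing encoding just described, assuming an in-memory table for illustration: every byte sequence interns to an integer atom, a triple becomes three ints, and a set sorted by (r, c, v) doubles as the row range index.  The container choices here are assumptions, not a prescribed format.

  #include <cstdint>
  #include <map>
  #include <set>
  #include <string>
  #include <tuple>
  #include <vector>

  typedef std::uint32_t Atom;  // integer name for an interned byte sequence

  // Bidirectional mapping between strings and atoms.
  class AtomTable {
  public:
    Atom intern(const std::string& bytes) {
      auto it = mAtoms.find(bytes);
      if (it != mAtoms.end()) return it->second;
      Atom id = static_cast<Atom>(mStrings.size());
      mAtoms[bytes] = id;
      mStrings.push_back(bytes);
      return id;
    }
    const std::string& resolve(Atom id) const { return mStrings[id]; }
  private:
    std::map<std::string, Atom> mAtoms;
    std::vector<std::string> mStrings;
  };

  // A triple is just three atoms; sorting by (r, c, v) means all the
  // cells of a given row r form one contiguous range in the index.
  typedef std::tuple<Atom, Atom, Atom> Triple;  // (row, column, value)
  typedef std::set<Triple> TripleIndex;

  int main() {
    AtomTable atoms;
    TripleIndex index;
    Atom r = atoms.intern("davidmc@netscape.com");
    Atom c = atoms.intern("firstname");
    Atom v = atoms.intern("David");
    index.insert(Triple(r, c, v));
    // Scanning row r: iterate from lower_bound(Triple(r, 0, 0)) while the
    // first element of each triple still equals r.
    return 0;
  }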


pgp
 

From: David McCusker <davidmc@netscape.com> Subject: Re: PGP integration Date: 21 Jul 1999 00:00:00 GMT Message-ID: <379662BB.FCF347B7@netscape.com> Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=us-ascii Organization: Ontology Mechanics Guild Mime-Version: 1.0 Reply-To: davidmc@netscape.com Newsgroups: netscape.public.mozilla.crypto Geoff Thorpe wrote: > David, I would like to define my own mime-type for inter-office > financial reports so that I may handle that mime-type in mail > clients and cause them to activate the appropriate software and > pass the encapsulated (and unencoded) data to them. (One moment while I change my hand puppet to something better...) This is a good topic for the mail-news group. We wouldn't want to talk about that in the crypto group, since that might give someone the impression that security was somehow involved, when obviously that is just some safe, generic stream transformation mechanics. If someone in mail-news insisted on trying to figure a way to make PGP work with those mechanisms somehow, then we should probably ignore that since it would be material suitable only in this group. We'd might even have to vigorously discourage such talk in mail-news. Even non-technical folks are very smart about seeing through pretense when it is used, so we'd have to stick to just dull old stream transformations, and how one can open up the mechanics for mime encoding to suit perfectly boring data organization purposes. We should try to make sure only boring purposes can be served. We can just talk about bindings and mappings, just like in every other computing application scenario. All the content is a graph, and nodes have types, and the types bind to handlers, and the bindings are resolved by mappings, and the node content gets mapped in app specific ways by handlers, like financial data. Just a big happy machine filled with gear wheels, where you can put in some of your own wheels when the machine needs new gears, like when you want to do something with financial data, and we don't know how to help you with your financial app so we should not get involved in that stuff since it's out of our territory. > Is it possible for the mozilla group to provide some advice on > how I would register the mime-type (again, for financial reports) > with the mail client, Sure we can talk about mime-types all you want, but we won't know anything about your financial reports, so the finance part would be off-topic and might distract folks focusing on the mailnews part. But to provide folks with context, you might want to mention your financial app interest without going into a lot of depth. If we thought you were doing something we shouldn't help you with, then that could mix up our priorities and interfere with your progress. > and provide hooks to handler code so that the user may add data > (that is to say "financial reports") to emails (which the mail > client will encode and tag with the mime-type) and then automatically > invoke handler code when such a mime-type is present in incoming emails. Well, we'd have to do that or else the system would be closed. In order to be open and extendible, there would have to be some ways to bind mechanisms dynamically at runtime based on content type. 
It would be interesting to ruminate about some interesting web uses for such features, to do neat stuff with pictures, or music, or all that stuff users like to consume, and we could anticipate what great incentives this would give to entrepreneurs to come up with new media formats and the like. That might really boost the economy. New mime types for high definition television sound like just the ticket, etc. > My urgent need is for a "financial reports" mime type (as I might > have mentioned already), but I'm also thinking of adding an SMS > mime type (that's not a mis-spelling of something you're not allowed > to discuss - it's the phone-message format) to improve inter-office > communications, and oh ... ummm ... I think I might also want to > define a "todo-list" mime-type that will cause mail-clients to > display a bolt-of-lightening and a warning from the managing > directory that the tasks must be completed on pain of death (with > audio-visual support that I can code myself in the handler code). > So, we have financial reports, SMS messaging, and todo-lists. There ya go, you have lots of interesting e-business applications that need more support, and I bet there are lots more coming as soon as folks give it more thought. I think the business-to-business market is going to go through the roof, and it's going to require some exotic XML mime types so email can send business agents that announce their vocabularies and the ontologies understood by the various business components. It's a shame the business uses are so boring when the mechanisms are fun. > It's safe, I'm not interested in plugging crypto into mail-clients. > Not at all. Excellent, I'm happy to hear it. So don't tip the pinball machine because it can tilt and eat all the free games you racked up. David Mc


asis
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: Will the released code compile and run as-is?
Date: 02 Mar 1998 00:00:00 GMT
Message-ID: <34FB59FA.E03A5F8F@netscape.com>
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Organization: Netscape
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.general

Austin Ziegler wrote:
> It is my not-so-humble opinion that whatever database is chosen to
> replace the news/mail databases should work across platforms -- dbm
> is currently very Unix specific, and this is not a positive situation.

Yes, that was one of the criteria leading to the -????- choice.  Any replacement should be cross platform, both in source and in file format.  Additionally, it would be very nice (but not strictly required) that more than one index be able to go inside a single file.

Daniel Veditz wrote:
| I believe we have ported at least parts of dbm to the Mac and Windows
| OS's, I'm guessing that's what's been used for the certificate databases.

Yes.  I was told there was no Mac version except for our port.  However, I assume the file format is not cross platform (meaning that byte swapping to fix endianness is absent).  I understand dbm has only one index per file.

| It's a very sad fact that Communicator contains *multiple* database
| engines, and that obviously needs to change.

I'm in favor of using my own IronDoc database, whose implementation is not yet complete.  Within a few months it will be far enough along for use in mail, news, and address books.  Source and file format are both public domain (not GPL) and cross platform.  Any number of indexes and variable length blobs can go in one db file, and blobs support stream i/o.

IronDoc has no schema, which means when you change either object formats or associated groups of attributes, neither of these things outdates the file format.  Content is tagged with types intended to be equivalent to MIME types, or anything similar, and the freedom to type at will means the schema is dynamic.  So blobs and indexes can be incrementally versioned.

IronDoc is almost all mechanism with very little policy.  There's no class framework.  Index whatever you want, however you want, or don't -- it's all up to you.  There is little pushback against whatever you have in mind.  IronDoc even has a simple memory management policy: none.  Since IronDoc doesn't allocate memory, apps have total footprint control; IronDoc can easily be used as a persistent store for garbage-collected systems.

IronDoc is really a construction kit for a file-system-in-a-file, where the blobs are considered un-named files, and btree dictionaries can be used for whatever you want, not just keeping track of directory and file names.  You can make arbitrarily complex networks using the unique ids of blobs and dicts.  It will be great for holding XML content trees for the DOM.

IronDoc does not have scary synchronization problems between in-memory data structures and the database file, because IronDoc does not build in-memory data structures to represent content in the database.  Instead, database content is used directly with update-in-place (and copy-on-write for transactions), while either real or pseudo virtual memory handles the performance tradeoff of footprint versus storage i/o time.

Because IronDoc's interface for files is abstract, the actual "database" does not even need to live in local storage, and can be network remote, with pseudo virtual memory paging to minimize network hits.
But IronDoc is not done yet, and my progress on weekends has been crawling lately for a variety of reasons I needn't go into here. And early versions will have little direct support for multi-threaded access. But folks might want to keep it in mind and see what's up later in Q2 98. David McCusker, xp mail/news client db and address books Values have meaning only against the context of a set of relationships.
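
Since so much above hangs on IronDoc's abstract interface for files, here is a rough sketch of what such an interface might look like.  The names are hypothetical, not the real IronDoc API; the point is only that a local file, a memory buffer, or a network remote store can all sit behind the same small set of calls.

  #include <cstddef>
  #include <cstdint>

  // Abstract random-access "file" a db writes through.  Concrete
  // subclasses might wrap a local file, a RAM buffer, or a remote
  // store with pseudo virtual memory paging in front of it.
  class AbstractFile {
  public:
    virtual ~AbstractFile() {}

    // Read/write `count` bytes at absolute position `pos`; each call
    // returns the number of bytes actually transferred.
    virtual std::size_t readAt(std::uint64_t pos, void* buf,
                               std::size_t count) = 0;
    virtual std::size_t writeAt(std::uint64_t pos, const void* buf,
                                std::size_t count) = 0;

    // Current end of file, plus a way to grow it for new blocks.
    virtual std::uint64_t length() const = 0;
    virtual void setLength(std::uint64_t newLength) = 0;

    // Make previous writes durable (e.g. fsync for a local file).
    virtual void flush() = 0;
  };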


sleepycat
 

From: David McCusker <davidmc@netscape.com> Subject: Re: Sleepycat DB 2.0 Date: 30 Mar 1998 00:00:00 GMT Message-ID: <35201989.238D4C9B@netscape.com> Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) Mime-Version: 1.0 Reply-To: davidmc@netscape.com To: al@alsutton.com Content-Type: text/plain; charset=us-ascii Organization: Netscape Newsgroups: netscape.public.mozilla.general Al Sutton wrote: > I've got some Questions about IronDoc (some dumb, put just so we can > all get an summary of it). They don't seem dumb to me. (Requests for info, even repetitive ones, do not really say anything about a person's comprehension.) > 1) Will it allow us to keep the UNIX style mailbox format for the main > file? Yes. IronDoc indexes can refer to content outside the db. In fact, IronDoc doesn't even care what you put in the indexes, as long as your methods for custom btree dict subclasses follow ordering invariants. So having btree entries which contain indirect pointers to remote content is perfectly okay, and the semantics can't interfere with IronDoc. There aren't very many things that IronDoc cares about, so it has more mechanisms than policies. You should be hindered very seldom. The downside is that IronDoc must operate without total global knowledge, and perhaps this makes an app developer responsible for more issues. > 2) How easy would it be to write a Java version? (i.e. are the data > formats stored on disk in a Java friendly byte ordering) Not very hard. The difficult part is very easy to describe: blocks have content in a variety of formats, and it must be easy (or efficient at least) to access byte arrays as content in appropriate language types. So this requires some clever end around Java's dislike of casting, or perhaps a minimal native layer that makes such actions not as costly. > 3) Would you be happy for people to "Modify" the API (when it's decided > upon) to meet the needs of mozilla? Yes, that's fine. I wouldn't be putting IronDoc in the public domain if I intended to control how folks use or change it. I won't be happy when I'm pressured to use any new and improved version because it has become the "standard", but I can't see how to stop that from happening. (I'd be *very* annoyed to see my name completely stripped, though. :-) > I know these have probably been answered, and I'll admit to not having > waded through all the posts, but I think these 3 questions are > important ones that everyone would like to know the answers to. Actually these have not been addressed yet, so these are cutting edge questions. :-) Additionally, I'd like to answer here a private question I was asked earlier about what I hope to accomplish with these postings. I want to keep IronDoc's public domain status clear, because I want to continue using it myself professionally and privately for the next 20 years without any interference. I want to finish IronDoc by the end of this summer and not have any outstanding ambiguity about ownership. I also want to use IronDoc for Netscape's address books, so I can take pride in my work and feel a sense of accomplishment for a job well done. And it would also make my work much easier to have a database that I can easily modify to support new features effortlessly and efficiently. I want address books to be able to contain anything at all, and not have it be a major architectural problem to accomodate new content or uses. David McCusker, structured storage major with minor in dynamic languages Values have meaning only against the context of a set of relationships.


reusable
 

From: David McCusker <davidmc@netscape.com> Subject: Re: Reusable db? Date: 03 Feb 1999 00:00:00 GMT Message-ID: <36B8EEF6.53F9E7AE@netscape.com> Content-Transfer-Encoding: 7bit To: Andrew Wooldridge <andreww@netscape.com> Content-Type: text/plain; charset=us-ascii Organization: Another Netscape Collabra Server User Mime-Version: 1.0 Reply-To: davidmc@netscape.com Newsgroups: netscape.public.mozilla.mail-news This is fun to talk about, but I probably can't continue a long thread this week that is not very directly related to the work we need for mail/news in Mozilla. We can have a long talk if I do it slowly and amortize that time over a longer period of development. Andrew Wooldridge wrote: [ snip ] > Here are some examples of cases where I would like to reuse the db > format: 1) Create a new structure assessible via a messenger like > application that focuses on a "person". That's very similar to the function of address books, for which MDB and Mork will also be used in Messenger. Basically one can make a row corresponding to a person, and put person attributes into as many different columns using as many different data types as you like. I'd expect some RDF code above MDB would end up doing something close to what you suggest, even if this is not directly represented in Mork within a Messenger context. But you'd be able to use Mork and MDB for that in another app if you wanted. > Instead of emailing joe@netscape.com, you simply type in the person's > name and you get a screen that has: [ correspondent's other email, > home page, query template, photo, live connection data stream, etc. ] > (Basically build some kind of leads app that focuses on people as a > "pointof contact") That's very doable. If performance is perceived in terms of smallest latency (wall clock time delay between request and seeing the results) to show person data after one types a name, then this implies one of two things: 1) one uses btree indexes to lookup persons, or 2) all data to search is already in memory, so disk latency is no limiting factor. I'd expect one wants to use an MDB implementation that actually uses persistent btree indexes for small latency, since one could easily have so many large and related db files that one would not want to always have everything in the universe loaded in memory. But that's a runtime design choice that can be made separate from using MDB. > 2) Create a savable structure that is a "site dom" meaning a meta DOM > that has - among other things - child nodes that point to the top > document nodes of each page in a site. I want to be able to leverage > the DOM to create a website building tool and be able to walk the DOM > to make global changes. So I want a way to save "state" That all seems reasonable. It sounds perfectly feasible to build a doc that models the structure of a web site stored in separate pages found elsewhere. And such a doc can be encoded either in XML or Mork, or any other complete encoding scheme. And you could access either format at runtime through a DOM, or more directly through an MDB interface. But if the site dom is stored in a separate file from the other files that contain web pages, then you have a problem with transacting cross file changes, and in keeping users from breaking the DOM by moving a file in the file system using another program besides the DOM. 
If I were doing what you are thinking (and I will be someday when I use public domain IronDoc and Mithril to write micro web servers and integrated development environments for web site construction) then I would use a db like IronDoc (which is still vaporware today) in order to save both your DOM model and the entire web site in a single file, since IronDoc is isomorphic to a file system. Then you'd be able to transact changes on your web pages and your modeling system together. It's not clear to me what you want to leverage in a DOM to solve this problem more easily or more effectively, unless it is just the ability to use very high level APIs with which to express your content model. > > [ MDB & Mork both encode any content, with different abstraction ] > > Could you do some kind of CVS record using this? (Also thinking of my > web building app) I don't understand. (And that always provokes guessing on my part.) Are you specifically interested in a problem with using CVS, or do you want to design a version control management system for your web site modeling app, and you think of CVS as the canonical model for this? In db lingo, the different branches one can bind to in a tree of version revisions are sometimes individually called "long term transactions", since one can make them self-consistent and select a binding on demand. There's a lot of ways to do version control, and CVS seems like it works best with a file system, for a world modeled after the CVS convention of using the file system as the end-all-be-all local database for source code development. It might not fit other kinds of db systems you use. The MDB and Mork interfaces have no direct version control semantics, so you would need something like CVS. If you were using something like IronDoc's version control system to make draft trees of object networks, then you'd be able to use the version control inherent in the database. (Note this is the same design as drafts in OpenDoc style documents.) David McCusker, speaking only for myself, mozilla mail/news client eng Values have meaning only against the context of a set of relationships.
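
For readers who want the row and column picture from the person example above made concrete, here is a deliberately generic sketch.  It is not the MDB or Mork interface, just an illustration of a row keyed by identity that holds arbitrarily many named cells; the column names and URL are assumptions.

  #include <map>
  #include <string>

  // A generic row-of-cells picture: a row identified by a key, holding
  // as many named columns as you like.  (Illustration only, not MDB/Mork.)
  typedef std::map<std::string, std::string> Row;          // column -> value
  typedef std::map<std::string, Row>         PersonTable;  // row key -> row

  int main() {
    PersonTable people;
    Row& joe = people["joe@netscape.com"];   // row keyed by primary email
    joe["displayname"] = "Joe";
    joe["homepage"]    = "http://example.com/joe/";  // assumed URL
    joe["photo"]       = "joe.png";          // any column name is allowed
    // A btree-indexed db would let the name lookup hit disk lazily
    // instead of requiring the whole table in memory.
    return 0;
  }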


ideal
 

From: David McCusker <davidmc@netscape.com> Subject: [mork] comparing with the ideal writing cost case Date: 04 Jun 1999 00:00:00 GMT Message-ID: <37585CE3.B689DA1F@netscape.com> Content-Transfer-Encoding: 7bit Organization: Ontology Mechanics Guild Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 Reply-To: davidmc@netscape.com Newsgroups: netscape.public.mozilla.mail-news This is the second post I promised two days ago, where yesterday's first post was about incremental writing costs in Mork. This post compares the writing costs for an in-memory db like Mork to the ideal writing costs of a random access disk-based db like (vaporware) pd IronDoc. (Note I will feel free to use the short name "Fe" for IronDoc below.) The approach I describe below should be ideal except for fragmentation associated with block granularity used by a db for i/o performance. Folks should consider this the feasible lower bound on commit costs, and keep this in mind when aiming for more efficient db arrangements. Actually most of this material does not really describe the IronDoc database per se, since all the interesting parts are done by the file used by the db for i/o (but where Fe defines the file semantics). This same approach can be used for efficient updating transactions in files that do not purport to contain a database, but the overall costs are least when file block and db block granularity exactly correspond. Let's assume a db uses a fixed number of blocks to buffer file content, so that memory footprint is constant, no matter how large the file gets. (Or an app can vary block count dynamically to effect RAM footprint policies in any manner it pleases, perhaps varying by memory pressure.) The important idea here is that one of the costs, RAM footprint, need not change at all as a function of the content size or size of changes during a transaction. So the incremental space cost can be very near zero when making changes to a db, and you can't beat zero as a cost. The rest of this material describes file and not db semantics. The db doesn't need to know a file is doing any of this, so the db can just go about it's business thinking it is writing content in-place while it ignores all transaction issues, while the file handles all the details. Simplicity is one main advantage of the approach below; it's easy to understand, and hard to code or maintain wrong through confusion. There's one more file feature which is less obvious, but necessary to make the system below work: the file has to manage free space instead of the db, so the file can use free space for transaction shadowing. A db never allocates space, and has to ask the file for space instead. The verb "to shadow" in this db context means to keep both before and after copies of changed content, so one can either go forward or roll backward until a transaction is finally committed. In practice this means any newer changed copy hides the old copy in the same sense that variables in a smaller language scope will shadow variables elsewhere. Note that shadowing blocks to implement a transaction implies that increasing disk (not RAM) footprint is one of the space costs incurred by such transactions. But the cost is proportional to changed blocks, and this is about as good as any other strategy will get. At any given time, the content of a db occupies some subset of blocks in a file, and the other file blocks are free space (or used for something else, like the file's own bookkeeping which is not visible to the db). 
Now we're ready to describe a simple system for doing db transactions.  The db can change the file three ways: delete a block, allocate a block, or write on a block.  And each of these has an effect on the file's free space representation, which handles all the problems that need solving.

Until a transaction commits, the file never modifies an old block which was being used (i.e. not in the free space).  The old map of free space also lives in old blocks, and this is also not changed until a commit is finalized.  All changes to a file will happen only in what the old map thinks is free space, whose free block contents will never matter.

Aborting a transaction, say by crashing, could not be simpler, since all the old blocks are exactly as they were before, with no changes at all except in blocks within free space whose contents are irrelevant.

So file changes happen only in blocks that the old map thinks are free space.  As this happens, a new free space map is incrementally built to account for allocations from free space, and pending deallocations that will return blocks to free space when the next transaction commits.

Naturally, the most significant thing that happens when a transaction commits is that the file changes the pointer to the root of the free space map, so the new one takes effect and obsoletes the old one.  So a commit becomes final when a very small piece of the file is changed to point to roots corresponding to the new world view.

When a db frees a block, the file notes this in the new free space map.  But this block must still never be modified until the transaction ends, so the block must NOT be allocated.  This implies the file must compare the old and new free space maps when allocating blocks, to avoid blocks now free that used to be in use.  (Yes, that's the hairiest part in this.)

When a db allocates a block, it asks the file for suitable space, which is found by looking for free blocks in the new map which are ALSO free in the old map (because writing any old blocks is doom, even if free).

When a db writes on a block, the file looks in a block relocation map.  If not found, this is the first time the block has been written during this transaction.  So the file allocates a new block from free space to hold the new copy which shadows the old block, after the file puts an entry in the block relocation map that associates old->new for all db accesses to that block in either read or write calls.

So the first time a block is written, it gets relocated to a new spot in the file.  But the db never sees this, and continues using the old block position for reads and writes, while the file uses the relocation map to access content in the physical model that underlies the logical model being used by the db.  Every changed block has two copies, with an unchanged original in the old world view, and a relocated changed copy in the new world view.

The hash table cost is trivial in terms of space and time, because O(1) access time is instant, and a table entry is trivial space compared to the space cost of the relocated block in the disk image.

An app level commit ends up telling the file to commit, instead of the db, and it is only necessary to flush the db first so the file has all the changes that are coming down the pipe.  When the file commits, the first thing it needs to do is write the map which relocates old to new blocks, so it becomes persistent and usable even in the event of interruptions from crashes.
Once this is done, it is feasible to kiss the old blocks goodbye and commit to using the new copies henceforward.  This cost is proportional to blocks changed.  With the relocation map on disk, the file switches the top node of the content tree containing roots for free space, etc., and then the commit is engraved in stone.

But the total cost of the commit has not yet been paid, because the changed blocks are not in canonical positions.  We want to apply the relocation map, and clobber all old blocks with new copies, until every block is physically located in its logical position.  All the "new" blocks go into free space after they are thus transferred.

This system tends to make a file have free space after a transaction that is at least as large as all the changes to old blocks in the db.  This can be a lot after huge changes, approaching 50% file wastage.

Now let's summarize why this system represents the ideal cost scenario.

First, RAM footprint need not increase during changes to the db, so an increase in memory footprint is a cost an app need not pay.  (However, an app can increase time performance by using more space anyway, to duck the necessity of flushing changes for page re-use in the cache.)

All the other space and time costs are proportional to the number of blocks that get changed, so the costs are pay-as-you-go, and less when you change little and more when you change a lot.  And even if you change every last block, the costs are still moderate.

No extra cost is paid before a commit, and the commit writes very few blocks more than the ones which must be moved to restore canonical positions.  The time latency for a commit is then proportional to content changed, with a very continuous cost function, without having to pay high costs anywhere else to get this kind of desired latency.  There is no worst case scenario which spends more effort than strictly necessary.

David McCusker, mild mannered software reporter
Values have meaning only against the context of a set of relationships.
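
Here is a compact sketch of the shadowing scheme described in the post above, assuming fixed size blocks named by number.  Writes go to relocated shadow copies drawn from blocks free in both the old and new maps, and the commit pivots on swapping in the new roots.  The structure and names are my own illustration, not IronDoc's actual code.

  #include <cstdint>
  #include <set>
  #include <unordered_map>

  typedef std::uint64_t BlockNum;

  // In-progress transaction state for a shadow-paging file.  At the start
  // of a transaction, newFree begins as a copy of oldFree.
  struct ShadowTxn {
    std::set<BlockNum> oldFree;   // free space map as of the last commit
    std::set<BlockNum> newFree;   // free space map being built now
    std::unordered_map<BlockNum, BlockNum> relocation;  // logical -> shadow
    BlockNum nextBlock = 0;       // first never-used block past end of file

    // Allocate a block free in BOTH maps, so no old data is ever touched.
    BlockNum allocate() {
      for (auto it = newFree.begin(); it != newFree.end(); ++it) {
        if (oldFree.count(*it)) {
          BlockNum b = *it;
          newFree.erase(it);
          return b;
        }
      }
      return nextBlock++;         // otherwise grow the file
    }

    // Freeing only marks the block free in the new map; the old copy must
    // stay untouched until the commit is final.
    void release(BlockNum b) { newFree.insert(b); }

    // First write to a logical block relocates it to a shadow block;
    // later reads and writes follow the relocation map.
    BlockNum writableBlock(BlockNum logical) {
      auto it = relocation.find(logical);
      if (it != relocation.end()) return it->second;
      BlockNum shadow = allocate();
      relocation[logical] = shadow;
      return shadow;
    }

    // Commit (sketch only): 1) flush shadow blocks and persist the
    // relocation map, 2) switch the file's root to the new free space map,
    // which is the single small atomic step, 3) copy each shadow block
    // back over its logical position and release() the shadow blocks.
  };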


chrome
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: Compiling Chrome
Date: 15 Jun 1999 00:00:00 GMT
Message-ID: <3766B64A.9026A335@netscape.com>
Content-Transfer-Encoding: 7bit
References: <3765CD5B.692B7A13@wserv.com>
Content-Type: text/plain; charset=us-ascii
Organization: Ontology Mechanics Guild
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.performance

Jordan Mendelson wrote:
> Alright, since these text documents will rarely change, how about
> compiling them into a binary file optimal for loading directly into
> the web browser?
>
> Actually I was thinking that doing the same thing for the cached
> html/xml/css files might work as well.

The idea of "pre-compiling" content is a very useful one, and one can follow a lot of different avenues when exploring applications.  I won't survey as many as I can think of just now, and instead I'll focus on just one issue of reducing accesses to different files.

But before I do that, first I'll try to loosen up the meaning of the word "compiling", so that folks will think of a broader category of possibilities when considering this vein of optimizations.

I tend to think of compilers as abstract binding machines, which take content that could be interpreted, and change it into some other form suitable for executing in another context, where one principal goal is to pre-bind and otherwise optimize bindings for runtime performance.  Naturally, for compilers to native code, one first order optimization is binding to a binary encoding of assembler as close as possible to the language used by the machine or virtual machine that executes code.

But the principal job of a compiler is to model the reference graph and generate code that targets the reference infrastructure that will exist at runtime, so that entity references of every kind (such as function calls) will actually bind to suitable things at runtime.  (It's not enough to generate plausible machine code; it has to actually fit into the runtime context that will exist when the code is called.)

Okay, you know all this, and I've already bored you out of your mind.

Let's say we have a bunch of files containing content in xul, html, or whatever, and these will be consumed by the application at runtime, where the act of fitting the content in the app's model is basically a small or large number of binding operations.  Instead of performing the most general case parsing and binding at runtime, this content can be pre-parsed and pre-bound beforehand; perhaps this happened the last time the app executed, which implies a first launch goes slower than later ones.  (Or installation can do this also, to make first launch faster, yada yada yada.)

However, note that this tends to make our code footprint bigger than otherwise, since we'll then have code for the general case info consumption, along with a code path to load any pre-bound content.  (I'm slowly focusing in on the issue of reducing file accesses.)

When content is pre-parsed/pre-bound/pre-compiled (pick your favorite term), the more efficient output need not be stored in separate files on a one-to-one basis.  The app can instead combine content from many source files inside one consolidated db of pre-compiled content, so using it involves opening just one file instead of all contributors.

Notice this situation is exactly analogous to building a source tree in C++ into libraries that consolidate output from many source files, rather than opening and interpreting each source file at runtime.
But we don't want to preclude any opportunity to dynamically change a xul, html, etc. source file whenever we want, and have this override the pre-compiled version. So we'd like to notice when source files change, although we'd like to avoid looking at every single one when we launch since that would obviate some of the optimization we gained by not trying to access them in the file system. Still, looking at mod times would be some improvement. Even better, though, would be using some explicit technique to submit a changed source file, so we tend to only pre-compile when this is done, and avoid checking mod time for all the files when we launch. For example, suppose Mozilla supports the idea of dragging a source file in xul format onto a Mozilla library/resource-file/db/whatever, and this causes the xul file to be "installed" by performing the expected pre-compilation step. (Let's ignore any interesting ideas about identifying xul content with some module ID that is independent of file path, so the same "module" could come from different files.) I'm going to just stop abruptly here rather than developing that idea any more, since the rest just evolves naturally by asking questions. David Mc
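
A minimal sketch of the consolidated pre-compiled content idea, with an explicit install step instead of checking every source file's mod time at launch.  The class and method names are assumptions for illustration, not anything in Mozilla, and "pre-compiled" is left abstract here.

  #include <map>
  #include <string>

  // Hypothetical consolidated cache of pre-compiled chrome content,
  // keyed by source path.  The compiled form could be any pre-parsed,
  // pre-bound encoding the app can load faster than raw text.
  class PrecompiledChromeCache {
  public:
    // Explicit install step (e.g. triggered by dropping a xul file on
    // the cache) re-compiles one source without stat()ing everything.
    void install(const std::string& sourcePath,
                 const std::string& compiledForm) {
      mCompiled[sourcePath] = compiledForm;
    }

    // Loading prefers the consolidated cache; only a miss falls back
    // to general-case parsing of the original source file.
    bool lookup(const std::string& sourcePath, std::string* out) const {
      auto it = mCompiled.find(sourcePath);
      if (it == mCompiled.end()) return false;
      *out = it->second;
      return true;
    }

  private:
    std::map<std::string, std::string> mCompiled;  // one db, many sources
  };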


pointers
 

From: David McCusker <davidmc@netscape.com> Subject: Pointers considered harmful Date: 16 Jul 1999 00:00:00 GMT Message-ID: <378FCC75.E91C8B81@netscape.com> Content-Transfer-Encoding: 7bit Organization: Ontology Mechanics Guild Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 Reply-To: davidmc@netscape.com Newsgroups: netscape.public.mozilla.performance,netscape.public.mozilla.rdf Pointers considered harmful Here's the executive summary: mozilla will only have high performance as we scale up the amount of code, and the amount of content handled, if we make granularity of bindings smaller in both code and content so the moral equivalent of demand paging works efficiently for both. This means using address pointers as the main ID mechanism has to go. I know this seems controversial, or totally mad if you're conservative, but I'm trying to fill the vision vacuum I see. We're big on hindbrain tactics, but very low on forebrain strategy, so we need a goosing. I have been advised to use fewer notes, and avoid three hour operas, so this will be shorter than usual. But I will respond to my own post to expand on the details in other messages. You are highly encouraged to heckle me as much as you can, since you can't possibly hurt me, and the dialectic will do wonders to improve clarity of ideas. It's no prerequisite, but it would help you to read about fragmentation in thread with subject "Optimizing with nonzero fragmentation" 14July99, at news://news.mozilla.org/378D0EC7.4962CE97@netscape.com. You need to grasp an elementary but counterintuitive notion that increasing apparent overhead through granular indirection improves performance dramatically when the result applies your available resources much more efficiently. That's a less-is-more strategy. We say we want that, but we don't do it in practice, because we like to pile our tactics higher and deeper, and we end up with more-is-less instead. The tactic we are currently piling deeper is called "object-oriented programming", and it lulls you into writing more and more classes, and then instantiating more and more objects in memory. Instant mushroom, when all this goes straight to the memory footprint bottom line. Note oo programming is not bad, it's just not enough, because you must step outside and have a strategy for applying oo as a tactic. There is a larger context I call "binding oriented" programming, in which oo techniques are a pattern that is sometimes useful. Now, the main strategy for high performance is always going to involve exploiting fractal locality access patterns, sometimes called the 80/20 rule, which loosely stated means you spend 80 percent of your time in only twenty percent of code or data, and this tends to happen at many different levels of viewpoint granularity (hence the fractal aspect). (Or pick other numbers besides 80 and 20, because it doesn't matter as long as the distinct majority vs minority relationship is maintained.) So why are pointers considered harmful? Because a code graph or content graph which uses address pointers to identify nodes will tend to have a logical RAM footprint proportional to all the code and all the content. It stinks because this linear space use is O(N), and typically exploits no locality patterns, so it just loses if you want a strategy to exploit fractal locality. Just about all computing systems are composed entirely of code or data graphs, where nodes are code blocks or state, and arcs are node identity refs that bind to the designated nodes when dereferenced. 
Coders get good at doing this in memory, using address pointers, and this can blind them to alternative ID ref forms (especially when pride is involved :-). However, the main reason we prefer address pointers for identity refs, by a large margin over any other form of identity, is because it is so incredibly cheap when dereference mapping is built directly into chip hardware, which binds pointers to memory locations so efficiently that we consider this state of affairs a law of nature beyond further notice. This is where you need to apply that counterintuitive notion mentioned above (near the ref to my fragmentation posting), because we need to use some other kind of ID instead of address pointers for performance that scales, even though it seems painful to give up a mechanism that seems cheap as water in favor of something that seems (shudder) slower. At this point I'm going to let the movie suddenly grind to a halt, and give you the opportunity to guess where this goes next, which shouldn't be too hard since I've been telegraphing the plot for some time now. :-) David McCusker Values have meaning only against the context of a set of relationships.
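
As a preview of one concrete shape the non-pointer ID idea can take, here is a small sketch: identity refs that are small integers into a pool rather than machine addresses, so a runtime is free to relocate, pack, or demand-load node storage behind the ids.  The names are illustrative assumptions only.

  #include <cstdint>
  #include <vector>

  typedef std::uint32_t NodeId;      // identity ref: an index, not an address
  static const NodeId kNilNode = 0;  // assume id 0 is reserved as nil

  // A graph node whose arcs name other nodes by small integer id.
  // Nothing here pins a node to a memory address, so storage for ranges
  // of nodes can be paged, packed, or relocated without touching arcs.
  struct Node {
    NodeId firstChild;
    NodeId nextSibling;
    std::uint32_t payload;           // stand-in for real content
  };

  class NodePool {
  public:
    NodePool() : mNodes(1) {}        // slot 0 stays unused as nil
    NodeId create() {
      mNodes.push_back(Node{kNilNode, kNilNode, 0});
      return static_cast<NodeId>(mNodes.size() - 1);
    }
    Node& deref(NodeId id) { return mNodes[id]; }  // the one binding point
  private:
    std::vector<Node> mNodes;        // could be a paged store instead
  };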


mork
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: idea: compression of local databases
Date: 27 Jul 1999 00:00:00 GMT
Message-ID: <379E5982.90A2A102@netscape.com>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
Organization: Ontology Mechanics Guild
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.mail-news

John Gardiner Myers wrote:
> David McCusker wrote:
> > Note bad performance can conceivably result from some complex file
> > streams, under some MDB implementations, when the file cannot seek
> > efficiently and the MDB code wants to do so rather heavily.
>
> It depends on whether the MDB implementation needs to seek to byte
> offsets or whether it merely needs to seek to cookies obtained from a
> tell() operation. If the latter, then the file implementation of the
> tell() operation can save a copy of the compression state for use by
> a subsequent seek() operation on the same opened stream.

Ah, that seems an excellent point (and distracts me with some ideas that don't seem relevant right here, about binding systems). Mork will only seek a cookie returned from tell(). However, random access binary db's like public domain IronDoc (and every other one that I am familiar with) will seek file positions that were never returned from tell() in the current session, and possibly never earlier either.

For example, IronDoc uses a file as a virtual address space, where the file position of a block is the address of the block, and this is the basis of all graphs of blocks in a db. So blocks will point to other blocks by means of integer file positions, and these will have been written in earlier sessions, and maybe never seen before in a current session until a read() call just now. Many other db's are similar.

I guess almost always a seek will target a place that was either the position targeted by an earlier write() (maybe in an earlier session), or the cookie returned from tell(). So if the same file implementation was used in earlier sessions, it could react towards all writes as if they implied "readers might want to seek() here" later, and this could leave a persistent trail similar to a runtime's tell() state snapshot.

But now that I think about it more, if a compression algorithm could not always compress a block down to a predictable size, then it would not know where to look for a compressed block on disk, and this could frustrate the O(1) disk access performance intended by random access binary db's. So a compressing file implementation might be likely to fragment (ie. waste) space for a resulting performance wash.

This is just telling me we might not want to compress random access binary db's, but we could still use compression for linear text files. I need to think of some term to use in code APIs to convey a preference by a db for a file that will not play around with file positions, so a binary random access db can specify a pref for a file like this.

There's another way to look at this compression issue, too. Although binary db blocks might not be compressed, we will still write variable length blobs into binary db files, so the db is acting like a file system (in fact, IronDoc is more or less a file system in addition to some other things), and these blobs might be usefully compressed. So maybe the db needs access to a stream transformation API to apply upon embedded streams inside a db.

The db infrastructure looks like clear and uncompressed markup that surrounds and organizes streams that we might like to have compressed or otherwise transformed. But at that point, maybe a particular MDB implementation with a binary db might just be using compression under the covers without any plugin. That's because the algorithm for compression might like to use the db features to store out-of-band dictionaries used to decompress streams. (Yes, I sometimes brainstorm changes in binding design by factoring bindings around at random and seeing what I think of the results.)

> > I could make up an interface to convey this info between MDB and a
> > file implementation, but it would tend to be more complex and rather
> > harder to document than all the rest of the existing file API.
>
> A "readers are likely to want to seek() here" operation in the writing
> API might suffice. For the DEFLATE compression algorithm, this would
> cause it to finish the current block and start a new one. It may even
> write a record storing the amount of data previously compressed.

I think this would work very well for Mork, and for other linear format db's that are generally scanned in a mostly forward direction.

> > This is interesting and brings up some fun questions about how to
> > bind suitable file implementations for files found on disk.
>
> I was presuming the user would install a particular file implementation,
> then that file implementation would decide how to store things on disk.

Okay, then I only have a wrinkle on that. We'd want to install a file implementation that binds to a file name (or not) based on some kind of rule, so implementations good for Mork would bind to Mork files, but file implementations that would do poorly for a binary RAD (random access db) would not. Maybe installed file implementations would be arranged in a sequence so each gets a shot at a file first, and after passing, those after in sequence get their shots at matching rules and binding at runtime. (Just thinking out loud.)

> > Some people hate file name extensions for typing, and we are using
> > ".msf" for Mork summary files currently, so altering this would
> > likely cause problems.
>
> I was presuming the open() method of the file implementation would map
> the names appropriately. If called with a filename of
> "/foo/bar/Inbox.msf", an implementation could open the underlying file
> "/foo/bar/Inbox.msf.gz". The factory for some other file implementation
> could prompt the user to insert their mail-storage smartcard, then map
> the file to "/mnt/smartcard-347F83C4/foo/bar/Inbox.msf".
>
> (I suspect doing UI from deep within MDB is impractical, but doing UI
> when Messenger is starting up may be reasonable.)

That all sounds feasible. We'd use ".msf" for Mork databases, and we'd know we were doing that when going through the factory for Mork under MDB. Another db under MDB would have a different factory, and we'd have an opportunity to use some other convention for naming, and then we'd be able to map those differently. So a file name ending in ".msf" would mean "use the Mork factory, but you can map the file location around where you want, and use compression if you want to do so, since that works fine with Mork text format dbs."

David Mc
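
As an aside, the kind of stream interface being discussed, where readers only seek to cookies returned by tell() and writers can hint at future seek points, might look roughly like this; it is my own sketch with invented names, not the MDB file API:

// Hypothetical stream interface: seeks are only valid to cookies that a
// prior Tell() returned, and WriteSeekHint() lets a compressing
// implementation end its current block so later seeks land cheaply.
#include <cstddef>
#include <cstdint>

typedef std::uint64_t FileCookie;  // opaque; may encode compression state

class HintedStream {
 public:
  virtual ~HintedStream() {}
  virtual std::size_t Read(void* buf, std::size_t bytes) = 0;
  virtual std::size_t Write(const void* buf, std::size_t bytes) = 0;
  virtual FileCookie Tell() = 0;             // snapshot the current position
  virtual void Seek(FileCookie cookie) = 0;  // only cookies from Tell()
  virtual void WriteSeekHint() = 0;          // "readers may seek() here"
};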


mdb
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: database decision
Date: 16 Aug 1999 00:00:00 GMT
Message-ID: <37B87036.C6D03E40@netscape.com>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
Organization: Ontology Mechanics Guild
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.netlib

Brendan Eich wrote:
> I had assumed some familiarity with how we used Berkeley db 1.85
> in the past for the cache:

No familiarity on my part; I think caching is trivial, so I would never bother to look at historical implementations. (The difference between what is obvious to me and other folks causes a never-ending gulf of misunderstanding; I space walk without safety lines.)

> We used the external hashtable part of db (no btrees), and only as an
> index to find local files containing the downloaded content of URIs.

It was obvious we were using local files, and that the "index" was a mapping of URI to local file. That the map was a hash table and not a btree is really immaterial, though it makes sense when sorting is not needed. However, btrees have almost no difference in performance compared to hash tables, on disk-based data structures, so the choice is not very significant in this case.

Also, in this case, using Berkeley db 1.85 at all for any purpose as an index is overkill, since the data set is small enough that keeping it entirely in memory is both reasonable and efficient. If db 1.85 also had an integral blob system for use in filing content in arbitrary length streams, like public domain IronDoc is designed to do (in the same file as any associated indexes), then it would not be overkill to use a medium weight database like this.

> So the index mapped URI to local filename. Local filenames were in
> fact MD5 hashes of the URIs, to avoid collisions when storing files
> in a single directory on a native filesystem that might have harsh
> name length limits, and secondarily to obfuscate each cache file's
> URI-identity against trivial attack or just over-the-shoulder
> observation.

The part about using MD5 hashes for file names is less obvious, and is kind of interesting as an example of good design using the right tools. However, the need to obfuscate URI identity, and to avoid name length problems, are both caused by the ill-considered choice of using the file system as the cache. So the clever MD5 scheme is just a catchup measure to fix some drawbacks of choosing a low performance solution.

Using the file system is not a really bad choice (a 5 on a scale from 1 to 10), but it's not an especially good choice, since it doesn't scale attractively in either volume or transactions per time unit. But file systems are the most common default standby database choice.

> I believe the index also carried timestamps by which one could order
> the cache on startup or recovery to do LRU replacement.

Annotation is cool; but presumably the set of permitted annotations is fixed by a binary schema used by the index db in this case. That's not necessary. (The limitation is not caused by the binary nature of a db; any db can arrange to be schema-less if it is willing to be so.)

> If mork (via MDB of course!) could be used for the cache index, great!

That would indeed work. And Mork's biggest weakness, the need to keep everything in-memory during use, has virtually no drawback when the usage involves data sets like the cache which are well suited to that.

> Storing downloaded files each in a local file has caused the Mac
> stress. Any thoughts from readers of this thread on using fewer files
> to store multiple downloaded files?

Yes, I have lots of ideas about that. The entire second half of my news://news.mozilla.org/37B4BD0C.46F145AA@netscape.com posting was devoted to this topic. (I'm trying not to scream. :-) So here it is again in a single sentence: You can use one single file as a file system.

That sentence is the one I have spent much of my thousands of db design and coding hours thinking about, and it is the basis of why public domain IronDoc uses a single file as a universe to contain both arbitrary numbers of btree indexes, and arbitrary numbers of independent file system trees. Note that it goes against the grain of one-true-way mysticism, because it assumes there is a reason to layer a file system on top of another OS file system, in any platform environment. There are many reasons, no matter how nice the local file system for a given operating system.

Most problems involving storing bits of data quickly and efficiently are obvious to me, and there is zero chance someone will suggest any solution with which I am unfamiliar, at least in general outline. This stuff is so easy to me, it is actually acutely boring, and it's like a dentist drill to my teeth that folks don't grasp the basics.

I understand you have some familiarity with the Iris file system, so you probably understand what I am saying when I claim I can write a file system in a file for you, using tree structures resembling Unix style inodes. That I make it sound trivial (which it is), might sound like bravado to you, but I really don't care.

The solution is obvious. Put many logical files in one physical file. Do it generally, and you have a file system. It need not be anywhere near as complex as an operating system's file system, because it need not solve many, many problems that an OS is responsible for handling. And in the case of a cache which you need not even protect with good transactions, because it is a disposable optimization, you have permission to forsake special care to avoid data loss during usage.

Designing such a simple file system from first principles is easy; if you are willing to read it, I'll describe it in as much detail as you want. Under the assumption you want a taste, here's a few details, but using plausible answers to the first principle questions.

Start with a physical medium like a file system file, and use this in which to store your logical files. Call this physical file a 'device', just to keep separate the notion of physical medium (called a device) from the logical content, such as files. Divide the physical device into uniform sized blocks that contain 2^N bytes each, for some power of two.

Use the first block of the device as a header describing what you need to know about the device. Use the second block as a bitmap that shows whether any block in the device has been allocated for use.

Attach a page cache to the device, containing some number of buffers that can hold 2^N bytes each, so each buffer can page a device block into memory. Page caches are boring, so I'll skip further details, but just about all i/o to the device goes through this device pager.

Create a new file in this device. This is done by allocating some free block in the device, and this block is used as the root of the file's tree of content blocks. (Whether the root block is all index, or whether it contains content when a file is small, is all a matter of design choice involving trivial and boring mechanisms; but obviously a file smaller than a block should fit into this root block.)

When all the logical file content cannot fit into the root block, then at least some (maybe all) of the root block must contain pointers to other blocks, where in this context 'pointer' is defined to mean the offset of any device block. So block pointer means block position.

There are many ways to build the tree. The root block could hold a variant of a Unix inode structure, and the resulting tree would be very easy to build. Or the root block could be something from the IronDoc design, so you could get Exodus style btrees; but that would be complex, and therefore stupid, since it would be easier to use IronDoc than to do it all again (though folks are welcome to try).

The root block for a file could be used to encode an awful lot of different kinds of information. It might hold annotations, including URIs or MD5 hashes; it might hold inodes; it might hold all or some of the file content (say the last odd sub-block file fragment). The format of root blocks can soak up lots of your time and attention. Note this is partly caused by the assumption in this model that each logical file needs an entire block for the file root; that is not true in a more sophisticated design, but we're trying to stay simple.

Every time a file needs an index block or a content block, it just allocates another device block from the device. Each block allocated gets stitched into the overall file tree structure, by putting the block's position as a pointer into the appropriate parent block. When a file is deleted, you just do a traversal of the tree and deallocate every block in the file's content tree, so the device marks them free.

How do we access a file in this file system? What is the identity of any given file? Where is the catalog that maps file names to files, and other things that we associate with conventional file systems?

The identity of a file is the block position of the root block. You can access any file provided you can find its root. Someplace you'll want an index that keeps track of all files on the device. How this is done is not especially interesting, and there are a huge number of options. (The best choice is the one IronDoc uses, where the device also contains any number of btrees for indexing whatever you want.)

Let's say we use MDB to keep an index on the side that maps URIs and MD5 hashes and whatnot to each file's root block position. Such a map lets us annotate each logical file, as well as keep track of the root block, which gives us the access to content bytes.

That's enough for now. Please feel free to ask questions. Also feel free to try stumping me if you think you can do so. (I need to work off some of the pressure building up from frustration that all this is not obviously grammar school level storage tech to everybody.)

David Mc
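
As an aside, the on-disk shapes just described might be sketched like this in C++ (my own illustration with invented names and field counts, not IronDoc or any Netscape code):

// A device split into 2^N byte blocks: block 0 is a header, block 1 is an
// allocation bitmap, and each logical file is rooted at one block whose
// "pointers" are just block positions within the device.
#include <cstdint>

const std::uint32_t kBlockShift = 12;               // 2^12 = 4096 byte blocks
const std::uint32_t kBlockSize  = 1u << kBlockShift;

struct DeviceHeader {          // lives in block 0
  std::uint32_t magic;
  std::uint32_t block_size;    // kBlockSize
  std::uint32_t block_count;   // total blocks in the device file
};

struct FileRoot {              // one per logical file, in its own block
  std::uint32_t byte_length;   // logical file size in bytes
  std::uint32_t direct[14];    // block positions of content blocks
  std::uint32_t indirect;      // a block holding more block positions
};

// A block "pointer" is a position: byte offset = block number * block size.
inline std::uint64_t BlockOffset(std::uint32_t block) {
  return static_cast<std::uint64_t>(block) << kBlockShift;
}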


choice
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: database decision
Date: 17 Aug 1999 00:00:00 GMT
Message-ID: <37B9D410.7236CDC@netscape.com>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
Organization: Ontology Mechanics Guild
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.netlib

Frank Hecker wrote:
> As a non-programmer I normally lurk on these groups (which I read to
> keep up with Mozilla development, as well as just for pleasure), but in
> this case I was moved to post two quick comments:

You always sound pretty dang technical to me; reasoning tops coding. :-)

> 1. I sympathize with your frustration here; I'm far from being an expert
> in this area, but you're right, it really is obvious that a single file
> can be used as a file system to contain multiple cached web pages,
> images, and related data. There are also many real-life examples of
> this general idea; to take but one, PC/Windows emulation software (and
> similar products like VMWare) for Linux and Unix typically support the
> use a single Linux/Unix file to store a entire DOS file system.

Yes, and other examples of file-systems-in-files are various flavors of structured storage systems, such as Microsoft's DocFile, Apple's Bento, the Quilt system I wrote at Apple to replace Bento, and my public domain IronDoc structured storage system (vaporware, incomplete, on hiatus). I don't know anybody else who's also written more than two of these.

You could say the combination of my Mithril language and IronDoc file system is actually a public domain, portable operating system platform. And I understand perfectly the implications of such a thing, but I have no desire at all to joust with folks about any preferred technologies.

There are many reasons why IronDoc is not being used by Netscape now. The most important is that I have steadily maintained a strict public domain stance, which causes it to drop off the radar of interest. The second most important is that folks have been very fond of saying we don't have time to write our own db, and thus I have ended up writing the lightweight Mork text db for Netscape instead (sigh). Another reason is that folks think of older db's when they want a middleweight solution, and they rally round the flag of familiar ground. A final and most common reason is that folks with short attention spans think that since I haven't finished it yet, that proves the lack of worth. (If I said I was actually done, though, things might be different. But even if I had resumed in May when it was first really feasible to do so, I still might not be done by now at a pace of only a day per weekend.)

> 2. Users of browsers occasionally change the amount of disk space they
> want to devote to cached data; in your case that corresponds to changing
> the size of the "file system in a file" containing the cached data. To
> avoid unnecessary copying of previously cached data it would presumably
> be desirable to grow or shrink the "cache file system" in place (i.e.,
> extending or truncating the existing file containing cached data).

Yes, and this is actually something complex to design into a db when you have stronger requirements than "I don't care" for performance effects.

> I know this is possible (you might say "trivial" :-) because existing
> file system implementations like Veritas do essentially the same thing;
> however for the sake of completeness you might want to address this
> issue in just a little more detail.

Getting larger is trivial; getting smaller implies either sophisticated reasoning about garbage collection and possible optimizations, or else throwing up your hands and pretending it's not a problem.

I don't mind adding detail for completeness. (I'm always stuck between two problems: if I say too much folks won't read it, and if I say too little someone might assume I don't know about something since I didn't mention it.)

In the case of a cache though, one might just purge the entire cache when it gets shrunk, since no runtime content invariants are violated by discarding all the content. But I suppose as soon as one starts talking about caching for offline, one gets some additional constraints that give obligations more like a db that can't afford to toss stuff.

> The case of growing the cache file system seems exactly analogous to
> what a file system implementation like Veritas does: add new free
> space and adjust whatever else in the file system needs to be adjusted
> to use that space.

Yes, that one's easy because it only extends free space, and nothing has to move. In the case of a db which must autogrow to hold new content, it is best to first ensure a db file has enough free space before even trying to add new content. That way you find out immediately whether disk space will be exhausted, without having modified the file in such a way that recovery can only be effected by a transaction abort.

> The Veritas and similar file system implementations can also shrink a
> file system to a given size, assuming that the total amount of space
> occupied by existing files in the file system is less than or equal
> to the desired new file system size; then after defragmentation no files
> reside beyond the new upper limit, the file system data structures can
> be adjusted to reflect the new size, and the excess disk space can be
> reclaimed. In the case of the cache file system the implementation
> would simply delete an appropriate amount of cached data prior to
> shrinking the cache file system.

Difficulty in shrinking is highly affected by the permitted distribution of locations in a db that can reference things that can be moved. When references might be almost anywhere as in IronDoc (because it uses copy-on-write to make block granularity diff trees for draft management), then something closer to a full scale garbage collection is required to patch any persistent references that need to change.

The primitive file system I describe earlier in this thread would be a bit easier to shrink in place, because references to a block would have a very regular organization, with exactly one parent per block child. Still, moving blocks from the end of the file forward into empty slots would require finding parent refs, which would be hard (linear expense) if children did not have backpointers, and usually one wants to put only content bytes in leaf blocks. But it's very tractable as an algorithm.

I think that kind of garbage collection would be pretty easy, but even so I'd expect folks to favor ditching the entire cache, just to save the code bytes it would take to do the gc, no matter how small the code.

I don't actually know a thing about the Veritas system. I am not an expert in the sense of having familiarity with extant implementations. This is basically why I don't pitch myself as a db expert, since I just don't care about existing practice when it's so easy to reason out.

Hey, it just occurred to me you might want to advocate Veritas as one possible solution (not knowing anything about their licensing). Just so you know, I don't have any agenda, so I won't try to block you.

> I may be misremembering this, but I thought the Veritas implementation
> could actually do the defragmentation in the background over a period
> of time and then shrink the file system at whatever future time it
> happened to complete.

The subjects of space fragmentation and optimization for locality of reference are ones I usually keep behind a dam, rather than letting myself spew every technique that has occurred to me before. It's more art than science, and one can get very heuristic and therefore very complex and detailed in tactics, even though the strategic goals change little.

Every time file content changes, or a read or write is done, there is some opportunity to try a tactic to improve locality of reference. It's obvious when changing file content, you'd want to try storing things contiguously. (And if you don't mind modifying the file without any explicit request, you might rearrange things in the background.)

Games to play might be less obvious when, say, reading file content. For example, when one wants more than one block and they happen to be adjacent, you can read them all at once. Or if block granularity is smaller than seems efficient for i/o bandwidth, then a device pager can read multi-block sections and then do two-layer caching, where the page cache is one layer and the largish i/o buffer is another.

Similarly, when flushing a page cache, it would make sense to sort dirty pages and then note when adjacent pages will be written, so they can be consolidated into a single disk write, since a couple memory copies is much cheaper than issuing another disk access.

> (Of course, it would also have to immediately enforce the new file
> system size limit, lest users use up enough file system space in the
> meantime to prevent reaching the desired target size.) Something
> like this might be desirable for a browser cache file system
> implementation as well, so that users wouldn't have to endure an
> extended wait after reducing the cache size.

You can just enforce the new logical eof immediately for a cache since it's a hard limit, and only physically shrink the file when it actually gets cleared by code moving stuff around. If I wanted a similar low time latency when shrinking a db file while it was still being used, I'd probably use a technique very similar to a transaction shadowing tactic.

In the file trying to shrink, you can enforce a hard logical eof limit. If adding content would go past this point, you can put all overflow in an extension file that maps as if contiguous at the end of the file being shrunk. Then when the shrink commits, append all the new content in the temp overflow/extension file to the original.

But with our browser, I'd expect changes in cache size to happen so infrequently, that it would be acceptable to purge the entire cache of content when making it smaller.

David Mc
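
As an aside, the flush tactic mentioned above (sort dirty pages, then coalesce adjacent ones into a single write) might be sketched like this; the types and the WriteAt callback are invented for illustration, not actual IronDoc or Mork code:

// Flush a page cache by sorting dirty pages on file position and merging
// runs of adjacent pages into one larger write, so a few memory copies
// replace extra disk accesses.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct DirtyPage {
  std::uint64_t pos;          // byte offset of this page in the file
  std::vector<char> bytes;    // page-sized buffer
};

// WriteAt() stands in for whatever seek+write primitive the file offers.
void FlushDirtyPages(std::vector<DirtyPage>& pages,
                     void (*WriteAt)(std::uint64_t pos, const char* buf,
                                     std::size_t len)) {
  std::sort(pages.begin(), pages.end(),
            [](const DirtyPage& a, const DirtyPage& b) { return a.pos < b.pos; });
  std::size_t i = 0;
  while (i < pages.size()) {
    std::uint64_t start = pages[i].pos;
    std::vector<char> run(pages[i].bytes);
    std::size_t j = i + 1;                      // gather adjacent pages
    while (j < pages.size() && pages[j].pos == start + run.size()) {
      run.insert(run.end(), pages[j].bytes.begin(), pages[j].bytes.end());
      ++j;
    }
    WriteAt(start, run.data(), run.size());     // one write for the whole run
    i = j;
  }
}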


brainstorm
 

From: David McCusker <davidmc@netscape.com>
Subject: [performance] brainstorming with waterson
Date: 01 Sep 1999 00:00:00 GMT
Message-ID: <37CD8E0A.65716CEB@netscape.com>
Content-Transfer-Encoding: 7bit
Organization: Ontology Mechanics Guild
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.mail-news

I talked more with Chris Waterson, and he's going to post some ideas somewhere when he has time. I'll maybe expand on some things he describes, when I suppose more detail might help. I hope I don't anticipate his report much here; so I won't describe the caching and hedging tactics until after he has a shot at presenting them.

I was focused on the themes of caches and hedging in order to speed operations and algorithms. The caching is to avoid figuring out the answer to the same question twice (or more times) in a small period of time, such as when comparing collection members when sorting. The hedging is to use multiple representations of either structures or algorithms, and to choose the right one either based on content size or based on whether caller parameters fit expected patterns.

The surprising thing during the conversation was when I realized there was actually an application of my version of Exodus style btrees for in-memory use algorithms, and not just database applications with which I'm more familiar (where I focus mainly on avoiding disk io's).

Here's why. Building a content model can sometimes wish to insert all new content in sorted order (doing some kind of insertion sort), but will want to avoid N squared memory copying, so a btree is the most general kind of skip-list to abbreviate copied memory ranges. Meanwhile, a presentation of a content model might want to iterate over content in sequential order, or in random access order by array index (using a UI style sometimes called 'virtual list view' when it appears in a scrollbar controlled view).

Some time back for IronDoc I invented a way to put leaf counts into btree nodes to support efficient array style access with only O(logN) navigation cost to a member given an array index position. So this kind of btree is a comfortable hybrid between sorted growing needs, and later (or coincident) random access presentation needs.

It never occurred to me that I might code this up as an in-memory data structure on par with data structures like hash tables, but it looks like this would be useful for applications like RDF and the layout engine that uses the data sources, just because it is efficient at both building and displaying unpredictable heterogeneous content. So both model and presentation needs would be met.

So I might get around to this in IronDoc sometime, or else in Mithril since it is strictly an in-memory kind of gimmick (except it might be hybridized with persistent btrees in IronDoc, hmm). But we could use it in Mozilla, so maybe somebody reading this wants to do it. This kind of btree is more complex than a hash table, so it would take rather longer to whip up. I can describe it better later.

David Mc
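
As an aside, the leaf-count trick described above can be sketched like this (my own toy illustration, not the IronDoc node format): each interior node remembers how many members live under each child, so array style access only costs one walk from root to leaf.

// A btree-ish node carrying per-child member counts, so the Nth member can
// be found in O(log N) by subtracting whole subtrees while walking down.
#include <cstddef>
#include <cstdint>
#include <vector>

struct BNode {
  bool leaf;
  std::vector<int> keys;                  // members in sorted order (leaf)
  std::vector<BNode*> kids;               // child nodes (interior)
  std::vector<std::uint32_t> kid_counts;  // members under each child
};

// Return the member at array index 'pos' in the subtree rooted at 'node';
// assumes pos is less than the total member count of that subtree.
int MemberAt(const BNode* node, std::uint32_t pos) {
  while (!node->leaf) {
    std::size_t i = 0;
    while (pos >= node->kid_counts[i]) {  // skip whole subtrees to the left
      pos -= node->kid_counts[i];
      ++i;
    }
    node = node->kids[i];
  }
  return node->keys[pos];
}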


paging
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: (not-quite) memory-mapped IO
Date: 02 Sep 1999 00:00:00 GMT
Message-ID: <37CF2A1C.93EDC4EE@netscape.com>
Content-Transfer-Encoding: 7bit
References: <37CEB95D.BDA31B83@mozilla.org>
Content-Type: text/plain; charset=us-ascii
Organization: Ontology Mechanics Guild
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.mac

Mike Shaver wrote:
> http://lxr.mozilla.org/seamonkey/source/nsprpub/pr/src/md/mac/macio.c#1851:
>
> * Memory-mapped files are not implementable on the Mac.
>
> Um, so. I have these nice patches from Jim Nance that make startup a
> fair bit faster on Unix and Windows by mapping the registry file into
> memory and insteads of doing seek-and-reads to get data out of the
> registry. (It doesn't help the write-to-file-after-every-SetString
> problem, but I have other plans for that.)

As I mentioned in the bugzilla report on the registry problem, I fixed a similar problem in 4.5 mailnews code, where I essentially implemented the logical equivalent of memory mapped file io (for Mac too of course). You can find relevant 4.5 code by looking where these prefs are used:

user_pref("mailnews.file_buffer.grow_pages", 16);
user_pref("mailnews.file_buffer.max_pages", 512);
user_pref("mailnews.file_buffer.page_size_kilo", 4);
user_pref("mailnews.file_buffer.paging.never", false);
user_pref("mailnews.file_buffer.start_pages", 16);

> The obvious problem with this is that the Mac is not playing nice with
> the other children. Should I just read the whole bloody thing into
> memory? (It is typical, of course, that this solution will not work
> for the platform that needs it most.)

The fact that Mac needs it most is one main reason why I originally developed the technology upon which my later 4.5 fix was based. I'm not trying to tease anybody with this description of tech I'm dangling out of reach. I'm just trying to describe how it can be done in case someone wants to do so, even though I don't see how I'd have time to code this myself unless mailnews management agrees.

The basic idea is something I called "paged streaming" (I put this into IronDoc and did some pretty heavy duty monte carlo testing to verify it was working correctly). The basic architecture model is multiplexing/demultiplexing.

So you start with a file interface that supports a method suite much like Unix files, with read, write, seek, etc. This is the file that actually performs file i/o and causes disk access to read or write. Then you plug this basic plain vanilla file into another one which does all the dirty work of paging.

This high level file builds a page cache which it uses to buffer N blocks of the file in memory, using LRU replacement when the file is bigger than all the pages put together. The page cache uses the basic vanilla file to make disk calls to read and write entire pages. Then the high level file implements exactly the same suite of file methods as the basic file, with read, write, seek, etc. And then it maps all its i/o traffic on top of the page cache.

So when a client uses the high level file, they see your plain everyday file interface, just like the basic file. But at any given moment, much (or all) of the file is buffered in memory, so that most logical disk access actually goes to memory instead of the disk. Unless you flush, of course, which can't be dodged.

Reading and writing the high level file has the same performance characteristics as writing to RAM which is managed with virtual memory. So if you like virtual memory, then you'll like this too.

David Mc
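
As an aside, here is a bare-bones sketch of the paged streaming shape described above: a high level file with the same suite of methods, multiplexed over a page cache that talks to the basic file in whole blocks. It is my own illustration with invented names (LRU eviction and dirty page flushing are left out), not the actual 4.5 mailnews or IronDoc code:

// A PagedFile exposes the same read/seek suite as a basic file, but routes
// traffic through an in-memory cache of fixed-size pages read from a
// BasicFile that really touches the disk.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <vector>

class BasicFile {                 // the file that actually does disk i/o
 public:
  virtual ~BasicFile() {}
  virtual void ReadBlock(std::uint64_t pos, char* buf, std::size_t len) = 0;
  virtual void WriteBlock(std::uint64_t pos, const char* buf, std::size_t len) = 0;
};

class PagedFile {                 // same logical interface, memory-backed
 public:
  PagedFile(BasicFile* base, std::size_t page_size)
      : base_(base), page_size_(page_size), pos_(0) {}

  void Seek(std::uint64_t pos) { pos_ = pos; }

  void Read(char* buf, std::size_t len) {
    for (std::size_t done = 0; done < len; ) {
      std::vector<char>& page = PageFor(pos_);
      std::size_t off = static_cast<std::size_t>(pos_ % page_size_);
      std::size_t n = std::min(len - done, page_size_ - off);
      std::memcpy(buf + done, &page[off], n);
      pos_ += n; done += n;
    }
  }
  // Write() would mirror Read(), marking touched pages dirty for flush.

 private:
  std::vector<char>& PageFor(std::uint64_t pos) {
    std::uint64_t page_no = pos / page_size_;
    std::vector<char>& page = cache_[page_no];   // LRU eviction omitted
    if (page.empty()) {
      page.resize(page_size_);
      base_->ReadBlock(page_no * page_size_, &page[0], page_size_);
    }
    return page;
  }
  BasicFile* base_;
  std::size_t page_size_;
  std::uint64_t pos_;
  std::map<std::uint64_t, std::vector<char> > cache_;  // page number -> bytes
};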


fragment
 

From: David McCusker <davidmc@netscape.com>
Subject: theory and some data (Re: Caching of mail/news messages)
Date: 14 Sep 1999 00:00:00 GMT
Message-ID: <37DEC111.D9574B06@netscape.com>
Content-Transfer-Encoding: 7bit
To: Scott Furman <fur@netscape.com>
Content-Type: text/plain; charset=us-ascii
Organization: Ontology Mechanics Guild
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.netlib,netscape.public.mozilla.mail-news

Scott Furman wrote: [ snip ]
> If the underlying performance problem is that Mac disks just tend to
> be fragmented (rather than that the disk cache behavior itself tends
> to be the cause of the fragmentation), then using a single flat file
> for the cache might not help much (because the flat file will itself
> be fragmented when it is created).

That is not the only underlying problem. In fact, traditionally we ascribed a bigger cause to another problem. (I used to work at Apple, but did not actually work on the file system code, so this info could be inaccurate.)

It was widely believed that open time for a file on a Mac had linear time cost in terms of the number of files in a folder. The hearsay was that performance started tanking after a folder had more than 100 files in the directory containing the file to open. This was the single largest reason why OpenDoc was slow to perform some operations, because we searched some folders for things to use, when those directories might become huge dumping grounds in real systems.

It is not really accurate to say that flat file fragmentation will have equally bad effects as Mac file system fragmentation. This is because the Mac technique for dealing with this fragmentation clearly entails more disk io than it should; but maybe this was just their way of making sure the file system did not become corrupt, by avoiding memory caches.

A flat file can keep the index of free space entirely in memory, as well as the structures that track how allocation is organized, and this will probably perform less disk io than the Mac file system does, even if we otherwise perform as many disk seeks due to fragmentation. However, I am pretty sure the big win is avoiding time to open and close a file that you want to access, and not avoiding file segment seeks.

> Can anyone point me to experimental results which demonstrate that the
> performance problem is fragmentation (and not slow reading of overly
> large directories or some other aspect of the Mac file system).

I have experimental evidence that might demonstrate the opposite. But the data I am about to describe might only indicate something complex about the manner in which the file system was kept in 68000 family code instead of being rewritten in PowerPC instructions. However, one can still conclude that hand rolled caching will go faster than the Mac.

Last year I wrote a monte carlo stress test to verify the correct results of a page cache I wrote for IronDoc, as well as the paged streaming that later got ported into the file io for 4.5 mailnews. (You can find the code for this stress test online if you want to study it in detail; since it's public domain, I don't care what anyone does with it.) Part of the test involved doing tens of thousands of repeatable pseudo random reads and writes at random offsets and random sizes. Naturally the IronDoc cache was much faster than direct file system io.

Now here's the weird part. I made a large RAM disk on my Mac, and also gave the system a lot of space for file system buffering. When I ran my test on this configuration, the plain vanilla Mac file was SLOWER. The RAM disk was slower than not using a RAM disk, so this caused me to scratch my head furiously, but I never understood exactly why.

When I consider this data in the context of flat file caching, I think it shows that even when the Mac has RAM thrown at it to avoid disk io, it will still underperform relative to a handrolled caching system.

David Mc
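
As an aside, a repeatable pseudo random stress loop of the sort described might look roughly like this; it is only a sketch with an invented FileLike interface, not the public domain test code mentioned in the post:

// Drive many repeatable pseudo random reads and writes at random offsets
// and sizes against any file-like object, so two implementations (say a
// plain file and a paged file) can be run with the same seed and compared.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

class FileLike {                        // hypothetical minimal file suite
 public:
  virtual ~FileLike() {}
  virtual void Seek(std::uint64_t pos) = 0;
  virtual void Read(char* buf, std::size_t len) = 0;
  virtual void Write(const char* buf, std::size_t len) = 0;
};

static std::uint32_t NextRandom(std::uint32_t* state) {  // deterministic
  *state = (*state * 1103515245u) + 12345u;
  return *state >> 8;
}

void StressFile(FileLike* file, std::uint64_t file_size,
                std::uint32_t seed, int rounds) {
  std::uint32_t state = seed;
  std::vector<char> buf(4096);
  for (int i = 0; i < rounds; ++i) {
    std::size_t len = 1 + (NextRandom(&state) % buf.size());
    std::uint64_t pos = NextRandom(&state) % (file_size - len);
    file->Seek(pos);
    if (NextRandom(&state) & 1) {
      file->Read(&buf[0], len);    // a full test would check results
    } else {
      std::fill(buf.begin(), buf.begin() + len,
                static_cast<char>(NextRandom(&state)));
      file->Write(&buf[0], len);
    }
  }
}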


letter
 

From: David McCusker <davidmc@netscape.com>
Subject: [Fe/Ag] Netscape February 1 IronDoc/Mithril agreement letter
Date: 23 Sep 1999 00:00:00 GMT
Message-ID: <37EAABC5.1B4BDF72@netscape.com>
Content-Transfer-Encoding: 7bit
Followup-To: netscape.public.mozilla.netlib
Content-Type: text/plain; charset=us-ascii
Organization: Ontology Mechanics Guild
Mime-Version: 1.0
Reply-To: davidmc@netscape.com

In conjunction with my announcement that I'm resuming IronDoc weekend work, the material below is intended to clarify all relevant legal issues. I'm posting this only in newsgroups which might be affected later if I apply IronDoc specifically to address respective mozilla code needs. I see no reason for questions or discussion, but I can't stop you. I am under the impression this entire message will be seen as both acceptable and harmless to Netscape management and legal folks; this is just a long delayed public announcement of some old news.

On February 1 this year, Netscape signed an agreement letter with me on the topic of my ongoing public domain development of the IronDoc database and the Mithril programming language/app dev environment. Let's call this the (Feb 1) FeAg letter for brevity, since Fe is the short name for IronDoc, and Ag is the short name for Mithril. (You might want to pronounce "FeAg" as "phage" if you're so inclined.)

In order to be suitably discreet about business practices, I intend to say as little about this letter as possible, beyond what is needed to clarify the ongoing status of how it affects IronDoc and Mithril. I have explicit permission to announce this letter exists, and I can publish the "Attachment A" which appears verbatim further below. I'm not at liberty to disclose the letter itself, which means I can't publish it. But this does not constrain me from summarizing effects and intent, or conveying the purport of various verbal conversations that established the intended meaning of the letter. Further, I feel free to state any number of things which do NOT appear in the letter.

The gist of the letter is the following. I have explicit permission to continue developing IronDoc and Mithril, and continue dedicating them both to the public domain, and Netscape will never rescind this permission for any reason. The rest is mostly conflict resolution. This last April, AOL updated both my offer letter and IP agreement to give explicit precedence to this Feb 1 FeAg agreement letter, so this understanding also binds AOL, and the situation is final and stable. (And in the future, I will never work for another company which does not also make the same guarantees regarding IronDoc and Mithril.)

So how does this currently affect Netscape and Mozilla? IronDoc might be useful for solving some technical problems or requirements in the Mozilla system, but I will only develop IronDoc on my own time. (For faster progress, Netscape can only give me more time off, since I won't develop IronDoc on Netscape time without some change in understanding.) If IronDoc source is incorporated into Mozilla, it will not be by my own hand, unless the meaning of such an action is clarified. I will not accept bugs reported against IronDoc, unless this too is clarified. In short, I aim to avoid doing anything with IronDoc on Netscape's behalf.

If IronDoc source is incorporated into Mozilla, it will never be subject to the Mozilla Public License, because IronDoc is public domain, with no restrictions on how it is used, or by whom, or for what purpose. So if a competitor also incorporates IronDoc in a competing system, they will be able to do so, and will not be required to publish any modifications.

I assume all this is acceptable to folks, and I will avoid bringing any of this to attention so I avoid being obnoxious. From now on I will be agreeable and congenial, and keep slogging away at IronDoc on my own time so it might be available for use within reasonable Mozilla timeframes. Just make sure I am assigned no task that involves applying IronDoc.

The following C comment is the IronDoc license in each source code file:

/*************************************************************************
This software is part of a public domain IronDoc source code distribution,
and is provided on an "AS IS" basis, with all risks borne by the consumers
or users of the IronDoc software. There are no warranties, guarantees, or
promises about quality of any kind; and no remedies for failure exist.

Permission is hereby granted to use this IronDoc software for any purpose
at all, without need for written agreements, without royalty or license
fees, and without fees or obligations of any other kind. Anyone can use,
copy, change and distribute this software for any purpose, and nothing is
required, implicitly or otherwise, in exchange for this usage.

You cannot apply your own copyright to this software, but otherwise you
are encouraged to enjoy the use of this software in any way you see fit.
However, it would be rude to remove names of developers from the code.

(IronDoc is also known by the short name "Fe" and a longer name "Ferrum",
which are used interchangeably with the name IronDoc in the sources.)
*************************************************************************/

The text below is the aforementioned "Attachment A" in the FeAg letter:

<irondoc-mithril-letter-attachment-a distribution="public">

NETSCAPE

Attachment A

Netscape's statement regarding ongoing IronDoc and Mithril development by David McCusker.

Copyrights

Although David McCusker will be working on structured storage database projects in the context of his employment with Netscape, he is allowed to continue to dedicate the copyright in his IronDoc and Mithril source code to the public domain. This permission will not be revoked for any reason.

Trade Secrets

David will be following an approval procedure intended to avoid any unapproved disclosures of Netscape trade secrets.

Patents

Like any Netscape employee, David may create patentable inventions while employed at Netscape. Netscape reserves its rights in any such invention. Users of IronDoc code do not receive any rights under any Netscape patents.

No Warranty

IronDoc and Mithril are not Netscape products. Netscape PROVIDES NO WARRANTIES, EXPRESS OR IMPLIED, WITH RESPECT TO IRONDOC OR MITHRIL, INCLUDING WITHOUT LIMITATION THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

</irondoc-mithril-letter-attachment-a>


triple
 

From: David McCusker <davidmc@netscape.com>
Subject: Re: [Fe] possible future IronDoc triple store
Date: 24 Sep 1999 00:00:00 GMT
Message-ID: <37EBEEED.F6742619@netscape.com>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii
Organization: Ontology Mechanics Guild
Mime-Version: 1.0
Reply-To: davidmc@netscape.com
Newsgroups: netscape.public.mozilla.rdf

> Cool! Good to see IronDoc back on your agenda...

The volcano was dormant for a long time, but recently there were some signs signaling a new eruption was imminent. The voices-in-my-head (tm) began telling me to finish more so I could teach-them-a-lesson. :-)

> This one's been a long time coming -- I've been bugging you about
> this since '97! ;-)

Yes, this last ten month hiatus was a very long one. But my life is now much different. After hundreds of hours of cycling, I weigh 40 pounds less, and I am much more relaxed because I have many fewer goals, and this allows me to enjoy life more by taking random walks, instead of always being sharply constrained to follow the best possible strategy.

My new philosophy in life says, "It doesn't matter." :-) I no longer have any need to achieve anything in particular, so I have the luxury of having total indifference to folks who would give me any grief. My kids now have a house to live in, so my worldly needs are now met.

> Maybe someday we'll get to update the IronDoc FAQ on
> this stuff. Currently I'm mirroring the following RDF Q'n'A in the
> IronDoc site... I wonder if there's anything to add at this stage.

This is strongly affected by my new philosophy, and by reduced goals. I currently have no intention to write more documentation, and let the devil take folks who can't figure out how the database works. :-) Okay that's just an exaggeration, but it was a very fun exaggeration.

When I next update the web site, I will be removing a lot of material, to tidy things since many implied future docs will not be forthcoming. But I will leave in place any good information I already have, since I think destroying information is a sin.

When I was recently considering a return to IronDoc work, and tried to accede to requests to update the web site, I discovered I was no longer interested in communicating enough to write new material. This will likely change sometime, but right now I have writer's block for docs. However, I've had many fun ideas for dialogs between multiple virtual characters, so they can present ideas through Socratic conversation, and some of this might be appearing on the site sometime.

In the course of writing this response, I thought of a better answer to your question about new material regarding RDF Q'n'A, and I talk about this below with regard to specifying a triple store interface.

> I like the jaw-dropping bit...

That's my main motivation for doing anything to IronDoc at this time. I'm a bit more interested in my current Mithril language development, but it will be fun to watch the eyes of folks who see IronDoc in use.

As a side note, IronDoc was designed all along as the ultimate db for a dynamic language environment like Mithril, since I'm not interested in db technology for its own sake. I'm much more interested in code systems, compilers, and related things; my IronDoc work was just a huge digression to build something myself I needed, which didn't exist.

Of course, after I finish both Mithril and IronDoc, it will be fun to watch folks' eyes when they see what can happen in integrated envs that consolidate source code editing, incremental compiling, module binding, code linking, running, debugging, etc, all in the same system which is an independent layer depending very little on underlying native systems.

Folks only have a vague idea what I can do with such tools. I'm really tired of programming with stone knives and bear skins, like C++. Those things still have their place, even in the system I'm building; but I don't want to think about them any more than really necessary.

> Could you say a little bit more about the types of RDF/triplestore
> query you'd be able to optimise for?

I must honestly say I have not given it much thought. You've seen my postings where I was exploring a formal db model; well, I started to write a C++ interface and a pseudo-mathematical model on weekends a while ago. I can finish the C++ interface, with the model annotation, and post that so you can say whether the queries you want to perform can be achieved in terms of the low level APIs so exposed. I think I'd have to do another couple hours of writing, though, and I was going to resume the IronDoc development this weekend instead.

However, I can maybe answer your question partly right now by reasoning from first principles about what is possible to do using IronDoc btrees. Using blobs with persistent IDs, one can build arbitrary graphs under IronDoc. So if there is some structure you wished was in a db, then you can make that happen. Also, you can have as many btree dicts as you want to use (and you can add them lazily on-the-fly by demand if you desire, since there are no schemas).

The purpose of a dict is to keep a sorted set of keys, which associate with a (possibly empty) set of values. You can sort anything, and you get to define how the comparison works that drives the sorting. So you define the exact sort you want to have. IronDoc dicts permit you to have duplicate keys with respect to sort order, so dups are not a problem.

Autocomplete is extremely easy. For example, in code I tested last year, a very efficient query for a prefix string finds the first and last possible matching keys in a dict, by finding the edges of members bounded by "<" and ">". This query generates the first and last int positions of those keys in the dict, because IronDoc dicts also behave as very efficient arrays, with theoretical O(log(n)) behavior but with practical O(1) behavior since linear iteration efficiently pages leaves.

Obviously all content in RDF can be represented as objects composed of attributes; you can have as many dicts as you please to sort objects by some combination of attributes you want to define. If a particular sort cannot be expressed directly in terms of object attributes, then you can build derivative data structure graphs, and then build indexes for material found there instead.

The thing to keep in mind is that IronDoc is a low level db engine, and you have to drive everything yourself at what seems a raw interface level from the perspective of any developer familiar with db technology. Nothing is automated; everything is manual. The disadvantage is no magic. The advantage is you can do anything at all. It's nothing like SQL.

> Something roughly akin to the Mozilla RDF APIs?

I am not familiar with the Mozilla RDF APIs, and I have had no plans to look at them. Actually I'm slightly familiar, and was turned away by acute dislike of the ontology used in code names, where for example, the obfuscating term 'assert' is used to mean something like 'add'.

I was thinking more that whatever C++ interface I suggested for a triple store would be sufficient to support whatever Mozilla APIs needed to do. That would imply a need for higher layers to achieve complex effects.

David Mc
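
As an aside, the prefix query described above can be sketched against any sorted key sequence; this is my own illustration over a plain sorted vector standing in for an IronDoc dict, with invented names:

// Find the [first, last) index positions of keys starting with a prefix by
// probing the two edges of the matching range in a sorted key sequence.
// Assumes a non-empty prefix whose last byte is not 0xff (a full version
// would handle the carry when bumping the upper bound).
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

std::pair<std::size_t, std::size_t> PrefixRange(
    const std::vector<std::string>& sorted_keys, const std::string& prefix) {
  // Lower edge: first key >= prefix.
  std::vector<std::string>::const_iterator lo =
      std::lower_bound(sorted_keys.begin(), sorted_keys.end(), prefix);
  // Upper edge: first key >= the prefix with its last byte bumped by one,
  // which is just past every key that starts with the prefix.
  std::string upper = prefix;
  upper[upper.size() - 1] = static_cast<char>(upper[upper.size() - 1] + 1);
  std::vector<std::string>::const_iterator hi =
      std::lower_bound(sorted_keys.begin(), sorted_keys.end(), upper);
  return std::make_pair(static_cast<std::size_t>(lo - sorted_keys.begin()),
                        static_cast<std::size_t>(hi - sorted_keys.begin()));
}

The returned positions double as array indexes, which is what makes the same query useful for scrolling a virtual list view to the matching region.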


manchu
 
10nov01 Saturday @ (10nov01)
? Tonight I'm going to add more to my recent Netscape story.
? Please remember Fu and Manchu are codenames as explained.
? Fu is a vendor, and Manchu is the Fu vendor's storage engine.
? The story's not about anonymous Fu or anybody's reputation.
? Rather it's about my experience in a past ugly work situation.

balanced trees @ (10nov01)
In college I figured out how to implement btrees from scratch.
I read a vague description of principles, and took it from there.
I explored options thoroughly, and canvassed design choices.
And I built a working btree storage system for my first tech job.
Afterwards I redesigned them from scratch several more times.

Thus I know a lot about btrees. But not everything, of course.
Even so, I've told other database experts how to improve them.
If you use btrees, I've a good idea what constraints affect you.
If you screw anything up, I know what it is. You can't hide.
I came up with novel things in btrees I didn't learn elsewhere.

Most low end persistence engines are largely driven by btrees.
Therefore, if I see such an engine, I know exactly what's up.
I've expertise in low end storage engines, but I don't specialize.
Frankly it bores me now, except for my amateur works of love.
I don't want to learn enough more to become some authority.

quilt and irondoc @ (10nov01)
As I've described elsewhere, I designed both Quilt and IronDoc.
I had mostly finished Quilt at Apple before OpenDoc was killed.
Quilt was a descendant of my several earlier db engine designs.
Then I redesigned again to come up with a better IronDoc spec.
(IronDoc is partly done, but has big holes. Think: vaporware.)

A description of IronDoc figured highly in Netscape interviews.
This is why they wanted me to work on client backend storage.
They knew I planned to complete IronDoc on my time at home.
I made this clear to my hiring manager. To him, that sounded good.
It meant I might one day replace Manchu with an IronDoc engine.

However, vendor Fu freaked when this plan became apparent.
If nothing else, it undercut the Netscape mail client reference.
Even worse, it meant another storage competitor in the market.
And IronDoc was going to be free. And maybe better (IMHO).
Fu accused me of stealing Manchu technology to put in IronDoc.

Later I'll explain better why this accusation is totally bizarre.
But it wasn't merely that IronDoc was already a mature design.
Additionally, IronDoc was format centric and non-oo in C.
In contrast, Manchu was class framework centric in oo C++.
The designs were really different, with nothing in common.

netscape client storage @ (10nov01)
The Netscape mail/news client group hired me in late May 97.
(OpenDoc was canceled in March 97, then I had a sabbatical.)
My job was to work on the mail backend, focusing on storage.
In short, it was another C++ runtime gig, with another database.
The database was Manchu by a vendor named Fu (codenames).

The mail/news team painted themselves into a corner with Manchu.
The address book in particular had a serious scaling problem.
This was partly pilot error in the way the database was used.
But it was also due to a pair of serious Manchu inefficiencies.
I fixed two of these, but not Manchu's basic btree architecture.

big enterprise money @ (10nov01)
Netscape needed me because they were selling into enterprise.
Before Microsoft gave away IE for free, Netscape made money.
A single site license to a big company was millions of dollars.
In theory, single users had to pay as well. This was unenforced.
The big money came from landing accounts in the big customers.

A customer who pays millions of dollars can make demands.
And they did. They demanded the mail/news client scale better.
In particular, the address book db needed to scale much bigger.
Companies wanted directories with over 100,000 employees.
They said, here's $2.5 million. Now make address books scale.

The Netscape 4.0 client scaled to 10,000 only, and was slow.
Also, newsgroups over a few thousand messages were slow.
And mail inboxes with as many emails were also quite slow.
In 4.5 through 4.7+ the performance increased dramatically.
This was largely because of what I did to Manchu's i/o system.

(David Bienvenu also cleaned up the algorithmic architecture.)
(So all the improvement wasn't entirely due to me, naturally.)
Example: the 4.0 newsgroups stalled every 300 messages read.
This stall was caused directly by Manchu's commit disk writes.
Manchu had a single i/o write buffer which missed constantly.

The file position access pattern was due to the db architecture.
The only clean way to fix that was to do it the way that I tried.
I used IronDoc's pseudo VM to "memory map" Manchu files.
This effectively gave us N more LRU replacement write buffers.
I optimized it further by consolidating adjacent dirty page writes.

scaling address books @ (10nov01)
Scaling the address books in Manchu was much, much harder.
But the IronDoc style "memory mapping" was still very crucial.
Without it we couldn't build large address books fast enough.
Before I cover the complex changes, first I'll explain this part.
A big problem with Manchu btrees was this: they were binary.

binary searches @ (10nov01)
More accurately, all the index searches were binary searches.
The btrees themselves had a big enough fanout per tree node.
But a very strange design decision was used for inner nodes.
The inner nodes contained no keys to qualify an index search.
This meant that keys only appeared in the leaves of the btree.

In a normal btree (as I understand it) this isn't what happens.
A normal btree search hits a leaf only once for a single search.
And this leaf will be the one that contains the desired answer.
IronDoc btrees, say, navigate from root to leaf one time only.
But Manchu did this many times per search in a binary search.

Consider a btree with more than 100,000 members in leaves.
A binary search requires N comparisons to find the end target.
N is the smallest N such that 2**N is greater than 100,000.
So in this case, Manchu needed to compare 17 leaf members.
With a fanout under 32, only five of these were in one leaf.

So this meant twelve other leaves had to be read from disk.
In a huge 100,000 member index, leaves were widely scattered.
On average, every node accessed was another disk head seek.
Without LRU node replacement, inner nodes were also seeks.
Adding the last member of a 100,000 member db was expensive.

Netscape address books sorted every card six different ways.
This was in addition to the primary database entity ID index.
Each index took 12 to 15 disk seeks to add a new card member.
And this was done seven times per card. Tally up the numbers.
Adding the 100,000th member required a hundred disk seeks.

A typical hard disk can perform around sixty seeks per second.
For example, that was the maximum I observed on my machine.
A 100,000 member address book was sixty megabytes in size.
A statistically insignificant portion of it was cached in memory.
So virtually all the tree node accesses were hard disk reads.

In theory, importing an address book would slow to a crawl.
According to a projected curve, adds would take ~2 seconds.
So building a 100,000 member address book would take days.
And during that time, the hard disk would be totally pegged.
And on a Mac, the machine was frozen in a synchronous task.
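
Just to make the arithmetic explicit, here is the same back-of-envelope projection in code, using only the figures quoted in this story (the split between leaf seeks and inner node seeks is my rough guess):

// Back-of-envelope check: ~100,000 members, ~17 comparisons per binary
// search, about five of them inside the final leaf, a few inner node
// reads per index, seven indexes per card, and ~sixty seeks per second.
#include <cmath>
#include <cstdio>

int main() {
  const int comparisons = static_cast<int>(std::ceil(std::log2(100000.0)));  // 17
  const int in_final_leaf = 5;                 // fanout under 32 => ~5 compares
  const int extra_leaf_seeks = comparisons - in_final_leaf;     // ~12 leaves
  const int seeks_per_index = extra_leaf_seeks + 3;  // plus inner nodes: 12 to 15
  const int indexes = 7;                       // six sorts plus the entity ID index
  const int seeks_per_card = seeks_per_index * indexes;         // ~100 seeks
  const double seconds_per_card = seeks_per_card / 60.0;        // ~2 seconds
  std::printf("%d seeks per card, %.1f seconds per add near the end\n",
              seeks_per_card, seconds_per_card);
  return 0;
}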

I added print statements to trace what really happened inside.
Importing a big address book immediately pegged my hard disk.
Sixty disk i/o's per second until I killed the process. Okay then.
Nobody had patience to make a db bigger than 20,000 members.
But empirically observed numbers matched the expected curve.

So how did I fix this? Well, the binary search could not change.
That design totally pervaded the architecture, so it was locked.
But I could make the total file size smaller by trimming bytes.
After consolidating address structures, I halved the file sizes.
Then my only remaining trick was massive "memory mapping."

Enterprise customers could be told to use a huge file i/o buffer.
So actual disk i/o was eliminated between the writing commits.
But cycles were still eaten converting to and from C++ nodes.
So building a 100,000 plus address book never got very fast.
But it came down under a couple hours, from days in theory.

virtual list views @ (10nov01)
When I resume this story, I'll cover position seeking in btrees.
It's a feature I developed in IronDoc that I put into Manchu.
This allows a UI frontend to correctly scroll index contents.
Used with autocompletion, it reports exact search positions.
A similar feature was added by LDAP folks to their directories.

(I could never figure out whether they followed my design.)
(If they did, LDAP folks never felt like giving me credit for it.)
(I explained how to put the feature into general btree indexes.)
(And I think David Boreham coded it into the Sleepycat DB.)
(But no one ever said thank you. That's life in the big city.)

11nov01 Sunday @ (11nov01)
? I've been dinking away at my Biscuit parse tree code today.
? Most of this time was spent instantiating more object types.
? Tonight I plan to rush through next parts of my Netscape story.
? Then I might not add any more, until perhaps next weekend.
? Today I only hope to summarize the basic standing conflicts.

? Well, I didn't get very far tonight, so I'll have to come back.
? I still have a lot to say about the technical runtime problems.
? When I resume, I'll explain Manchu's lack of real transactions.
? A crash during a commit typically corrupted a database file.
? During the middle of a commit Manchu was very vulnerable.

reiteration: codenames @ (11nov01)
The names are kept anonymous to protect Fu's reputation.
I refer to a vendor by codename Fu, and the product Manchu.
I don't want a fight, unless Fu decides to threaten me again.
I won't identify Fu unless he sends me threats in the mail.
And in that case I will post the communications sent to me.

guessing games @ (11nov01)
I received email today from someone who guessed correctly.
He sent me his own horror story about Fu's Manchu database.
But I won't tell you his name and company, though I should.
I'm not building up any big story about the Manchu product.
I want it kept anonymous. I don't want a fight. I'm all over it.

But you're welcome to send me your own stories if you like.
Maybe I'll have need of posting them one day, if I'm attacked.
But surely all my experience with Manchu is now very dated.
Nothing I say about Fu's old products can be relevant today.
Now it's ancient history. Peace on earth and good will to Fu.

Except my correspondent today asks about Manchu consulting.
This is based on the premise that I grasp the Manchu internals.
But yes indeed, I'm still trying hard to forget everything I knew.
So I'm not available for any Manchu consulting. Count me out.
Fu might go ballistic if I ever started doing Manchu consulting.

Fu tried very hard to ensure I'd never work with Manchu again.
I'll get around to describing my agreement with Fu and Netscape.
I've never been under non-disclosure. I have permission to talk.
Permission to talk was a big part of what I received on my end.
But while I was at Netscape, I was under some dire constraints.

I signed a paper promising I would never look at Manchu again.
And Netscape promised Fu to fire me if I did, for any reason.
Netscape decided to humor Fu to the extent I'd willingly tolerate.
I was agreeable and quiet, so Netscape management backed me.
I should probably try to reach this fun part of the story in order.

support contract @ (11nov01)
Netscape had a support contract with Fu, and I was the contact.
The primary Manchu architect was Mr. Fu, of whom I speak.
All my Manchu dealings were with Fu himself, on the phone.
My criticisms appeared to have the effect of enraging Mr. Fu.
Sometime later my manager asked, what did you say to Fu?

I said, I told him the buffering and transactions could be better.
Fu and I argued about memory management, but I'm a wizard.
(Don't argue with me about memory management. I'm right.)
So we paid Fu handsomely so he could take exception to me.
Somehow I don't think that's the way support contracts work.

After Fu tried to get Netscape to fire me, we stopped talking.
I was advised not to speak to Fu again, so he could calm down.
My manager told me Fu seemed to be very irrational about me.
But I didn't see or hear anything like that directly, by myself.
Still, Fu's demands focused solely upon what to do about me.

The original demands inside Fu's legal letter were so amazing.
It demanded that I stop all my IronDoc research for five years.
The letter argued I would inevitably steal Manchu's technology.
The only demand we met was keeping me away from Manchu.
Fu also received more license fees, and a bogus press release.

I explained to Netscape's management what IronDoc was like.
I could never use Manchu in IronDoc, even if I'd wanted to.
IronDoc and Manchu were extremely different across the board.
We pointed Fu at my website so he could see my public sources.
But we didn't tell Fu I found the Manchu technology wanting.

We figured that might put Fu into some kill or be killed mode.
I was told he seemed incensed I dared to claim a better way.
All the signs said the fight was very emotional and personal.
I suspected fear of a free IronDoc was another big motivation.
So Fu was pulling out all the stops to kill my own db research.

Fu accused Netscape of letting his competitor see his sources.
He argued Netscape owed him the protection of his property.
And they'd violated this trust by hiring someone like myself.
Apparently only someone ignorant about databases would do.
Or at least, no one who knew enough to write their own db.

divorced from manchu @ (11nov01)
Fu demanded that Manchu be removed from my work machine.
There were two big impediments to acceding to this demand.
First, I was the main Manchu guy at Netscape, doing features.
Basically, no one else could possibly finish my Manchu work.
The next version of mail/news was contingent upon my work.

I needed a time extension so we could ship work in progress.
The Fu agreement gave me a deadline for Manchu emancipation.
The second big problem was that Manchu was in the code build.
All Netscape engineers built all the code sources, all the time.
I would be unable to work if Manchu was never on my system.

So we agreed I could download and build the sources, unseen.
The final agreement said I must never see the Manchu sources.
Including and especially not in the debugger, which was hard.
I actually kept to the letter of this agreement religiously, too.
Folks thought I was being ridiculous for not seeing it at all.

I said, excuse me, but I could get fired if I look at that code.
So I had to walk other folks through debugging without seeing.
I was actually a happy man once I was free of Manchu work.
It allowed me to do new fun work instead of ongoing support.
All my bugs were fixed in Manchu so I was ready to be loose.


gillmor
 

Dan Gillmor <DGillmor@sjmercury.com> is a technology columnist whose ejournal I read regularly, along with his columns.

Today I was moved to write Dan because lately I've been planning to write an essay about how easy it is to ensnare open systems with trace amounts of proprietary elements, or to make them so complex that only large teams can easily accomplish productive development. Dan's 10Apr00 ejournal entry asked for clues, and I felt inspired.

GILLMOR 10Apr00 [Quoting The Wall Street Journal's Interactive edition] "Under the proposal, Microsoft would be forced to grant royalty-free licenses to the product, opening its secret underlying software code to customers and computer makers."
DMc Here's my short take: opening the MS code to other developers will be nearly useless, because MS architecture favors large development teams. This is apparent from the nature of their APIs, which also punish small shops to give more advantage to large development teams.

One can simply make things too complex for small teams to productively manage. And if they complain, you can then ridicule them for lack of intelligence, since any good hacker (supposedly) can handle any amount of complexity.

It's obviously always to their advantage, since their number one edge is the scale of resources they can throw at any problem. So it behooves them to nurture a context where only big resources are terribly effective.

Soon I will write an essay about SOAP and XML-RPC on my site, to counter some of Dave Winer's mollification of the attacks on Microsoft lock-in plans using SOAP. Dave supposes things will be okay just because XML and HTTP are open standards. But they can transmit content of arbitrary complexity, and Microsoft can arrange lock-in by the way in which XML gets used, even if XML itself is open.

I had to lecture folks about this when I was in charge of storage on OpenDoc, so I know folks don't get this. People assumed that just because the OpenDoc storage format could be an open standard, that this could prevent proprietary formats. Nothing could be further from the truth. It's not hard to embed impenetrably complex formats inside an open format.

Or if one does not want the impenetrability to be so very obvious, one can tone it down to severe nuisance level, to something that merely has the desired lock-in effect, but without telegraphing the intentional result too clearly.

Note that I might reprint this email on my own web site. If I do, I will send you a pointer by email.

ZERO

How in the world can architecture favor large development teams? Don't you think that assertion is just a little bit paranoid?

GED

Since I don't want to defend the thesis, just consider it a hypothesis and see what you come up with after considering the issue a while.

ZERO

No wait, I was really looking forward to ridiculing your intelligence because you seem to think Microsoft APIs might be too complex.

GED

As you wish. And later you can impress me with your amazing command of complex APIs. But first, do you question the idea that proprietary content can easily be embedded inside open standard formats?

ZERO

Uh, I guess that might be possible. I could also put binary hex values in text portions of an XML element. But suppose I don't do that? What if all the literals are open looking strings and integers? Can that be closed?

GED

Yes, even simple looking things can be closed in practice, because they might only be useful in the context of a bunch of complex proprietary code which doesn't appear in XML. It's a subversively closed model.

VEX

Ah, come on. You make it sound like a complex code context might be done on purpose. What if it's closed by accident rather than design?

GED

I only believe models are designed to be open when the reasons for intended simplicity are openly discussed and defended. If a so-called standard is layered twenty levels deep like a skyscraper, then that seems dubious.


GILLMOR I'd like to learn more -- do you have some time later this week?
DMc Okay, yes I'll have time later this week, because I just started a six month sabbatical after my last day at Netscape last Friday. (I might have emailed you before from davidmc@netscape.com.)

In a day or two I'll write as short a web page as I can to assert the basic situation as I understand it with respect to embedding proprietary content inside open content. I'll send a link then.

You might want to point someone at the page and ask them, "Is this correct as far as you can tell? Can you explain?" I'm better at patterns, and not as good at picking the examples that would be most persuasive.

The basic idea is that open standards act like carriers, through which one can always tunnel proprietary data. The http protocol, for example, is clearly open. But one can tunnel proprietary data through this open channel.

Similarly, XML is a standard way to encode anything, but *what* gets encoded might be proprietary. In this case, the XML can clearly delineate something that remains mysterious, like this: <proprietary-content>proprietary content</proprietary-content>

GED

In http://discuss.userland.com/msgReader$16072, William Crim states the basic idea, and I respond to this in the following http://discuss.userland.com/msgReader$16073.

ABE

Basically systems stay open only when all parts are open all the way down. If you can't navigate end to end through the data, then the few open parts are just a tease. Five percent closed can be as bad as fifty percent.

GED

I quote the relevant portions below.

CRIM msgReader$16072 It doesn't preclude MS from sending downright cryptic data down the wire.
DMc msgReader$16073 Yes, exactly, and in a day or two I'll write a web page about this, asserting that one can always embed (tunnel, etc) proprietary content in (through) open formats. Just because the frame is open doesn't mean the picture inside is open.

GILLMOR I've been wondering if the schema were subject to proprietary manipulation. It sounds so...
DMc The schema is less a problem than the meaning of the content in the bytes inside any given element. You can have an open spec for a schema that foo's should appear inside bar's. But you might be unable to do anything useful with a foo you receive, because it implies a really complex state machine in the handler.

A spec might say foo's can contain either "A" or "B", such as <foo>A</foo> or <foo>B</foo>, and that's very clear. But the meaning of what you do with A or B could be very hairy. For example, they might trigger something in COM on Windows that will only have a consistent result with someone else's system if you happen to use exactly the same models in COM and Windows.

Microsoft could still force people to upgrade their systems (or run other developers around in circles) by changing subtle meanings of what should be done when one sees "A" inside a foo. Cleverness involves how subtle the effect can be made, so folks out of sync suffer but not so badly it's clearly intentional.
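
A toy sketch of that point, with every name invented: the grammar below is trivially open, yet everything interesting hides in what the handler does with "A" versus "B", and that part can lean on as much closed machinery as the vendor likes.

    #include <cstdio>
    #include <string>

    // Open grammar: a <foo> element contains either "A" or "B", and any
    // parser can validate that much. Closed semantics: what HandleFoo
    // actually does can depend on a large proprietary runtime, and its
    // meaning can shift subtly from release to release.
    void HandleFoo(const std::string& value) {
      if (value == "A") {
        std::printf("the (undocumented) A behavior\n");  // deep platform machinery here
      } else if (value == "B") {
        std::printf("the (undocumented) B behavior\n");  // subtly different machinery
      } else {
        std::printf("invalid per the open schema\n");
      }
    }

    int main() {
      HandleFoo("A");  // legal per the schema; the meaning lives elsewhere
      HandleFoo("B");
      return 0;
    }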

GED

If a schema was proprietary, it would be immediately perceived by engineers working with the content and related interfaces. However, proprietary schema changes could appear and not be quite as obvious.

BEN

A practical XML vocabulary has two separate parts comprising grammar and semantics. A schema only covers the grammar. The semantics are much harder to specify, and harder to examine for proprietary effects.

LOKI

Computing systems are composed of both code and data in tangled semantic relationships. XML only captures the data portion. Any related code in a runtime context can impose arbitrary complexity.

GED

Code and data can be factored like an iceberg, with ten percent open data above the water line, but ninety percent proprietary code below. And for effective vendor lock-in, only (say) two percent below is required.



goal
