Building the Third Web: a Complex-Adaptive-Systems Approach

Not that this bears repeating, but “Web 2.0” has invaded. It has democratized and made dynamic what was once a paternalistic set of static pages. The word “page” doesn’t even describe it anymore, at least when you’re flying around in Google Maps, or digging up stories in the Swarm, or chatting on Meebo. The hierarchy is breaking down. Rigid taxonomies are being bested by tags, clients replaced with servers, CEOs and Editors and Vice Presidents pwned by cheeky kids and commenters and bloggers. From hair appointments to academic papers, our conscious output is being distributed, linked, and aggregated on the fly, everywhere, in more ways than we could have conceived.

What I’d like to do is try to more fully understand these phenomena by analyzing specific websites and attempting to replicate, or generalize, their methodologies, so that their workable models can be instantiated automatically and at scale. In other words, I hope to achieve Tim Berners-Lee’s idea of the Semantic Web, i.e., a web designed to be readable and hackable by our computers, by leveraging what we already have: complex-adaptive-system (CAS)-like web sites with large (and growing) user bases.

I believe this is possible, and in fact a more reasonable alternative to the standards-based approach the W3C is pushing. In their scheme, webmasters would be responsible for tagging their content with RDF markup and linking those tags to a globally shared ontology that manages meaning at a more abstract level. I do not believe this is the ideal proposal, because the incentives for webmasters, especially early adopters, to build out the semantic structure of their pages simply aren’t there.
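To make the contrast concrete, here is a minimal sketch of what the standards-based route asks of every webmaster, written against Python’s rdflib library. The page and author URIs are invented for illustration, and the Dublin Core and FOAF vocabularies stand in for whatever shared ontology the W3C scheme eventually settles on.

    # A sketch of the W3C-style approach: the webmaster (not the users)
    # hand-annotates a page with RDF triples tied to shared vocabularies.
    # Assumes Python with the rdflib library; all URIs here are made up.
    from rdflib import Graph, URIRef, Literal
    from rdflib.namespace import DC, FOAF

    g = Graph()
    page = URIRef("http://example.com/posts/third-web")
    author = URIRef("http://example.com/people/me")

    g.add((page, DC.title, Literal("Building the Third Web")))
    g.add((page, DC.creator, author))
    g.add((author, FOAF.name, Literal("Example Author")))

    print(g.serialize(format="turtle"))  # emit machine-readable Turtle

The trouble is not that any one of these triples is hard to write; it is that someone has to write them for every page, by hand, and nothing in the page’s ordinary operation produces them as a side effect.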

However, by applying CAS concepts to the problem, I think we can accurately “trace” the activity of real human users participating in the Web 2.0 sites they already know and love. In so doing, we can hope to create mechanisms for automatically generating the semantic relationships our computers will need if we are ever to have autonomous software agents making hair appointments for us.

As an example, one could imagine fetching the organically evolving semantic metadata being created on a website like Flickr.com, where users constantly tag pictures and make connections among them, and building out a machine-readable hierarchy without the need for “manual” instruction from webmasters. One such project is underway at Air Tight Interactive, where the developers are able (by tracing user activity on Flickr) to present a set of related tags for any tag you input. In essence they’ve created the kind of graph the Semantic Web initiative longs for, and they’ve done it automatically, simply by modeling and measuring real user engagement on Web 2.0-ish sites. Their semantic graph is remarkably accurate.
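To give a sense of how little machinery that takes, here is a minimal sketch, assuming you have already pulled a batch of photo-to-tag assignments from something like the Flickr API. The sample data below is invented, and this is not necessarily Air Tight Interactive’s method, just the obvious first pass: tags that co-occur on many photos get treated as related.

    from collections import defaultdict
    from itertools import combinations

    def related_tags(photos, top_n=10):
        """Build a related-tags map from tag co-occurrence counts.

        `photos` is an iterable of tag sets, one per photo (e.g. pulled
        from a tagging API). Tags that appear together on many photos
        are treated as semantically related.
        """
        cooccur = defaultdict(lambda: defaultdict(int))
        for tags in photos:
            for a, b in combinations(sorted(set(tags)), 2):
                cooccur[a][b] += 1
                cooccur[b][a] += 1
        return {
            tag: sorted(neighbors, key=neighbors.get, reverse=True)[:top_n]
            for tag, neighbors in cooccur.items()
        }

    # Hypothetical photos and their user-assigned tags.
    photos = [
        {"sunset", "beach", "ocean"},
        {"beach", "ocean", "surf"},
        {"sunset", "sky", "clouds"},
    ]
    print(related_tags(photos)["beach"])  # e.g. ['ocean', 'sunset', 'surf']

Nobody asked those users to build an ontology; the relationships fall out of behavior they were already engaged in.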

Alas, the guys responsible for that project and others like it are usually not graduate students in CAS; their applications only implicitly model the kind of CAS concepts being developed in academia. I would like to shift the focus, to work on projects that finally recognize Web 2.0 as a CAS and explicitly attack the problem of getting our machines to understand its underpinnings.

For instance, I would point to the WACO system conceived in [1]. Their WebAnts scour sites looking for semantic connections among them; they resemble search-engine indexing bots, but are specifically designed for the type of project noted above and, more broadly, for any site (i.e., a sub-graph of the Web) where meaningful relationships between pieces of content inhere in the structure of the pages themselves. Web 2.0 sites are ideally suited to that kind of analysis, because they are typically implemented as dynamic graphs whose architecture conforms to “real” human categories.
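For a rough flavor of what a WebAnt-style agent does (a toy sketch under my own assumptions, not the authors’ implementation), here is a crawler that walks a tag-driven site and records links among tag pages as candidate semantic edges. It leans on the requests and BeautifulSoup libraries, and the “/tags/” URL convention is an assumption about how the target site is laid out.

    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin

    def crawl_tag_graph(seed_url, max_pages=50):
        """Toy 'web ant': follow links on a Web 2.0-style site and record
        which tag pages link to which, treating the link structure itself
        as evidence of a semantic relationship."""
        frontier, seen, edges = [seed_url], set(), []
        while frontier and len(seen) < max_pages:
            url = frontier.pop()
            if url in seen:
                continue
            seen.add(url)
            try:
                html = requests.get(url, timeout=5).text
            except requests.RequestException:
                continue
            soup = BeautifulSoup(html, "html.parser")
            for a in soup.find_all("a", href=True):
                target = urljoin(url, a["href"])
                # Assumption: tag pages are identifiable by "/tags/" in the URL.
                if "/tags/" in target:
                    edges.append((url, target))
                    frontier.append(target)
        return edges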

At the very least, we ought to encourage developers to open their applications for analysis. The open-source initiative, which is sometimes coupled with Web 2.0 in contemporary rhetoric, enables commercial programmers and academics alike to hack away at code they deem relevant. For us, it would mean more transparent access to the now-proprietary algorithms and parameters that have proven useful in understanding, organizing, and aggregating activity across all levels of an average user’s experience.

All told, such an effort could be a breeding ground for more “human” interfaces, for more granular search technology, for better pattern recognition, and so forth. It would be foolish to assert that this is the only vector by which to achieve those goals, but it is certainly a valuable avenue that has yet to be fully exploited.

References:

[1] Rupert, Maya; Hassas, Salima; Rattrout, Amjad. “The Web and Complex Adaptive Systems.” In Proceedings of the 20th International Conference on Advanced Information Networking and Applications (AINA), 2006.


Comments


Bruce Britton Jun 18th, 2007 at 3:21 pm

I’m using this to respond to your comment on ‘narrative thought and narrative language.’ I’m still working on getting the Wallace piece, though I must have read it before, as I read the book, as I try to read everything of his. On the question of whether story is learned or genetic, some colleagues these days seem to want the default to be genetic for any human trait, which may be correct, but is hard to demonstrate, though it may get easier as we get better at genetic analysis. I tend to take the default explanation to be learning, but obviously the best stance is to have neither be the default, and just to wait for evidence for either. The problem with that is that it’s so easy to get evidence that learning is involved and so hard to get evidence that genetics is involved.
