From Mary Sue to Magnificent Bastards: TV Tropes and Spontaneous Linked Data – Kurt Cagle

"Bella was such a Mary Sue!" my teenager said in disgust after she got home from seeing the movie Twilight.
"Mary Sue?"
"Yeah, you know, she was like this little too-perfect girl. Talk about epic fail!" she replied, heading upstairs. "I promised my cosplay group I’d IM them when I got back and talk more about it."

I’ve long since resigned myself to not understanding the vagaries of teenage communication (a definite sign that my own youth is long behind me), but the term had me curious … so I Googled it, and stumbled on a gold mine.
One of the first entries to pop up was a rather curiously named site called TV Tropes (http://www.tvtropes.org), which included a page specifically about Mary Sues:

The name "Mary Sue" comes from the 1974 Star Trek fanfic "A Trekkie’s Tale". Originally written as a parody of the standard Self Insert Fic of the time (as opposed to any particular traits), the name was quickly adopted by the Star Trek fanfiction community. Its original meaning mostly held that it was an Always Female Author Avatar, regardless of character role or perceived quality. Often, said characters would get in a relationship with either Kirk or Spock, turn out to have a familial bond with a crew member, be a Half Human Hybrid masquerading as a human, and/or die in a graceful, beautiful way to reinforce that the character was Too Good For This Sinful Earth. Even back then, there wasn’t a total consensus on what was or wasn’t Mary Sue, since it’s not always immediately obvious which character is an Author Avatar. As this essay reveals, suspiciously Mary Sue-like characters were noted in subscriber-submitted articles for 19th-century children’s magazines, making this trope Older Than You Think.

The prototypical Mary Sue is an original female character in a fanfic who obviously serves as an idealized version of the author mainly for the purpose of Wish Fulfillment. She’s exotically beautiful, often having an unusual hair or eye color, and has a similarly cool and exotic name. She’s exceptionally talented in an implausibly wide variety of areas, and may possess skills that are rare or nonexistent in the canon setting. She also lacks any realistic, or at least story-relevant, character flaws — either that or her "flaws" are obviously meant to be endearing. She has an unusual and dramatic Back Story. The canon protagonists are all overwhelmed with admiration for her beauty, wit, courage and other virtues, and are quick to adopt her into their nakama, even characters who are usually antisocial and untrusting; if any character doesn’t love her, that character gets an extremely unsympathetic portrayal…

While I was rather surprised (though in retrospect not that surprised) to see serious analysis of character depth and plot from my eldest, I was also intrigued by the site itself. A trope is … a literary convention: not quite a cliché, but more like a character type, scenario or quality that’s common enough to be readily recognizable. In programming terms, a trope would be a design pattern, a configuration in a story that seems to recur throughout fiction.

By its name and structure, TV Tropes likely started out as a way for fans to analyze Japanese anime, but over time, fans from other media and genres* (movies, TV shows, comics and, yes, even books) added their two cents’ worth, using a fairly standard wiki platform (PmWiki in this case). As this evolved, what emerged was something that should be recognizable to anyone who has dealt with ontologies. Every page is either a description of a trope of some sort with examples, a resource (anime, book, comic, etc.) that exemplifies one or more tropes, or a cross-reference page that categorizes tropes as sub-tropes of other tropes, and each of these tropes or resources is, of course, hyperlinked. Given that there appear to be thousands of entries, TV Tropes makes for a rather spectacular connected data web.

* [Author's Note: As several Tropers have pointed out below in the comments, TV Tropes actually began (reasonably enough) as a Buffy the Vampire Slayer board; Anime was listed first solely because of alphabetical order.]

Navigating through this space can easily eat up hours of your time, especially as you discover the subtle (and not so subtle) nuances that tend to hide behind our literary conventions. For instance, the "villain" trope is in fact a super-trope that can be subdivided into such characters as the Amoral Attorney, Bad Santa, Blondes Are Evil, Evil Minion, and one of my favorites, the Magnificent Bastard. Evil Minion, moreover, subdivides even further into such lovable characters as Punch Clock Villains ("Hey, it beats slinging hamburgers!"), Gas Mask Mooks (think Imperial Stormtroopers), and Middle Management Mooks (Imperial Stormtroopers in Brooks Brothers suits). Or there’s this example from Ralph Bakshi’s film Wizards:

Max: Fritz, get up for God’s sake! Get up! They’ve killed Fritz! They’ve killed Fritz! (Draws gun) Those lousy, stinking yellow fairies! Those horrible atrocity-filled vermin! Those despicable animal warmongers! They’ve killed Fritz! (starts shooting off-screen) Take that! Take that! (Fritz gets up) Take that, you green slime! You black-hearted, sharp, bow-legged-
Fritz: Max, Max, I’m okay. I’m okay, Max. Just a scratch; look, I’m all right.
Max: Oh. Oh, damn. There you go again, stepping on my lines, raining on my parade, costing me medals. Oh, damn. (Gun misfires, killing Fritz) Ooh. Oh, Fritz?

Tropes and Linked Data

While fascinating from a cultural standpoint, TV Tropes is in its way just as significant from a Linked Data standpoint. The concept of Linked Data was first articulated by Tim Berners-Lee in a 2006 paper, Design Issues: Linked Data, in which he outlined four principles for publishing Linked Data:

  1. Use URIs to identify things that you expose to the Web as resources.
  2. Use HTTP URIs so that people can locate and look up (dereference) these things.
  3. Provide useful information about the resource when its URI is dereferenced.
  4. Include links to other, related URIs in the exposed data as a means of improving information discovery on the Web.

He later gave a formal presentation about Linked Data at the TED 2009 Conference, citing it as a necessary precursor to the rise of the Semantic Web.

While many people in the Semantic Web community tend to see Linked Data in terms of RDF, TV Tropes actually embodies all four of these principles to a surprising degree:

  1. Every resource (a trope or media example) has its own clearly defined URI, in a format similar to that used by RDF hash (#) terms: a common namespace such as http://tvtropes.org/pmwiki/pmwiki.php/Main/ and a term identifier such as GasMaskMook.
  2. Following this URL takes you to the page about the term in question: the trope page or the media resource description.
  3. The pages in question are encyclopedic in nature, typically providing both definition and examples.
  4. The links on these pages point not only out to the wider web, but also to other terms within the TV Tropes namespace.

The vision that Tim Berners-Lee set out is perhaps a bit at odds with the current layout of the web, though if you look specifically at the rise of RESTful URLs, in which a given URL has a clear association with a resource (or resource collection, which can also be a resource), you can see that they have the same characteristic. Specifically, part of the URL serves much the same purpose as a namespace identifying a classification scheme or ontology, with the individual terms following that portion of the namespace identifying specific resources within the space. This association between resource and unique name is a critical part of the concept of Linked Data.
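To make the namespace-plus-term idea concrete, here is a minimal Python sketch that mints a URI by appending a term identifier to a namespace, and splits a URI back apart at its last "/" or "#". The TV Tropes namespace is the one quoted above; the helper names mint_uri and split_uri are hypothetical, and real URL schemes may carry query strings or other structure that this sketch ignores.

```python
# Namespace prefix from the TV Tropes URL pattern quoted above (illustrative;
# the site's URL scheme may differ or change).
TROPES_NS = "http://tvtropes.org/pmwiki/pmwiki.php/Main/"


def mint_uri(namespace: str, term: str) -> str:
    """Build a resource URI by appending a term identifier to a namespace."""
    if not namespace.endswith(("/", "#")):
        raise ValueError("namespace should end with '/' or '#'")
    return namespace + term


def split_uri(uri: str) -> tuple[str, str]:
    """Split a URI into (namespace, term) at the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    if cut == -1 or cut == len(uri) - 1:
        raise ValueError(f"no term identifier in {uri!r}")
    return uri[: cut + 1], uri[cut + 1 :]


uri = mint_uri(TROPES_NS, "GasMaskMook")
print(uri)             # http://tvtropes.org/pmwiki/pmwiki.php/Main/GasMaskMook
print(split_uri(uri))  # ('http://tvtropes.org/pmwiki/pmwiki.php/Main/', 'GasMaskMook')
```

The same split works for hash-style RDF terms such as http://example.org/ns#term, which is why both separators are checked.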

Admittedly, this is true of any encyclopedic work. An encyclopedia by its very nature is essentially a linked data namespace, with each article title being a specific term in that namespace and each cross-reference typically being either a link to another article in the encyclopedia (the ontology) or a citation to an external resource. However, it’s also worth noting that TV Tropes has an implicit (though not necessarily code-visible) additional layer: most cross-references also carry some formal relationship. For instance, the statement "Bella from Twilight is a Mary Sue" actually defines two distinct relationships: Bella is contained within the Twilight entry, and Bella links to the Mary Sue entry in an isA relationship. In many cases, tropes are identified as being "similar to" other tropes or as the "opposite of" another trope (a Red Shirt, a VERY expendable minor good-guy character, is the opposite of a Mook, for example). Additionally, there are instances where a trope is subverted (the Red Shirt actually manages to survive the evil death ray and becomes a more fleshed-out character in the process).
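The relationships in the Bella and Red Shirt examples map naturally onto subject-predicate-object triples. The sketch below uses plain Python tuples rather than an RDF library, and the predicate names (containedIn, isA, oppositeOf) are hypothetical labels chosen for illustration, not a vocabulary that TV Tropes publishes.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Illustrative triples for the examples in the text; the predicate names
# are hypothetical, not a published TV Tropes vocabulary.
TRIPLES = [
    Triple("Bella", "containedIn", "Twilight"),
    Triple("Bella", "isA", "MarySue"),
    Triple("RedShirt", "oppositeOf", "Mook"),
]


def objects(triples, subject, predicate):
    """Everything `subject` points to via `predicate`."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]


def subjects(triples, predicate, obj):
    """Everything that points to `obj` via `predicate` (the reverse lookup)."""
    return [t.subject for t in triples if t.predicate == predicate and t.obj == obj]


print(objects(TRIPLES, "Bella", "isA"))     # ['MarySue']
print(subjects(TRIPLES, "isA", "MarySue"))  # ['Bella']
```

The reverse lookup is what turns a trope page into a list of its examples: asking "who isA MarySue?" is the same data read from the other end.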

These relationships lie at the heart of the Semantic Web, because they define the relationships between terms, or between terms and resources, which opens up some interesting avenues for exploration. On a given trope page, the term itself establishes a context, and all other links on the page to the same namespace then have implicitly or explicitly defined relationships that could be read by a spider, or even encoded directly by authors in those cases where the relationship isn’t obvious. For instance, most links to other tropes will be "Similar To" links, but being able to add markup indicating the inversion or subversion of a trope would make building an RDF link-base from either the page or the overall site much easier (although even here, RDF specifically isn’t necessarily required).
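A spider of the kind described here could be sketched with Python's standard html.parser module. This sketch rests on two assumptions that are not part of TV Tropes: links without explicit markup default to a "similarTo" relationship, and authors can override that default with a hypothetical data-rel attribute (for instance data-rel="subverts").

```python
from html.parser import HTMLParser

NS = "http://tvtropes.org/pmwiki/pmwiki.php/Main/"


class TropeLinkParser(HTMLParser):
    """Collect (relation, term) pairs for links that stay inside the namespace.

    Links default to "similarTo"; a hypothetical data-rel attribute
    (an assumption for this sketch, not TV Tropes markup) overrides it.
    """

    def __init__(self, namespace: str = NS):
        super().__init__()
        self.namespace = namespace
        self.links: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href") or ""
        # Only links into the same namespace become trope relationships.
        if href.startswith(self.namespace):
            term = href[len(self.namespace):]
            self.links.append((attr.get("data-rel") or "similarTo", term))


def extract_triples(page_term: str, html: str, namespace: str = NS):
    """Turn one page's in-namespace links into (subject, relation, object) triples."""
    parser = TropeLinkParser(namespace)
    parser.feed(html)
    parser.close()
    return [(page_term, rel, term) for rel, term in parser.links]


page = ('<a href="http://tvtropes.org/pmwiki/pmwiki.php/Main/Mook" data-rel="oppositeOf">Mook</a> '
        '<a href="http://example.org/">elsewhere</a>')
print(extract_triples("RedShirt", page))  # [('RedShirt', 'oppositeOf', 'Mook')]
```

Links leaving the namespace are skipped here for simplicity; a fuller spider might record them as citations to external resources instead.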

The overall link database, in whatever form, can then be used to visualize relationships graphically, to "deconstruct" a given story based upon its underlying tropes, or even to generate story ideas directly. TV Tropes already does this to a limited extent, providing "toys" that let you pull in plot, characters, settings, narrative devices and so forth to build a prototype story, or put together the perfect elevator pitch the next time you’re in Hollywood. These are fairly crude, but it’s not hard to see how combining them with RDF/OWL capabilities could open up a whole new avenue for writers to sketch out that next story or screenplay.
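As a rough illustration of "deconstructing" stories through a link database, the sketch below ranks other works by how many tropes they share with a given work. The work-to-trope index, the work names and the function name are hypothetical placeholders; in practice the index would be derived from the link-base, and a serious tool would probably weight rare tropes more heavily than common ones.

```python
def rank_by_shared_tropes(index: dict[str, set[str]], work: str) -> list[tuple[str, int]]:
    """Return (other_work, shared_count) pairs, most shared first, ties by name."""
    mine = index[work]
    scores = [(other, len(mine & tropes)) for other, tropes in index.items() if other != work]
    return sorted((s for s in scores if s[1] > 0), key=lambda s: (-s[1], s[0]))


# Placeholder data: hypothetical works tagged with trope names from the text.
index = {
    "StoryA": {"MarySue", "GasMaskMook", "RedShirt"},
    "StoryB": {"MarySue", "RedShirt"},
    "StoryC": {"MagnificentBastard"},
    "StoryD": {"GasMaskMook"},
}
print(rank_by_shared_tropes(index, "StoryA"))  # [('StoryB', 2), ('StoryD', 1)]
```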

Lessons for a Semanticist

There are a few lessons that Semantic Web proponents can take away from sites such as TV Tropes:

  • Wikis in general are Semantic Web/Linked Data goldmines, as they represent highly-connected, domain-specific knowledge.
  • Linked data sites illustrate the tight correlation between taxonomy and navigation – every term in an ontology should have some form of representational manifestation, and while it may be the links that are important to the semanticist or ontologist, it will be the content associated with that term that is most important to the consumers of the site.
  • Linked data sites are fundamentally dynamic in operation, even though they appear static, if only because the semantic qualities (relationships, categorizations and so forth) of a given link should be determined by the target metadata, not by the link creator.
  • Linked data sites should require comparatively little syntactic investment on the part of content creators, especially when establishing links to existing content. The domain expert isn’t going to know anything about semantics and likely will know little about markup, let alone XML markup … nor should they. This means that the best a designer of an encyclopedic wiki who wishes to extract semantics can do is either to templatize content and provide tools to easily insert metadata into content nodes, or to rely upon filtering tools to make implicit (i.e., non-mechanical) relationships explicit after the fact.
  • Graph processing of the site should be tied directly into the update mechanism for a given term – if you’re creating a graph engine that will convert term nodes and subordinate links into RDF triples, then this graph engine will need to be reinvoked every time that the content gets changed. Building such an engine might in turn make a good project for a software product or dissertation.
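The last point can be sketched as a small in-memory store that keeps each page's triples separately, so that a save hook only has to re-derive the edited page instead of reprocessing the whole site. The hook itself, the LinkBase class name and the (relation, target) link format are assumptions for illustration, not features of PmWiki or of any particular graph engine.

```python
class LinkBase:
    """Minimal in-memory triple store, keyed by the page each triple came from.

    Re-deriving one page on edit replaces only that page's contribution,
    rather than rebuilding the graph for the whole site.
    """

    def __init__(self) -> None:
        self._by_page: dict[str, set[tuple[str, str, str]]] = {}

    def update_page(self, page: str, links: list[tuple[str, str]]) -> None:
        """Hypothetical save hook: replace `page`'s (relation, target) links."""
        self._by_page[page] = {(page, rel, target) for rel, target in links}

    def delete_page(self, page: str) -> None:
        """Drop every triple that originated on `page`."""
        self._by_page.pop(page, None)

    def triples(self) -> set[tuple[str, str, str]]:
        """Union of all pages' triples."""
        return set().union(*self._by_page.values())


lb = LinkBase()
lb.update_page("Bella", [("isA", "MarySue")])
lb.update_page("Bella", [("isA", "MarySue"), ("containedIn", "Twilight")])  # page edited
print(sorted(lb.triples()))
# [('Bella', 'containedIn', 'Twilight'), ('Bella', 'isA', 'MarySue')]
```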

Don’t be too quick to dismiss this type of thinking as being only appropriate for Wikis. Most sites’ navigational structures can be reconceived in terms of ontologies of "described" terms and correlative links, something that is well known to long-time users of Drupal, Joomla or other "community management systems" as well as wikis. It requires a change in the way that you approach web design, moving away from a mode based primarily on layout and towards one more built around concepts, but this shift in thinking comes with a number of benefits … especially when your site exceeds a threshold of a couple of dozen pages or so:

  • Your site can grow with relatively minimal involvement on the part of an editor or content manager.
  • Your site becomes more contextual, which is especially useful for sites whose purpose is primarily informational.
  • It makes it easier to build community participation in your sites – people will be more inclined to add new content (or modify existing content) that’s relevant to them. This in turn helps them invest more of their presence into the site, binding them more strongly to your community.
  • You can more easily harvest the semantic relationships within that data, which can both help monetize content and drive contextual advertising to a much more nuanced level.

There are hundreds (perhaps thousands) of sites like TV Tropes on the web, representing islands of linked data that should be seen as necessary precursors to full-blown semantic webs. The challenge increasingly is to recognize these sites for what they are and to understand that knowledge (and innovation) frequently comes when such islands come in contact with one another. So, whether your interest in the Semantic Web is a Mission From God or simply a Geeky Turn-on, the combination of community-driven ontologies and linked data can result in a truly Crowning Moment of Awesome.

Kurt Cagle is the Managing Editor of XMLToday.org. He can also be followed on Twitter.
