January 29, 2005

Tool For Thought

This week's edition of the Times Book Review features an essay that I wrote about the research system I've used for the past few years: a tool for exploring the couple thousand notes and quotations that I've assembled over the past decade -- along with the text of finished essays and books. I suspect there will be a number of you curious about the technical details, so I've put together a little overview here, along with some specific observations. For starters, though, go read the essay and then come back once you've got an overview.

The software I use now is called DevonThink, and I'm sorry to report that it is only available for Mac OS X. (I know there are a number of advanced search tools available for Windows, so I'm sure most of what I describe here could be reproduced -- I just don't know enough about the search tools on that platform to recommend anything.)

I talked in the Times essay about using the tool as a springboard for new ideas and inspiration. Here's what that process looks like in practice. This is the window that shows me an overview of part of my "research library" in DevonThink:

screen1.jpg

These are all books that I have transcribed passages from over the past 10 years or so -- you can see how many quotes I have for each book in the little number in parentheses after each title. Oftentimes I'll start the exploration with a straightforward keyword search, in this case: "urban ecosystem." I plug that in, and get back one result, a short quote from Manuel DeLanda's excellent A Thousand Years of Nonlinear History.

screen2.jpg

This is where it gets interesting. I take that quote, and click on the "see also" button, which generates an instant list of other documents or quotes that have some semantic connection to the original one. I can see a few words from the entry, along with the author and book title.

screen3.jpg

I find another, more elaborate quote from DeLanda in that bunch:

screen4.jpg

And then I perform a "see also" on that quote. I get back a few pointers to essays that I've actually written -- and completely forgotten about -- including a review of an E.O. Wilson book on biodiversity that I wrote about three years ago. Ultimately, I end up with this wonderful quote from Jane Jacobs that draws an explicit analogy between natural and man-made ecosystems. The whole process takes me no more than a minute.

screen5.jpg

Over the past few years of working with this approach, I've learned a few key principles. The system works for three reasons:

1) The DevonThink software does a great job at making semantic connections between documents based on word frequency.

2) I have pre-filtered the results by selecting quotes that interest me, and by archiving my own prose. The signal-to-noise ratio is so high because I've eliminated 99% of the noise on my own.

3) Most of the entries are in a sweet spot where length is concerned: between 50 and 500 words. If I had whole eBooks in there, instead of little clips of text, the tool would be useless.
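For the technically curious, here is a minimal sketch of how a "see also" feature built on word frequency might work. This is not DevonThink's actual algorithm, which isn't documented here; it swaps in a standard technique (TF-IDF weighting plus cosine similarity), and the function names `tokenize`, `tfidf_vectors`, `cosine`, and `see_also` are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {word: weight} dict per document.

    Assumption: weight = term count * log((1 + N) / (1 + doc frequency)),
    so a word that appears in every document gets a weight of zero.
    """
    token_lists = [tokenize(d) for d in docs]
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    n = len(docs)
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append(
            {w: c * math.log((1 + n) / (1 + doc_freq[w])) for w, c in counts.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def see_also(index, docs, top_n=3):
    """Return indexes of the documents most similar to docs[index]."""
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine(vectors[index], v), i) for i, v in enumerate(vectors) if i != index
    ]
    scores.sort(reverse=True)
    return [i for score, i in scores[:top_n] if score > 0]


# Hypothetical notes, standing in for the quotes in a research library.
notes = [
    "Cities behave like an urban ecosystem fed by energy and food.",
    "A natural ecosystem and a city ecosystem both depend on diversity.",
    "London sewer engineers mapped every street.",
]
print(see_also(0, notes))
```

With short, hand-picked notes like these, shared vocabulary is a reasonable stand-in for shared subject matter, which is roughly why points 2 and 3 above matter so much: the fewer stray words each entry carries, the less noise a frequency-based comparison has to fight through.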

I think #3 is the point that needs to be drilled home to people working on desktop search. It's been hidden from us largely because the web itself is broken up into pages that are often in that 500 word sweet spot. Think about the difference between Google and Google Desktop: Google gives you URLs in return for your search request; Google Desktop gives you files (and email messages or web pages where appropriate.) On the web, a URL is an appropriate search result because it's generally the right scale: a single web page generally doesn't include that much information (and of course a blog post even less.) So the page Google serves up is often very tightly focused on the information you're looking for.

But files are a different matter. Think of all the documents you have on your machine that are longer than a thousand words: business plans, articles, ebooks, pdfs of product manuals, research notes, etc. When you're making an exploratory search through that information, you're not looking for the files that include the keywords you've identified; you're looking for specific sections of text -- sometimes just a paragraph -- that relate to the general theme of the search query. If I do a Google Desktop search for "Richard Dawkins" I'll get dozens of documents back, but then I have to go through and find all the sections inside those documents that are relevant to Dawkins, which saves me almost no time.

So the proper unit for this kind of exploratory, semantic search is not the file, but rather something else, something I don't quite have a word for: a chunk or cluster of text, something close to those little quotes that I've assembled in DevonThink. If I have an eBook of Manuel DeLanda's on my hard drive, and I search for "urban ecosystem" I don't want the software to tell me that an entire book is related to my query. I want the software to tell me that these five separate paragraphs from this book are relevant. Until the tools can break out those smaller units on their own, I'll still be assembling my research library by hand in DevonThink.
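To make the file-versus-passage distinction concrete, here is a rough sketch of a search that returns paragraphs rather than whole documents. It assumes paragraphs are separated by blank lines and it does plain keyword matching, not semantic matching; `split_paragraphs` and `relevant_paragraphs` are hypothetical names, not part of any real desktop search tool.

```python
import re


def split_paragraphs(text):
    """Split a text into paragraphs, assuming blank lines separate them."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def relevant_paragraphs(text, query, top_n=5):
    """Return (position, paragraph) pairs that mention the query words.

    Paragraphs are ranked by the share of their words that match the
    query, so a short, focused passage beats a long one with one hit.
    """
    query_words = set(re.findall(r"[a-z']+", query.lower()))
    scored = []
    for position, paragraph in enumerate(split_paragraphs(text)):
        words = re.findall(r"[a-z']+", paragraph.lower())
        hits = sum(1 for w in words if w in query_words)
        if hits:
            scored.append((hits / len(words), position, paragraph))
    # Highest density first; ties keep the original reading order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(position, paragraph) for _, position, paragraph in scored[:top_n]]


# A made-up four-paragraph "book" for illustration.
book = (
    "Chapter one talks about rivers.\n\n"
    "The urban ecosystem recycles energy.\n\n"
    "Markets grew slowly.\n\n"
    "An ecosystem in a city is urban by definition."
)
for position, paragraph in relevant_paragraphs(book, "urban ecosystem"):
    print(position, paragraph)
```

The point of the sketch is only the shape of the result: a handful of passages with their positions in the book, instead of a single hit that says "somewhere in this file."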

I wonder whether it might be possible to have software create those smaller clippings on its own: you'd feed the program an entire e-book, and it would break it up into 200-1000 word chunks of text, based on word frequency and other cues (chapter or section breaks, perhaps). Already DevonThink can take a large collection of documents and group them into categories based on word use, so theoretically you could do the same kind of auto-classification within a document. It still wouldn't have the pre-filtered property of my curated quotations, but it would make it far more productive to just dump a whole eBook into my digital research library.
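A very rough sketch of what such an automatic clipper might look like follows. It uses only the simplest cues mentioned above (paragraph breaks, chapter or section headings, and a word budget) and does no word-frequency analysis at all, so treat it as a starting point rather than the real thing. The function name `chunk_text`, the `max_words` default, and the heading pattern are all assumptions for illustration.

```python
import re


def chunk_text(text, max_words=500, heading_pattern=r"^(chapter|section)\b"):
    """Group paragraphs into chunks of at most max_words words.

    Assumptions: paragraphs are separated by blank lines, and a paragraph
    that starts with "Chapter" or "Section" opens a new chunk. A single
    paragraph longer than max_words is kept whole as its own chunk.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    heading = re.compile(heading_pattern, re.IGNORECASE)
    chunks, current, count = [], [], 0
    for paragraph in paragraphs:
        n = len(paragraph.split())
        starts_section = bool(heading.match(paragraph))
        # Close the current chunk at a section break or when it would overflow.
        if current and (starts_section or count + n > max_words):
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(paragraph)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


# Tiny made-up example with a six-word budget so the splits are visible.
ebook = "Chapter 1\n\none two three\n\nfour five six\n\nChapter 2\n\nseven eight"
for chunk in chunk_text(ebook, max_words=6):
    print(repr(chunk))
```

Each chunk could then be dropped into the library as its own entry. What this cannot do is the pre-filtering: every chunk gets in, interesting or not.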

The other thing that would be fascinating would be to open up these personal libraries to the external world. That would be a lovely combination of old-fashioned book-based wisdom, advanced semantic search technology, and the personality-driven filters that we've come to enjoy in the blogosphere. I can imagine someone sitting down to write an article about complexity theory and the web, and saying, "I bet Johnson's got some good material on this in his 'library.'" (You wouldn't be able to pull down the entire database, just query it, so there wouldn't be any potential for intellectual property abuse.) I can imagine saying to myself: "I have to write this essay on taxonomies, so I'd better sift through Weinberger's library, and that chapter about power laws won't be complete without a visit to Shirky's database."

These extra features would be wonderful, but the truth is I'm thrilled to have the software work as well as it does in its existing form. I've been fantasizing about precisely this kind of tool for nearly twenty years now, ever since I lost an entire semester building a Hypercard-based app for storing my notes during my sophomore year of college. There's a longstanding assumption that the modern, web-enabled PC is the realization of the Memex, but if you go back and look at Bush's essay, he was describing something more specific -- a personal research tool that would learn as you interacted with it. That's what I think about whenever I use this system to stumble across a genuinely useful new idea: finally, I have a Memex!

Posted by sberlin at January 29, 2005 08:43 AM

Comments

I'm very interested in mindhandling software and I am thus very glad about your post about DevonThink. Right now I'm testing it and will most probably buy it.

Please keep us informed about further thinking and expression tools, such as DevonThink, Ulysses and others.

Cheers, Stefan

Posted by: Stefan Herzog at January 29, 2005 12:19 PM

I'm trying to replicate your system. Do you name the individual entries with the text of the quote?

Also, can you go a bit into your quote-harvesting process? Do you input as you read, or ...?

Thanks.

Posted by: Pedro at January 29, 2005 03:43 PM

I think plain old paragraphs fit your #3 requirement pretty well. They're units of text whose size is usually on the smaller side of the 50-500 "sweet spot", and almost always carry enough information to be somewhat self-contained in relation to the text around them.

Each file type usually has a specific way of defining paragraphs, and even in plain text there are a few common strategies most people use, such as keeping a blank line between two paragraphs, or preceding each one with a tab or a few spaces. For this reason, making a program to fetch paragraphs from a document wouldn't be too hard.

Posted by: Bira at January 29, 2005 04:04 PM

I use DevonThink for a similar purpose, and I love it. Although I should mention that in my browser I can't actually see your screenshots (?)

Also:
something I don't quite have a word for: a chunk or cluster of text

Have you considered using "lexia" as the word you're looking for?

Posted by: Jeremy Bushnell at January 29, 2005 04:26 PM

I would like to see something akin to this for images. Any ideas?

Posted by: ed at January 29, 2005 04:29 PM

One small, mildly off-topic request: would you mind changing the images in this post to be in PNG or JPEG format? Neither Firefox nor IE on Windows seems to be able to load them.

Posted by: Evan DiBiase [TypeKey Profile Page] at January 29, 2005 05:01 PM

Sorry about the images -- could have sworn they were jpegs before. They should be viewable now.

As for how I capture the quotes themselves, I have long used an advanced piece of software called a "research assistant" to type in passages that I've marked. I just started experimenting with scanning and OCR'ing in though, which seems to work fairly well...

Posted by: Steven Johnson at January 29, 2005 05:16 PM

Very interesting -- thanks for sharing.

As for the entire book vs. quote -- I use a program on Windows called DTSearch which is basically a full-text search program on steroids.

One of the things it can do is show the results in context and use fuzzy searches, proximity settings, etc. so if I search for "concept X", rather than saying "oh, it's somewhere in this e-book here" it will show the relevant parts of the book that match the search.

Still a long way from being perfect and it can't do some of the things it looks like you're doing with DevonThink, but works pretty well.

I've looked at a lot of this stuff on Mac and Wintel, and it's kind of odd just how primitive the tools are on either OS for this sort of thing. If you had asked me in the mid-1990s, I'd have assumed that organizing and searching free-form info would have progressed a lot farther than it has.

Posted by: Brian Carnell at January 29, 2005 06:10 PM

This DevonThink app seems a lot like the new Spotlight feature in the upcoming Mac OS 10.4 Tiger. What sorts of features does DevonThink offer that Spotlight won't?

(As in, why should I buy Devonthink instead of waiting to upgrade to Tiger?)

Posted by: Tarek at January 29, 2005 07:03 PM

Can someone recommend an equivalent to DevonThink for Windows? I don't even know how to do a google search for the software because I don't know what it is called in the general sense.

Posted by: Halfer at January 29, 2005 08:05 PM

DevonThink vs Spotlight:

http://www.devon-technologies.com/products/devonthink/background/spotlight.php

Posted by: Matthew Amster-Burton [TypeKey Profile Page] at January 29, 2005 09:08 PM

When and where will your piece on London sewers appear - sounds interesting (for a civil engineer like me anyway).

I'll second the request above for the names of Windows programs equivalent to Devon. Shouldn't all Devon's competitors be deluging you with emails after your article?

Posted by: Ethan at January 29, 2005 11:43 PM

Suddenly, DevonThink makes sense. As a returning student after many, many years away, I'm trying to find how to take best advantage of the technology which simply didn't exist before. DevonThink is a tool I've downloaded and tried, and never really had it click. It's clicking now.

What's problematic, however, is that now I've got one more tool which does one thing and that's it. Sure, I could compose in DT, but it's not its strength. So I compose in one location, save my research in DT, and my bibliographic info in EndNote (which I might drop for Sente or Bookends anyway). I suppose three tools isn't that bad, now that I think about it.

Posted by: Jeffrey at January 30, 2005 02:39 AM

If you use Windows, check out www.asksam.com

Posted by: Adnan at January 30, 2005 02:51 AM

Questia (an online library of ebooks) can make semantic searches, except that it can be handicapped by the fact that you're searching through whole e-books, even though it lets you search inside the book.

(www.questia.com)

Posted by: Adnan at January 30, 2005 02:59 AM

Steven - this is a poignant post about search. We just completed a book titled "Lucene in Action" and I built a "search inside" the book website for it. The granularity of search results is book sections, not pages. I am also capturing, but not yet exposing, each page of a section in order to have better information displayed. I've also linked a blog into the table of contents page - so I can add commentary/errata after the fact to a book section. I will be building in "see related" types of connections that are not made explicit.

I'd be grateful for you to review what I've built and offer suggestions to further enhance this type of thing. I have not yet considered hooking in handling multiple books, but our publisher is definitely interested in adopting the system I've built and these types of inter-book connections would be a great thing to have.

Posted by: Erik Hatcher at January 30, 2005 07:57 AM

A much simpler (and of course less powerful) program for writers to keep track of notes of any kind (I use it for quotes) is Notational Velocity. It's free and is OS X only. It's my most used app. You can get it here: http://pubweb.nwu.edu/~zps869/nv.html

Posted by: dobbs at January 30, 2005 08:33 AM

Hi Steven, Glad to come across your article in the Times and your site. I'm really curious how you digitize/save all your quotes from other sources. Are they Word documents, emails to self, some kind of database? I'm doing the same but am pretty haphazard about it and would love to hear your method. Thanks.

Posted by: Larry Straus at January 30, 2005 09:37 AM

For almost a decade from ~1988 I kept my reading & research commonplace book in Persoft's IZE, a DOS textbase -- orphaned all too soon -- that did simple but very useful things with keywords presented in an indented hierarchy. The more entries and keywords I gave it, the more the hierarchies took on increasingly interesting and suggestive sequences; i.e. they looked more like *outlines.* IZE seemed to understand the content of the passages.

I knew perfectly well that appearance was "just" a reflection of my choices of keywords -- an embodiment of how I used and related words -- but it felt uncanny all the same.

Norretranders quotes Kline quotes Hertz on Maxwell's equations: "One cannot escape the feeling that these equations have an existence and an intelligence of their own, that they are wiser than we are, wiser even than their discoverers, that we get more out of them than was originally put into them."


Posted by: Monte Davis at January 30, 2005 09:44 AM

"One of the new applications that came out last year was Google Desktop -- using the search engine's tools to filter through your personal files." Loading this Google software into at least a Windows machine opens a back door to the computer. Anyone can open this door and walk into your computer.

Posted by: Karen at January 30, 2005 10:00 AM

And, of course, ten minutes later I trip over New Scientist on semantic search for Google... now Slashdotted...

http://www.newscientist.com/article.ns?id=dn6924

Posted by: Monte Davis at January 30, 2005 10:12 AM

Sorry -- comments were down for a few hours. Should be back up now.

Posted by: Steven at January 30, 2005 02:21 PM

I tried DevonThink some months ago. I initially liked it but then stopped using it, as it lacks multilingual capabilities. I usually store quotes or chunks of text in the language they are written in, and that approach unfortunately prevents DevonThink from doing its magic. Still looking for a piece of software with such a capability.

Posted by: Ricardo Montiel at January 30, 2005 03:47 PM

I'd add that the useful chunk size online is often not the URL of a main page or an index but a permalink pointing to a specific, often brief, entry in a weblog.

btw, SBJ, ever experiment with Voodoo Pad?

Posted by: xian at January 30, 2005 05:01 PM

Steven:

You said: "I wonder whether it might be possible to have software create those smaller clippings on its own"

I have two possible solutions you could investigate:

1) Book2Pod is free, and converts etext into iPod-notes sized chunks -- each chunk is about 4K big, which works out to about 680 words - a bit higher than the sweet spot, but maybe not so bad. http://www.tomsci.com/book2pod/

2) The O'Reilly network published a 3 parter on how to build an eDoc reader for the iPod here: http://www.macdevcenter.com/pub/a/mac/2004/12/14/ipod_reader.html I think they have the finished software available for download, but, since they give you the source, you can probably hack it to generate notes much smaller than 4K (ie: somewhere in the 50-500 word zone)

Both of these are free. The second one is interesting because it can format text from pdf's into iPod sized notes.

Anyway: Thanks for sharing DevonThink with us. I've seen it before, but I think I'll go have a closer look at it in light of what you just wrote.

Posted by: Robert Hahn at January 30, 2005 08:46 PM

Steven, are you familiar with Simpy (my name should link to it)? Simpy currently does for web pages what you described in this post, and the upcoming Simpy release will have support for Notes, which will work _much_ like you described your tool. I hope to make the new release in about a week.

Posted by: Otis at January 30, 2005 10:38 PM

Steven,

Thanks for the interesting article and followup here on your web site. I've been using DevonThink on and off for some time now, and you have supplied me with a schema for using it that I was close to and yet, at the same time, far from.

I've noticed that you are simply putting page numbers with your quoted text, but you are not putting the source (since you have them filed in source-specific folders). I've long wondered how much biblio info to put into each note, usually falling on the more-the-better side, because I worry about my note getting dissociated from its folder at some time in the future. There is also the problem of tracing your note back to the containing folder, that is, if I'm reading a note what source is it from. DevonThink doesn't seem to have any command to track the note back to its containing folder (unless I'm missing something).

This brings up my primary wish for a future revision of DevonThink: the ability to include metadata tags for each note, which in this case could include the full reference, page number, etc. Another program, Tinderbox, handles this metadata beautifully, by putting it in headers at the top of each note, and by making these fields customizable. Of course, that one feature hasn't been enough to make me move out of DevonThink.

Doug

Posted by: Douglas Holschuh at January 30, 2005 10:44 PM

Great essay in the NYT Book Review; I had no idea such technology existed. Guess I will have to return to the Mac.

Posted by: Larry White at January 31, 2005 12:04 AM

I have about 200 mb of blog entries I'd like to somehow import into this. I wonder if that's even doable.

Posted by: Mark Crane at January 31, 2005 11:44 AM

Do you still use a reference manager like EndNote as well or is this the only app you use to organize your sources? If you still use a reference manager, does it play nice with DevonThink? Import/export, etc.?

Posted by: Tanya [TypeKey Profile Page] at January 31, 2005 12:52 PM

Steven,

I've spent the last three years doing doctoral research and development in this space. I was reading over your blog entries and thinking how closely what you've written here matches a lot of what I find in my own notes (I also like De Landa's work, BTW).

I think Topic Maps are the technology you're looking for to enable sharing "personal libraries to the world." The phrase that Steven Newcomb (one of its inventors) uses is "global knowledge interchange." The ISO Topic Map standard is ideally suited for creating a graph-structured, subject-based index of a set of information resources. These Topic Map documents can be merged or federated with others in controlled ways, even maintaining the contexts between who said what.

My own project is called Ceryle, uses a graph visualization of Topic Maps as the primary organizational metaphor, heavily uses Dublin Core metadata, works cross-platform, and I've recently been filling in some of the bibliographic support features as I'm using it to organize my own dissertation. The software is currently in evaluation and will eventually be released into open source. I'd be happy to discuss the project in greater detail if you wish.

Thanks very much for your informative article -- Murray

Posted by: Murray Altheim at January 31, 2005 03:12 PM

Steve, a very good and timely essay in the New York Times Book Review! In addition to the very positive comments, I would add that DEVONthink would seem to be using what is known in the Information Retrieval (IR) and Natural Language Processing (NLP) fields as "collocation of words" that can be defined as:

"The 'collocation' of words refers to the regular patterns of co-occurrence in which words may be found in a given context; the way words are found together. eg. We expect to see fish with chips, goods with chattels, break with enter, blue with sky. In certain circumstances, indeed, such items would be foregrounded if they did not occur together. When this happens it is 'unusual collocation'".

Thanks,
John

Posted by: John T Kane at February 1, 2005 12:13 AM

I'd like to know of a product even remotely close on Windows as well. I have AskSam-- like many others it can search quickly through text-- but the next level (the "see also" in the description above) isn't there, and the UI is not particularly good...

Posted by: Chris L at February 1, 2005 10:10 PM

Steve, how do you title your quotes/documents? It seems that the title is just an indiscriminate first few phrases of the note? Why not use page numbers?

Posted by: John Beeler at February 1, 2005 10:45 PM

fascinating! thanks for sharing your tool. i really like the idea of small personal library that people could query for research purposes.

going further, let's say a group of people within the same intellectual domain pool their libraries and open them up for more semantic linking... and so on....

eventually this could potentially lead to a small sampling of what Sir Tim Berners-Lee has been dreaming of all along: The Semantic Web.

Posted by: coolmel at February 2, 2005 04:27 PM

For the PC, the Orbis part of the NotaBene
academic suite would be quite good.
www.notabene.com

Posted by: ReaderFella at February 2, 2005 06:26 PM

I've started using a wiki to do similar tasks, and I would love to see this sort of search/linking ability incorporated into it.

Posted by: bs23 at February 3, 2005 08:35 AM

Two tools may be of interest:
1. MS Word's indexing function is very flexible for capturing connections you want to be reminded of, including your own added comments that don't appear in the body of written material being indexed and which aren't indexed by automatic methods. Such indexes are easy to skim, scan, search, and change.

2. FLIPP is a way to clarify explanation of how to use complex systems by putting content information in non-symbolic, non-verbal visual frameworks that look like game boards. Instead of describing rules of complex logic, it displays all at once all the scenarios that make any kind of sense for a given complex system. Users simply select the scenario that fits their situation and meets their objectives then follow it to conclusion. Several things make it remarkably friendly: all logical connections in the scenarios are shown without words, symbols, icons, or spaghetti connecting lines. The number of text explanation pages is reduced typically by 90%. User preference has been universal. "Logic revealed to a degree beyond belief." Translation among any languages is vastly simpler because the logic -- the gameboard formats -- remain unchanged across all languages, even those written right-to-left. Computers, while convenient, aren't required.

The method is now in the public domain freely available to all and demonstrated at http://www.flipp-explainers.org

Posted by: David Cox at February 5, 2005 09:30 AM

Steven Johnson's essay in TNYT resonates with every researcher and writer who seeks connections among words, concepts and especially the fertile relationships that make their ideas come alive, sustaining their work with rich evidence. Ideas feed on ideas, which, in turn, inform the manner in which we build convincing arguments. In the Windows environment, NOTA BENE achieves much of what Steven presents in his essay (www.notabene.com). Except that NOTA BENE is designed as an integrated package of three applications that, together, provide sophisticated hypertext searches which are automatically joined to their respective bibliographic sources and are then served to the word processing application. In addition, all the components of the writer's document, including not only the writing, but also found texts, references and bibliographies, are automatically formatted according to major academic style manuals. Thus, hypertext, bibliographic management and word-processing are combined into a seamless whole.

But to some of Steven's specific points, ORBIS, which is the hypertext application in NOTA BENE, does more than respond to boolean operators as it searches across the user's computer network. One of its more powerful features is the ability to search associated terms. For example, a writer could search for "church" and be presented with passages containing "ecclesiastic," "tithes," "anticlericalism," or "priest," among others. The user brings the bibliographic reference associated with each selected item as it is brought into the writer's evolving document. The bibliographic reference then takes on the academic style for that document. In my many years of using NOTA BENE, I have not ceased to marvel at discovering relationships that are neither obvious nor likely to be remembered in our cluttered minds and that are found in years' worth of accumulated notes and texts scanned into the computer. But then again, I also marvel at the masses who actually believe that merely by typing into their computers they have endowed their writing with a dynamic, living personality when all they've done is, well, type.

Mark

Posted by: Mark Szuchman at February 5, 2005 01:09 PM

A related and useful tool I've found is called Furl (www.furl.net). It saves the actual content of a webpage you've visited to your free profile on the Furl server (like Google's cache). It also will save your comments and category description if you choose to enter them. You can save webpages with one click as you surf and amass a searchable history of your most interesting finds.

It also has a public feature that allows you to search other people's collections.

Posted by: Sarah at February 17, 2005 02:26 AM

Hi Steven,

In the associative thinking space (is there such a space?), one of the more notable programs is called IdeaFisher, found at http://www.ideafishing.com.

Marsh Fisher, the co-founder of Century 21 Real Estate, discovered that it's the association between disparate words and ideas that creates the most valuable end products. He used "real estate" and "franchise" to arrive at the company that he took public. But in the last 5 years of his mentoring, he's shown me a much bigger world, from the EXPANSION of associations (seeing how far from the root you can wander) to the DRILLING-DOWN on specific concepts (through specific questions posed by seasoned "experts").

It's a fascinating area of study, and his software does a pretty good job of both those operations.

The current versions run on XP and Mac Classic, but the company is releasing a version for OSX and upcoming Microsoft OS. There's a blog of screen shots and descriptions at http://www.ideafisher-upgrade.com.

So... here's the API issue: who can make a SAFE, easy to use system that does what DEVONthink does (searching and organizing associative content) with a search-engine that can drill-down on that content, and works seamlessly between the web and the desktop... but also keeps results in enough of a linear form that users don't get lost on tangents when they are in "brainstorming" mode, but is free-form enough to allow the interface to not get in the way of the user experience?

I'll try DEVONthink, and see how this relates. For desktop content, the Mac's own integrated search app is pretty slick, but the results are not persistent... and Google still works slicker when it comes to web-centric content...

It was really enlightening to run across your NY Times article, and I look forward to seeing how you expand on this concept.

Best,
ME

Posted by: eAgent [TypeKey Profile Page] at February 17, 2005 07:17 PM

An historical note: Index Cards, Clean Copies, and Research Assistants - from Jerry Monaco

For years I have been writing on index cards. The precedent was of course Nabokov, who wrote his novels on index cards, but also the chess players I knew in my youth, which was b.c. (i.e. before personal computers). All the great chess players used to remember opening variations, innovations, etc. by keeping vast indexes of their favorite openings on index cards.

I would arrange my index cards by quotes and books and date and potentially each card was cross-indexed. When I stopped writing seriously and only wrote for myself in my journals, I stopped indexing all of my paragraphs and quotes. What I soon realized, after I stopped indexing, is that the availability of cards at my fingertips had also created an accessibility of the thoughts on those cards in my memory.

I assume that computer indexing and access has a similar, though less manual, effect with the added plus of being able to use the computer as a supplementary brain.

The working habit of writers is not a very well understood process. Both Melville and Tolstoy needed help (their wives) to create clean copies of their manuscripts. What typewriters allowed writers to do is create their own clean copies without collaboration. Computers have now allowed us to create indexes of our own thoughts without collaboration also. This used to be the job of research assistants.

Jerry Monaco
His Blog
Shandean Postscripts to Politics and Culture

Posted by: Jerry Monaco at February 21, 2005 03:03 PM

"Already Devonthink can take a large collection of documents and group them into categories based on word use, so theoretically you could do the same kind of auto-classification within a document."

Hi Steven,

You may be interested in trying theConcept by Mesa Dynamics (disclaimer: my company) which resolves documents (one or more), or search engine results from Google (and other search engines) into an index of the most significant key words and phrases in the overall text/web page results.

It isn't quite the memex you're looking for, but like DevonThink, the idea is to break up thousands of words into semantic concepts. However, instead of "searching" for information, theConcept builds an index that helps a user understand the prevailing topics in the text. Each concept can then be explored more deeply by looking for citations from specific places where the key words or phrases were discovered.

If you do try it, I'd be happy to answer any questions or respond to comments and/or suggestions.

All the best,

Danny Espinoza
Mesa Dynamics
http://www.mesadynamics.com

Posted by: Danny Espinoza at February 24, 2005 12:40 PM
