By Keith Fahlgren and Liza Daly

A major theme of this year’s Books in Browsers was authoring. We have been trying to move our thinking about digital books beyond the low-level plumbing of files and formats, so we focused on what authoring will look like when files are irrelevant, distribution is seamless and transparent, and voice recognition is mainstream. What we (almost!) pulled off was a demonstration of a new mode of writing:

  • creating manuscripts via voice recognition and Google Docs
  • distributed editing via Google Docs and Google Docs comments
  • collecting marginalia via Twitter

You can watch all of this mostly happen, then totally fall apart during the live demo, which we were “fortunate” enough to have recorded and preserved as a video:

(In fact, the software worked, but it relied on GitHub Pages to post the output; it seems that we triggered some kind of traffic-throttling system as our code rapidly posted update after update. We sincerely appreciate the audience’s good humor throughout.)

Streaming authoring: a demo

The actually functioning self-generated, self-published, live-annotated transcript of our talk is now available. It’s worth reading separately from this post.

A three-column version of the talk transcript, with specific annotations from Google Docs on the left, the actual captured content in the middle, and tweets on the right

The vision

Our fundamental idea is that a new ecosystem of tools – like Google Docs, social media, or Siri – will obsolete the laborious workflow of modern publishing: word processor followed by emails followed by files followed by conversions followed by FTP followed by static, siloed presentation (followed by silence).


The first stage of the new process will be based on markedly simpler tools for creating the rough manuscript. While first drafts are likely to be created with the familiar interface of hands + keyboard, as Peter Brantley remarked at Books in Browsers, “We need new entry points for authoring.” His comment referred to video; our direction was live narration and speech recognition.

In our demo, we captured the transcription of Liza’s conference presentation with voice recognition in real time. Each time Liza switched slides, the slide content and transcript were automatically pushed via the Google Drive API to a folder in Google Docs.
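
One way to picture that flow, as a rough sketch rather than the demo’s actual code: recognized speech accumulates against the current slide, and a slide change flushes the finished slide-transcript to a publisher. The names here (SlideTranscript, TranscriptSession, and the publish callback standing in for a Google Drive upload) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SlideTranscript:
    """Text of one slide plus everything spoken while it was on screen."""
    slide_number: int
    slide_text: str
    spoken: List[str] = field(default_factory=list)

    def as_document(self) -> str:
        # Slide content first, then the spoken words as one running paragraph.
        header = f"Slide {self.slide_number}: {self.slide_text}\n\n"
        return header + " ".join(self.spoken)


class TranscriptSession:
    """Buffers recognized speech and publishes it whenever the slide changes."""

    def __init__(self, publish: Callable[[str, str], None]) -> None:
        # `publish(name, body)` is a placeholder for whatever stores the
        # document (assumed here; in the demo this role was played by Drive).
        self._publish = publish
        self._current: Optional[SlideTranscript] = None

    def hear(self, fragment: str) -> None:
        # Speech recognized before the first slide is shown is dropped.
        if self._current is not None and fragment.strip():
            self._current.spoken.append(fragment.strip())

    def show_slide(self, number: int, text: str) -> None:
        # Switching slides closes out the previous slide-transcript.
        self.flush()
        self._current = SlideTranscript(number, text)

    def flush(self) -> None:
        if self._current is not None:
            name = f"slide-{self._current.slide_number:03d}"
            self._publish(name, self._current.as_document())
            self._current = None
```

Calling flush() once more at the end of the talk publishes the final slide, which would otherwise stay in the buffer.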


Live gatherings present an opportunity for a different mode of editing because of the tremendous inefficiency of wasted, uncaptured thinking. A conference like Books in Browsers is full – literally – of sharp, thoughtful people who travel great distances to focus their brains on a single topic. To harness some of this brainpower to improve the manuscript, we encouraged the attendees (including remote viewers following the live-stream video) to add comments, corrections, and feedback to each Google Doc slide-transcript. The comments are presented in the pane on the left, and editors’ corrections were integrated instantaneously.


The final task was to capture a layer of marginalia in the pane on the right. We harvested the ambient and ephemeral Twitter stream and rooted each tweet to the exact corresponding moment in the presentation itself. While this is the least deliberate form of creation/editing, it actually worked out well. We’re amazed at how thoughtful and complex some of the tweets were, composed in the moment.
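
Rooting a tweet to a moment in the talk can be treated as a timestamp lookup. The sketch below is a minimal illustration under our own assumptions, not the demo’s code: it takes the time each slide went on screen and files every tweet under the slide that was showing when it was posted. The function name align_tweets and the tuple shapes are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Dict, List, Tuple


def align_tweets(
    slide_starts: List[Tuple[datetime, int]],
    tweets: List[Tuple[datetime, str]],
) -> Dict[int, List[str]]:
    """Group tweet texts by the slide that was on screen when each was posted.

    slide_starts: (time the slide appeared, slide number) pairs.
    tweets: (time posted, tweet text) pairs.
    """
    slide_starts = sorted(slide_starts)
    start_times = [started_at for started_at, _ in slide_starts]
    by_slide: Dict[int, List[str]] = {number: [] for _, number in slide_starts}

    for posted_at, text in sorted(tweets):
        # Index of the last slide that started at or before the tweet.
        index = bisect_right(start_times, posted_at) - 1
        if index < 0:
            continue  # Posted before the first slide; nothing to attach it to.
        by_slide[slide_starts[index][1]].append(text)
    return by_slide
```

A tweet posted at the exact second a slide appears is filed under the new slide. Tweets usually trail the moment they react to, so a real version might subtract a few seconds from each timestamp before the lookup.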

“What is this thing called?”

While of course we were disappointed that the demo didn’t quite work, enough people engaged with it that we can’t regret trying something a little out there. As we developed the idea, we found a lot of possible directions for further thought that all seemed interesting.

From what comes a book?
Defining what a book is has become a cliché of every publishing conference, but in this case we really did think about it. Considering every formal or informal talk an opportunity for deliberate authoring greatly expands our capability to create preserved narratives and “books.” This could be a conference, a business meeting, a storytelling session among friends and family, or the inside of a classroom.

The classroom, on- and offline
It’s likely that many, if not most, classrooms are going to be hybrid online and offline experiences. Online participation puts local and remote users on the same footing, and asynchronous commentary means that students who require more time to compose their thoughts get the benefit of “classroom participation.” Is copying down the instructor’s lecture the best use of a student’s attention? How can live transcription, plus peer editing, help students who can’t write quickly, are too easily distracted, or have gotten lost in the material?

Voice is coming
This experiment taught us that voice recognition is at a tipping point. Right now, it’s underutilized by software developers, game-makers, and content creators, but speech recognition (and text-to-speech) will soon be a transformative technology now that it’s become commoditized. Paired with inexpensive mobile technology, its potential reach in the developing world alone is staggering. What do “user interface” and “user experience” mean when voice may be an input or an output?

(While we disabled commenting in the Google Docs to preserve the experiment, we’d love to read further thoughts here.)