Why Verbs?

July 26, 2008

Now that I’ve blogged about Ubiquity, you should understand why I’ve been obsessing over the properties of a good linguistic UI. It’s not an academic problem: It’s one of the interfaces to the extension I’m working on right now!

Some commenters have asked me the question (if not in these exact words): Is a linguistic UI the right kind of UI for Ubiquity, and if so, why?

(“Because Jono is obsessed with linguistic UIs” isn’t a good enough reason.)

First, the really big picture of what Ubiquity is supposed to be all about: It’s a step towards a Web where verbs (i.e. functionality, i.e. commands, i.e. services) are first-class citizens. And that’s why I’m thinking it should be renamed from Ubiquity to something like “Mozilla Verbs”, maybe.

Creating and sharing nouns — i.e. web pages, i.e. content, i.e. data — on the Web has always been very easy. All you have to do is give someone a link to a URL, and they can see your content. The Web was designed around this idea from the very beginning. But the modern Web is not the relatively static library of information that was originally imagined. It’s full of pages that do stuff. Some of them do so much stuff that we don’t even call them “web pages” anymore; we call them “web applications”. The modern web is full of sites that exist to provide a service rather than a list of facts. You can google something, you can digg something, you can slashdot something… The modern web is full of verbs! The next generation of web interfaces will need to make sharing, creating, interlinking, and combining these verbs as easy as the hypertext paradigm made sharing, creating, interlinking, and combining nouns. Aza wrote a great post about this, called Sharing Streamable Functionality.

So, keeping in mind that that’s the goal, there are a couple of reasons why a linguistic UI could be better than a point-and-click UI; not for every use case, but for many of them.

The first reason is that a point-and-click UI requires every verb to be graphically represented as an icon or menu item. As the namespace of commands grows, it becomes hard to find places to put all those icons and menu items; the advanced stage of this disease results in terrifyingly bloated GUI apps like Microsoft Word. On the other hand, having zillions and zillions of commands is not a problem when you can simply type the one you want. (Provided, of course, that you know the one you want, which is why I’m so concerned with learnability.) “Zillions and zillions of verbs” is where I think we’re going, because of how easy Ubiquity makes it to create verbs and share them.
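
(To give a sense of how low that barrier is: a Ubiquity verb is just a small chunk of JavaScript handed to the extension. The sketch below follows the general shape of the command API from the early Ubiquity tutorial, things like CmdUtils.CreateCommand and noun_arb_text, but treat the exact names as approximate; they may differ from release to release.)

    // A minimal, illustrative Ubiquity verb. The API names here
    // (CmdUtils.CreateCommand, noun_arb_text, displayMessage) follow
    // the early tutorial and may not match every release exactly.
    CmdUtils.CreateCommand({
      name: "shout",
      takes: {"message": noun_arb_text},   // the noun this verb accepts
      preview: function(previewBlock, input) {
        // Shown live while the user is still typing the command.
        previewBlock.innerHTML = "Will shout: " + input.text.toUpperCase();
      },
      execute: function(input) {
        displayMessage(input.text.toUpperCase());
      }
    });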

The second reason is expressiveness, as I defined it in my last post. I want to be able to tell Firefox, for instance:

“Hey Firefox? Select this page, translate it to Spanish, encrypt it with my mom’s public key, email it to her, hit send, and oh yeah save this chain of commands as a new command so I can use it later. Let’s call the new command ‘garblify’.”

That’s a complex idea. I could do all that in Firefox as it stands now, but I would have to switch between lots of tabs and lots of web applications, copying and pasting and clicking on buttons and icons left and right, and it would take dozens of individual steps. That’s because it’s inherently hard to express a complex idea through the medium of pointing and clicking. It’s much, much easier to express a complex idea using language, as I did above. That’s what language is for. This is all provided, of course, that we have an input language which is sufficiently expressive to get the idea across, while not being insanely hard to learn.
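
(If it helps to picture it, here is one entirely hypothetical way that chain could be captured once the individual verbs exist: “garblify” is just a new verb whose execute step calls the others in order. None of the helpers below are real Ubiquity functions; they are invented stand-ins for whatever the translate, encrypt, and email verbs would actually do.)

    // Hypothetical sketch only: composing existing verbs into "garblify".
    // translatePage, encryptFor, and emailTo are invented stand-ins for
    // the real translate / encrypt / email verbs, not actual Ubiquity APIs.
    function garblify(pageText, recipient) {
      var spanish   = translatePage(pageText, "es");    // translate it to Spanish
      var encrypted = encryptFor(spanish, recipient);   // encrypt it with her public key
      emailTo(recipient, encrypted);                    // email it to her and hit send
    }
    // Saving the chain under a new name would then just mean registering
    // one more verb whose execute step delegates to garblify().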

On the subject of linguistic UI vs point-and-click, a commenter by the name of VioletJoker left the following comment on a previous post:

What a brilliant idea. Less GUI, more typing. In fact, the same thing applies to scripting languages - why all the clean abstractions, what the programmer really needs is more flexibility, so by extension, we should all develop in machine language. NOT

Despite the sarcasm, VioletJoker makes a really good point! Interfaces are bad if they ask you to make decisions on a level of detail you don’t care about.

For instance, when programming in assembly language, you have to think about the exact memory locations of data and instructions, the instruction set of your processor, and what register you’re loading stuff into. This is a drastically higher level of detail than you want for most problems. C lets you work on a higher level of abstraction, but you still have to think about memory allocation and deallocation. When you’re writing malloc() and free() you are doing the computer’s chores for it instead of focusing on your problem domain. Java lets you work on a higher level of abstraction than C, and Python lets you work on an even higher level than Java. I’m a huge fan of Python because, compared to other languages, there are very few decisions I have to make when writing Python that aren’t relevant to my problem domain.

In user-interface design, it’s the same thing! The first GUI was a step forward, not because there’s something inherently bad about typing, but because it let users work on a higher level of abstraction and forget about irrelevant details like “what’s the exact filesystem path to the directory I want?”. (Well, that and the fact that it had superior discoverability.)

But the GUI is far from perfect. I could fill a book with examples of places where Windows Vista (or Leopard, or Ubuntu) forces us to make decisions that aren’t relevant to what we’re trying to do. Even the Firefox GUI makes us think about fiddly GUI bits unrelated to our web-surfing task. Fiddly bits like: which text input field has keyboard focus? Where on the screen is that other tab that has the page I want? Am I currently logged in to my webmail or not? If I hit the Enter key right now, will it submit a form? And so on.

So, when I talk about a linguistic UI, I want something that lets me forget that stuff. I want it to let me work on an even higher level of abstraction than the Firefox GUI. The email verb should let me shoot off a message to somebody just by specifying who they are and what I want to say to them. I don’t want to have to think about navigating to the page for my webmail, or think about which webmail service I’m using or whether I’m logged into it already or not. The email verb should invisibly handle those details for me as much as possible; it should make smart guesses about what I want, while allowing me to easily override it when it guesses wrong, and it should attempt to improve the accuracy of its guesses over time.
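
(Here is a purely speculative sketch of what “smart guesses with easy override” could look like under the hood: the email verb remembers which webmail service you have picked before, defaults to the most-used one, and lets an explicit choice both override and retrain the guess. Nothing here is real Ubiquity code; it just illustrates the behavior I want.)

    // Purely hypothetical: guess the mail service, allow overrides, learn.
    var serviceUseCounts = {};   // e.g. {"gmail": 12, "some-other-webmail": 2}

    function pickMailService(explicitChoice) {
      if (explicitChoice) {
        // The user named a service: use it, and count it toward future guesses.
        serviceUseCounts[explicitChoice] = (serviceUseCounts[explicitChoice] || 0) + 1;
        return explicitChoice;
      }
      var best = null;
      for (var name in serviceUseCounts) {
        // Otherwise guess the service the user has picked most often so far.
        if (best === null || serviceUseCounts[name] > serviceUseCounts[best]) {
          best = name;
        }
      }
      return best;   // may be null the very first time, in which case ask the user
    }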

So to my list of requirements for a good linguistic UI, I’ll add one more: it should abstract away details that are not relevant to the task at hand. In other words, the vocabulary — the command set — should be on the level of tasks that the user cares about.

17 Responses to “Why Verbs?”

  1. Ben Longoria Says:

    Who do you see as being a target audience or user of Mozilla Verbs? Firefox users in general, or more narrowly developer types?

  2. jonoscript Says:

    Ben: Good question. We’re hoping for cross-over appeal. The plan is that it will appeal to developers first because of how very, very easy it is to write a Ubiquity command (orders of magnitude easier than writing a Firefox extension — I’m going to do a full post about this soon).

    Then, because there are so many useful commands being written for it (fingers crossed), it will appeal to the same group of Firefox users who install extensions. (Which is far from all Firefox users, by the way).

    Then (in my world-domination fantasies), due to popular demand, it will be built into all future versions of Firefox. Which will then go up to 50% market share, and millions of people will be using Verbs every day.

  3. Jordan Says:

    I think an easier way of fitting nouns and verbs into the web that exists now is comparing them to a static web and a dynamic web. With the static web there were web page links and text you could read … the nouns. The dynamic web is the action and interaction of the AJAX-driven, social, interactive web that has been labeled Web 2.0.

    So I’d say that Verbs are a type of UI for interfacing with a dynamic web. I’m not convinced that they are the right UI for that task, but then again I haven’t tried it. (an XPI snapshot would be nice)

  4. Icon Says:

    wow so cool.

  5. Mozilla Labs » Blog Archive » Introducing Ubiquity Says:

    [...] Why Verbs by Jono DiCarlo [...]

  6. Dave S Says:

    Brilliant concept, I love the idea. Would it still, however, require a keyboard as a means of input? While I believe the keyboard will be around for a long time to come, I think there will also be major advances in “physical” based input (touch screens, manipulating devices akin to Wii, etc.). I think there will also be major advances in how mobile devices interact with the web & greater demand for content on the go.

    So, that said, how will Verbs adapt/respond to the challenge of a keyboard-less environment / an environment where keystrokes are a pain?

  7. John Says:

    But how do you get around VioletJoker’s main complaint about just adding more typing?

    For example, it seems to me that Ubiquity is doing things with the keyboard that I can already do with my mouse, using OS X Services, for example. Why should I type “define this” when I can click on it and choose the dictionary service?

    Unless we’re talking about a spoken-language approach (which is what your example of “telling” Firefox really looks like), I don’t see this as a big step forward.

  8. Juan Romero Says:

    So, in essence, will Ubiquity be what Quicksilver for the Mac tried/tries to be?

  9. F. Heinsen Says:

    Cool. Have you guys thought of incorporating social features, e.g. like those of yubnub.org (an older, arguably less ambitious effort which has some commonality with your project)? Also, have you thought about implementing the equivalent of *nix pipes on this?

  10. Introducing Ubiquity | about ICT Says:

    [...] Why Verbs by Jono DiCarlo [...]

  11. Anna Says:

    To answer John:
    I love Ubiquity already and I’ve only been using it today. For someone like me who is keyboard oriented, it’s great - I don’t have to leave the keyboard to find the stuff I want. I don’t have to “click on it” - I keep my fingers on the keyboard where they belong. It’s much more accurate and fast, for some of us, to type rather than mouse. Mice are great, don’t get me wrong. But there are times when the keyboard is just more efficient, and Ubiquity takes advantage of that.

  12. Gilcatt Says:

    As long as we can get rid of unnecessary “clicking” to get things done, I’m in… verbs are natural. The mouseclick is used to validate a choice from a binary YES-NO alternative. In essence, that’s what a mouseclick “knows” about language: YES & NO.

    Complex situations require the user to go through large strings of “YES-NO” prior to getting to the final YES that we’re looking for.
    Getting rid of such large strings of mouseclicks is an obvious idea.

    You don’t want to rebuild your house brick by brick every time you want to go home. Yet that’s what we all do today when we want something specific on the Web: click by click, again and again.

  13. Dan Lewis James Says:

    I would like to add more about the subject of language as a barrier. If you check out the work of linguists such as Eleanor Rosch, George Lakoff, and even Wittgenstein, it is easy to see how usability can be improved if more thought is given to the naming conventions we use.

    The results of many years of linguistic study have shown that humans across many different cultures name things most readily at the genus level, so the mind must find it easier to interpret content if the genus is used when designing with nouns. With verbs, however, I think we find a move beyond this and towards a bodily movement, which seems to me to be a more user-friendly approach.

  14. Feel Our Ubiquity Baby « Meme Shift Says:

    [...] some prosumer, pursuing his or her own interests, takes the time to create the “garbilify” command and shares it with the world, I will not need to know about it in order to use it. I will only have [...]

  15. yonkeltron » Mozilla rolls out a browser command line called Ubiquity Says:

    [...] choice and get weather. The video shows some very impressive functionality and gives a taste of the natural language functionality that they seem to have in mind. The review over at Ars Technica has an excellent overview of [...]

  16. Tim Says:

    Ubiquity can’t be a step backwards, because in an interface the mode of input is not as important as the language of input. Whether it’s a command line or a menus-and-buttons GUI or a mouse gesture system, the input methods have no inherent superiority. What matters is the sophistication of the dialogue*.

    A gesture system is very powerful because it is fast, intuitive and abstract. However, it is unsophisticated; more complex actions require more complex gestures or longer chains of simple gestures, and as the complexity grows, the speed advantage is lost and it is harder to recall commands. It is useful only for very simple commands (but very, very good at that).

    Mouse-driven GUIs are also powerful, but slow, and limited in the sense that commands must be located, their locations remembered, and long sequences repeated in their entirety. GUIs are controlled more by their designer than by the user. Many GUIs offer macro action systems that compensate for these shortcomings, but most people are unfamiliar with them.

    Command lines are the most versatile of interfaces because of the flexibility and speed of language. What Ubiquity offers is a more sophisticated dialogue; if you can write English, you can use Ubiquity. If Ubiquity is fluent in English - in verbs - then you can use language in all its nuanced power.

    It is strange to see people suggest that command lines are a step backwards - to see them commenting on blogs and forums to that effect. After all, they are using their keyboards to type into text fields, are they not? It is no surprise that instant messaging is more popular than voice chat, that text messaging is more popular than telephoning, that emoticons :) have become a shorthand for emotion. Language is how you Get Things Done in the world; it’s what we use to navigate through conversations, business agreements, relationships. We should be able to use it to talk to our computers.

    Of course, at this stage, Ubiquity doesn’t really show off the power of verbs. Many of its features are hardly faster than what Firefox can already do in the Awesomebar. However, as its ability to understand our commands grows, the sophistication of our dialogue will rise, and someday, typing, “Send Bob Jones map directions from his house to mine, tell him ‘Hope to see you at the kegger this Friday’, and attach a random ‘beer goggles’ photo” won’t just be a cool concept.

    (*I say “dialogue” because that’s what it is: the user inputs, and the computer outputs, but the input/output is the same thing: new information added to a conversation.)

  17. Mozilla Ubiquity project Says:

    [...] discovered this project after reading Jono DiCarlo’s blog post about linguistic UIs, in it he was discussing the difference between using a noun as a connection for a user and using a [...]
