Older blog entries for cinamod (starting at number 78)

28 Oct 2005 (updated 28 Oct 2005 at 03:14 UTC) »
Luis, the problem with that is that we already have at least 2 "standards" for Word Processing formats - DOC and RTF. I don't see what supporting another standard buys us. The great thing about standards is that everyone has one...

Microsoft and Apple already support RTF in the core OS, and have for a while (Apple's Cocoa framework supports DOC too). All the existing WPs support one if not both of those formats. We already have our interchange formats. DOC may be hard to emit and consume, but RTF is no better or worse than ODF. RTF already is the Esperanto (or lingua-franca, depending on your viewpoint) of the Word Processing world. If we all started speaking ODF instead of RTF tomorrow, pragmatically, nothing really changes.

That's because there already is a standard against which all the WPs measure themselves. And for better or worse, that standard is Microsoft Word. And that standard's file format is RTF/DOC. The WPs of the world are largely competing on quality and not feature creep, built around those standards. In this mindset, all ODF amounts to is a few thousand more lines of code that you'll need to write and support. But having your potential innovations constrained by a large, unresponsive standards body sounds less-than-ideal. As someone who voted "YES" on the recent referendum, I'd think that you'd show a little sympathy there :) I don't think that we've reached the be-all and end-all of WPs, and any such standard would have to be aware of that.

That said, file formats are boring, just like TCP is. You'll eventually reach some "good enough" state, like RTF or ODF. Like TCP, the cool things are the apps built on top of the formats. Any app that can emit ODF can emit RTF just as easily. Why more don't, I honestly don't know. As more people start using ODF, I'll continue to consume it as happily as I consume DOC and RTF today - they're not going away anytime soon. But I do know that I'd be touting OOo's near feature-parity with Word a lot louder and ODF's importance as a format a lot less if I were Sun. But that's just me.

27 Oct 2005 (updated 27 Oct 2005 at 03:41 UTC) »
Cowon iAudio

Just got my iAudio x5 30GB "MP3" player in the mail. The sound quality, the video quality, the size, the codecs it supports. This device is simply amazing. If anyone is in the market for a new player, I highly recommend it.

Gimp and the GtkFileChooser

Burgundavia, the Gimp and Inkscape use the standard GTK+ filechooser widget. The widget can be extended to have a preview. What the Gimp doesn't do is use GnomeVfs, so only your local file system is browsable.

librsvg

So Caleb is some sort of coding machine. In a matter of a few days, we've (read: mostly Caleb) (mostly) finished the Cairo backend for librsvg. In honor of this work, I've just released version 2.13.0 (may take a few minutes for the FTP mirrors to sync).

What's the big deal? Well:

  • It doesn't use libart.
  • It can generate more formats than just PNG - drawing to PDF, X11, Win32, Quartz, PS should all be possible now.
  • More conformant output - with the exception of some CSS and text, we nearly pass the W3C SVG 1.1 conformance test. We even beat Batik's conformance for a few tests...
  • It is significantly faster than its libart-based counterpart - on some tests, it's 6x faster than the libart backend. Using pre-multiplied RGBA clearly has some performance penalties.
  • It has the beginnings of a DOM API.
  • Did I mention that it doesn't use libart?

The library is stable - it can handle our own test suite, the W3C suite, and Batik's suite without incident. Only in 1 of the 182 W3C conformance tests did I notice any appreciable memory leakage directly attributable to librsvg (which we'll be correcting shortly). All told, this is one rockin' release. Carl, Caleb, Chris - you all rock. Thanks.

I will warn consumers of the library that the API and ABI have changed to accommodate this work. It's mostly backward-compatible, but there are some changes to be aware of. I expect the API and ABI to change further during the release cycle as things get ironed out and the DOM API formalizes. Caveat emptor.

16 Oct 2005 (updated 16 Oct 2005 at 17:47 UTC) »
Reverse Outsourcing?

It's been an odd week. I've had 3 Indians contact me, repeatedly, via email, asking me to do their work for them.

  • The first was an MSCE who works for Intel. This guy basically couldn't figure out how his compiler works. He doesn't know anything about building packages from source and wants bugfixes for wvWare that are available only in its recent 1.2.0 release. His machine runs (as best I can tell) a 4+-year-old Linux distro and he's complaining about glib-2.0 dependencies. I've been as patient and helpful as I can be (repeating things in the README and INSTALL files before telling him to RTFM), but don't I know that his project for Intel was due yesterday? Does no one at Intel know about ./configure && make?
  • The second was a student in Bangalore who "demands" that I do his CS homework for him. I pointed this guy to some relevant literature for his project (text summarization, like I worked on for OTS) and politely told him that he would only learn by doing the work himself. Outrage, I tell you! Didn't I know his homework was due?
  • The third was a guy who "demands" that I export Microsoft Word art as LaTeX from wvWare immediately. I told this guy that we currently convert the art to PNGs and that converting it to LaTeX would be a lot of work that I don't feel like doing. Heresy!

These guys have all been extremely polite, but forceful, persistent, and presumptuous. What's in it for me? Intel isn't going to fire me and I won't be kicked out of school due to bad grades. Like I said, it's been an odd week...

Federico,

Maybe the 0.3s Fontconfig startup penalty will be a thing of the past with its new devel version. Patrick Lam has been optimizing FC in order to reduce memory usage and startup penalties related to computing glyph metrics.

1 Oct 2005 (updated 1 Oct 2005 at 22:42 UTC) »
10 gallons of basil

So, in mid June, Ruth and I bought two small basil plants from a local nursery. Soon, the plants started to get big, so we split them into 4. Not long after that, the plants became a hedge of basil and overtook our back deck.

So today, we harvested the plants and filled a 13-gallon trash bag almost to the brim - about 38 liters of basil, all told.

So, I'm looking for suggestions, besides the obvious pesto and pasta. What should we do with it? If you're in Cambridge, MA, a perfectly good answer might be "give me some" :-) (/me looks in Luis', Krissa's and Bryan's direction).

librsvg 2.12.3

Thanks, Philip. A new release is out that should fix the problems you've run into.

I'm amused at Michael Meeks' latest interview on OpenOffice performance and startup time.

I'd like to preface this by saying that there are real problems with g++, kernel I/O schedulers, and the like, and I'm glad that someone like Michael is looking into them. I have little doubt that OOo will be speedy enough for many people's uses soon.

Sure, kernels aren't the greatest at knowing what resources you'll use next - see Red Hat's attempts to speed up the Linux distro boot time and RML's "disk seeks are evil" paper. And, sure, g++ isn't the greatest C++ compiler on the planet, though it's getting better lately, in part due to Michael's efforts.

But at some point, comments like Michael's start looking like an exercise in passing the buck, and it's hackneyed and tiring by now. Ok, comparing OOo's and vi's load times isn't exactly a fair fight. But what about when Microsoft Office loads in Crossover Office faster than a native copy of OOo does? Is that a fairer comparison?

At some point (and after something like 4 or 5 years of optimization work), some of the blame simply must amount to bad design, bad implementation, or both. Copping out and saying that "it's complex" doesn't cut it. Lots of things are high-performance, complex, written in C++, run on Linux, and hit the disk. But these things tend not to require a JVM, a VB interpreter, thousands of C++ vtables, hundreds of images loaded from disk, and their own CORBA-like component system in place before the first window gets rendered to screen.

From the outside looking in, it looks like something is unnecessarily complex, fundamentally broken, or else not well thought-through. From the outside looking in, it looks like you're optimizing the edges. And while it may be simpler to do that than re-organize OOo's behemoth codebase, I can't say that I'm encouraged. The "code first, optimize later" philosophy doesn't scale up well to projects of OOo's size.

Unnecessary complexity does create itself from nothing when there isn't constant vigilance. And it's time for the OOo group to look in the mirror and shoulder at least some of the blame, or at least stop passing the buck. I'm tired of hearing about it.

Tor,

You can think of copyright as a certain kind of social contract between the creator of a work and society at large.

Copyright doesn't view an author's control over his/her work as absolute. In the US at least, Copyright is codified in the Constitution as a necessary evil - something that Congress is entitled to grant authors in order to promote science and the useful arts.

Copyright isn't all-encompassing. It expires. Some things aren't copyrightable. Other societal rights trump your rights as a copyright holder. You can't stop someone from writing a news report about your book, or writing a scholarly essay about it. Until the DMCA, you couldn't stop people from making a backup copy of a work. Even with the DMCA in place, you still might not be able to stop them. In short, there are several "fair use rights" that the copyright holder simply can't deny you, no matter how much he or she wishes to. It's simply not in society's interests for them to do so.

The GPL doesn't interfere with any of these fair use rights. In fact, it encourages something much stronger than fair use. It says, "Please, embrace and extend this work." With traditional licenses, only a narrow amount of "embracing and extending" is permitted under fair use rights. The GPL serves to limit your rights as an author, where traditional licenses generally seek to maximize them. You're always free to give up your own rights, but you may not force others to give up theirs. So your analogy with the GPL's enforceability is a poor one.

Certainly under today's laws, authors of a work have the right to license their work under any contract they see fit. But then I don't think (from a moral standpoint) they're entitled to the protections Copyright would otherwise provide, because they've encumbered reasonable societal benefits and rights. Copyright holders simply aren't entitled to do whatever they see fit in order to "protect" their works, because the laws are set up to protect society.

22 Sep 2005 (updated 22 Sep 2005 at 04:28 UTC) »

This whole Google print lawsuit bungle has got me thinking.

I'm confused by those who think that Google isn't unambiguously in the clear here. Not because they're doing it for a scholarly purpose. Or because they're only reproducing a terse portion of the work. Or not even because what Google is doing can't possibly affect these authors' past decisions to create their works, and thus retroactively disincentivize their respective works' creation (ahem, Sonny Bono CTEA, I'm looking at you here...).

I think that if you're arguing those points, perhaps you're looking at this from an overly-narrow perspective. I invite you to look outside the box. Sure, Google might (or might not) win on the above points alone. But I don't think that Google is copying expressive works. Google is copying databases of words.

In the case of Feist v. Rural, the Court ruled that databases (in that case, telephone directories) were not entitled to copyright protection, as they contained little (if any) expressive content. Copyright protects expression fixed in a tangible medium, and even then only within certain limitations.

Here, I believe that Google is treating otherwise expressive, copyrighted texts as databases, thus stripping them of their expressivity in the context of the texts' uses. I think that the use of a derivative work matters a great deal in determining that work's expressivity before the Court: the use of a work has a transformative effect on the expressivity of that work, possibly even voiding that work's expressiveness in a given context. In Google's case, the works are copied - perhaps verbatim - but their expressiveness is lost in the process. Granted, this may seem non-obvious.

The search results pages rendered by Google most likely have some expressiveness, and would be copyrightable. The texts that Google OCR'd are expressive and copyrightable. But Google's treatment of these texts as search indexes - reverse text lookup databases - is in itself not expressive. They're just unexpressive token sequences, capable of being searched. It is in the translation from meaningful, expressive words into an ordered sequence of cold, machine-searchable tokens that the work loses its expressivity. Note that this distinction would still attach copyright protection to things like eBooks, as the purpose of eBooks is to convey expressivity to a human reader via an electronic medium. The purpose of the tokens is to convey an ordered sequence of words to a machine algorithm incapable of appreciating the work's expressivity or content in any way that we'd call "meaningful".

From that, we're left to conclude that the tokens "John Galt" appearing on page 1 of Ayn Rand's "Atlas Shrugged" next to the tokens "Who is" is merely a fact, absent any inherent meaningful expressivity. And absent this expressivity, copyright doesn't attach to this sentence (which, fwiw, is probably considerably too short for copyright to attach to, anyway). Facts - even collections of facts - simply aren't protected under copyright law.
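
To make the "reverse text lookup database" idea concrete, here is a minimal sketch of what such an index might look like. This is only an illustration under my own assumptions, not a description of how Google's system actually works: the function names (tokenize, build_index, find_phrase), the page-number-to-text input format, and the text on page 2 are all made up for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(pages):
    """Map each token to the (page, position) pairs where it occurs.

    `pages` is a dict of page number -> page text (a hypothetical input format).
    """
    index = defaultdict(list)
    for page_no, text in pages.items():
        for pos, token in enumerate(tokenize(text)):
            index[token].append((page_no, pos))
    return index


def find_phrase(index, phrase):
    """Return the (page, position) of each place the phrase's tokens occur in order."""
    tokens = tokenize(phrase)
    if not tokens:
        return []
    hits = []
    for page_no, pos in index.get(tokens[0], []):
        # Every following token must sit at the next position on the same page.
        if all((page_no, pos + i) in index.get(tok, [])
               for i, tok in enumerate(tokens)):
            hits.append((page_no, pos))
    return hits


# Page 2 is placeholder text, present only so the index has more than one page.
pages = {1: "Who is John Galt?", 2: "placeholder text for page two"}
index = build_index(pages)
print(find_phrase(index, "John Galt"))  # [(1, 2)]
```

All the index records is that a given token sits at a given position on a given page, and a phrase lookup is nothing more than checking that those positions are adjacent.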

In the end, Google's Print project is just a fact retrieval system - in essence, no different from the index in the back of the book that they're OCR'ing. Copyright law needn't get involved, because at no point does it affix to what Google is doing. Or so I hope that the Courts decide.
