Analysis of SCO's Las Vegas Slide Show

Bruce Perens, Perens LLC <bruce@perens.com>
With help from Linus Torvalds and the Open Source community.

You may re-publish this material. You may excerpt it, reformat it and translate it as necessary for your presentation. You may not edit it to deliberately misrepresent my opinion. This information is published at the web site perens.com . If you are reaching it via another URL, be advised that I don't control that URL and have not corresponded with the owner.

An SCO presentation shown in Las Vegas on August 18th alleged infringement by the Linux developers. The presentation, in Microsoft PowerPoint format, is here (GZIP compressed) or here (uncompressed); a PDF (Adobe Acrobat) version is here; and a conversion of the presentation that can be viewed using a web browser is here. Thanks to WebFarmHosting.com and vanGennip.nl for offloading these files from my poor little DSL connection.

SCO released the presentation to Bob McMillan, a reporter for IDG News Service, without any non-disclosure terms. Bob asked me to comment upon it. Here's his story.

I will start with SCO's demonstrations regarding "copied" software. It is likely that SCO would present the very best examples that they have of "copied" code in their slide show. But I was easily able to determine that of the two examples, one isn't SCO's property at all, and the other is used in Linux under a valid license. If this is the best SCO has to offer, they will lose.

Slide 15 purports to show "Obfuscated Copying" from Unix System V into Linux. SCO further obfuscated the code on this slide by switching it to a Greek font, but that was easily undone. It's entertaining that the SCO folks had no clue that the font-change could be so easily reversed. I'm glad they don't work on my computer security :-)

The code shown in this slide implements the Berkeley Packet Filter, internet firewall software often abbreviated as "BPF". SCO doesn't own BPF. It was created at the Lawrence Berkeley Laboratory with funding from the U.S. Government, and is itself derived from an older version called "enet", developed by Stanford and Carnegie-Mellon Universities. BPF was first deployed on the 4.3 BSD system produced by the University of California at Berkeley. SCO later copied the software into Unix System V.

The BPF source code is here on the Lab's web site. A paper on its design, published in 1993, is here.

BPF is under the BSD license. That license allowed SCO to legally copy the code into Unix System V in 1996, but since SCO doesn't own the code, they have no right to prevent others from using it.

So, in this case the SCO "pattern-recognition" team correctly deduced that the Linux and SCO implementations of BPF were similar. But I was able to determine the origin of BPF after a few minutes of web searches on google.com . Why couldn't a "pattern-recognition team" do the same? It's difficult to believe they simply didn't bother to check. It's also likely that SCO dropped attribution of the Lab's copyright from the System V copy of the BPF source code, or the team would have known.

The Linux version of BPF is not an obfuscation of the BPF code. It is a clean-room re-implementation of BPF by Jay Schulist of the Linux developers, sharing none of the original source code, but carefully following the documentation of the Lab's product. The System V and Linux BPF versions shown in slide 15 implement the same virtual machine instruction set, which is used to filter (allow, reject, change, or reroute) internet packets. And the documentation for that VM even specifies field names. Thus Schulist's and the Lab's implementations appear similar. Had Schulist chosen to directly use the Lab's code, it still would have been legal. But the version in Linux is entirely original to the Linux developers. There is no legal theory that would give SCO any claim upon it.
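To see why two independent implementations of a documented virtual machine end up looking alike, here is a minimal sketch of a BPF-style filter interpreter. It is not the Lab's code, not Schulist's, and not anything from the slides. The field names `code`, `jt`, `jf`, and `k` and the three opcode values follow the published BPF instruction format; everything else (the function name `run_filter`, the sample program, the choice to support only three instructions) is an assumption made for illustration.

```python
from collections import namedtuple

# Field names follow the published BPF instruction layout.
Insn = namedtuple("Insn", "code jt jf k")

# Three opcodes from the documented instruction set (a tiny subset).
LD_H_ABS = 0x28  # load 16-bit big-endian value at packet offset k
JEQ_K = 0x15     # if accumulator == k skip jt instructions, else skip jf
RET_K = 0x06     # return k: number of bytes to accept, 0 means reject


def run_filter(program, packet):
    """Run a filter program over a packet; hypothetical helper name."""
    acc = 0  # the accumulator register
    pc = 0   # index of the next instruction
    while pc < len(program):
        insn = program[pc]
        pc += 1
        if insn.code == LD_H_ABS:
            if insn.k + 2 > len(packet):
                return 0  # load past end of packet: reject
            acc = int.from_bytes(packet[insn.k:insn.k + 2], "big")
        elif insn.code == JEQ_K:
            pc += insn.jt if acc == insn.k else insn.jf
        elif insn.code == RET_K:
            return insn.k
        else:
            raise ValueError("opcode not supported in this sketch")
    return 0  # fell off the end of the program: reject


# Illustrative program: accept Ethernet frames whose type field is IPv4.
ACCEPT_IPV4 = [
    Insn(LD_H_ABS, 0, 0, 12),   # EtherType lives at byte offset 12
    Insn(JEQ_K, 0, 1, 0x0800),  # IPv4? fall through : skip one instruction
    Insn(RET_K, 0, 0, 0xFFFF),  # accept
    Insn(RET_K, 0, 0, 0),       # reject
]

ipv4_frame = bytes(12) + b"\x08\x00" + b"payload"
arp_frame = bytes(12) + b"\x08\x06" + b"payload"
print(run_filter(ACCEPT_IPV4, ipv4_frame))  # 65535
print(run_filter(ACCEPT_IPV4, arp_frame))   # 0
```

Anyone writing an interpreter for this instruction set is led to a loop over instructions, an accumulator, and fields named `code`, `jt`, `jf`, and `k`, because the specification dictates them. Similar structure is what you would expect from two faithful implementations, not evidence of copying.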

Slides 10 through 14 show memory allocation functions from Unix System V, and their correspondence to very similar material in Linux. Some of this material was deliberately obfuscated by SCO, by the use of a Greek font. I've switched that text back to a normal font.

In this case, there was an error in the Linux developers' process (at SGI), and we lucked out that it wasn't worse. It turns out that we have a legal right to use the code in question, but it doesn't belong in Linux and has been removed.

These slides have several C syntax errors and would never compile. So, they don't quite represent any source code in Linux. But we've found the code they refer to. It is included in code copyrighted by AT&T and released as Open Source under the BSD license by Caldera, the company that now calls itself SCO. The Linux developers have a legal right to make use of the code under that license. No violation of SCO's copyright or trade secrets is taking place.

The oldest version of this code we've found so far is in Donald Knuth's The Art of Computer Programming, published in 1968. Knuth was probably working from earlier research papers. He didn't write in C, so details differ but the algorithm is the same. The implementation shown in the slides was written by Dennis M. Ritchie or Ken Thompson at AT&T, in 1973. You can see the 1973 version of the function in this file, originally called dmr/malloc.c. The code is from Unix version 3, the oldest known version of Unix that still exists in machine-readable form. The complete source for that system can be found here on the net. In 2002, Caldera released this code as Open Source, under this license. Caldera is, of course, the company that now calls itself SCO. The license very clearly permits the Linux developers to use the code in question. Historical information on why Caldera released the Unix source code to the public is here, and contains some information relevant to the SCO court cases.
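For readers who have not seen this style of allocator, the following is a rough sketch of the general technique, assuming a first-fit scheme over a map of free segments kept in address order. That is how this family of routines is usually described, but it is an assumption here. It is written in Python rather than C, it is not the AT&T, Caldera, or Linux source, and the names `map_alloc` and `map_free` are hypothetical.

```python
def map_alloc(free_map, size):
    """Take `size` units from the first free segment that is large enough.

    free_map is a list of [size, addr] entries sorted by addr.
    Returns the base address, or None if no segment fits.
    """
    for i, entry in enumerate(free_map):
        seg_size, seg_addr = entry
        if seg_size >= size:
            entry[0] -= size  # shrink the segment...
            entry[1] += size  # ...and advance its base
            if entry[0] == 0:
                del free_map[i]  # segment fully consumed
            return seg_addr
    return None


def map_free(free_map, size, addr):
    """Return a segment to the map, merging with adjacent free neighbours."""
    i = 0
    while i < len(free_map) and free_map[i][1] <= addr:
        i += 1  # find the first entry that lies above addr
    prev_touches = i > 0 and free_map[i - 1][1] + free_map[i - 1][0] == addr
    next_touches = i < len(free_map) and addr + size == free_map[i][1]
    if prev_touches:
        free_map[i - 1][0] += size
        if next_touches:  # the freed block bridges two free segments
            free_map[i - 1][0] += free_map[i][0]
            del free_map[i]
    elif next_touches:
        free_map[i][1] = addr
        free_map[i][0] += size
    else:
        free_map.insert(i, [size, addr])


free_map = [[100, 1000]]         # one free segment: 100 units at 1000
a = map_alloc(free_map, 30)      # 1000
b = map_alloc(free_map, 20)      # 1030
map_free(free_map, 30, a)
map_free(free_map, 20, b)
print(free_map)                  # [[100, 1000]]
```

The whole technique fits in a few dozen lines, which helps explain why versions written decades apart still look so much alike.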

In the early 1990s, AT&T's Unix Systems Labs (USL) sued BSDI, a company vending the BSD system, and the University of California, over this and other code in the University of California at Berkeley's BSD system. The claims that SCO is making are very similar to the AT&T claims. AT&T lost.

AT&T was actually found to have lost its copyright to the code in question during the lawsuit, because the code was published without a proper copyright notice. This would not be the case today, as there have been changes in copyright law and all work is copyrighted by default. But the judge's decision back then was:

Consequently, I find that Plaintiff has failed to demonstrate a likelihood that it can successfully defend its copyright in [Unix version] 32V. Plaintiff's claims of copyright violations are not a basis for injunctive relief.

The result is that between the judge's finding and 1996, when there were additional changes to the Berne copyright convention that would have made the AT&T code copyrightable, the code was essentially in the public domain. Code derived from Unix before and during that time would be legal.

It was also found that AT&T had copied heavily from the University without attribution, and thus AT&T settled the case. In the settlement, the University agreed to add an AT&T copyright notice to some files and to continue to distribute the entire system under the BSD license. AT&T agreed to pay the University's court costs. Some details of the lawsuit are here.

The AT&T code that was the subject of this lawsuit survives into SCO's current system, and the version that was included in Linux seems to be from System V. That version differs from the public domain version by two lines, both concerned with diagnostics rather than working code. A difference that trivial doesn't appear to be copyrightable.

SCO's "pattern analysis team" found this code and correctly concluded that it was similar to code in Linux. But they didn't take the additional step of checking whether or not the code had been released for others to copy legally.

The code in question has already been removed from the most recent development versions of the Linux kernel, for technical reasons. It duplicated a function provided elsewhere, and thus never should have been included. The code was intended for one SGI system that was never sold, and another that is extremely rare, and was not used in the mainstream Linux kernel.

In slide 20, SCO alleges that it owns essentially all of the code in Linux that has been touched at all by IBM, SGI, and other Unix licensees. These contributions constitute over 1.1 million lines of code in 1549 files, totalling 2/3 of the new code developed between the releases of Linux 2.2 and 2.4. But how could SCO possibly own all of this code that is copyrighted by other companies and individuals? SCO's legal theory, explained in slide 6, is that the AT&T Unix license compelled all of these companies to assign to AT&T, and later SCO, all derived works that they created incorporating the Unix source code. Here is the key clause on slide 6:

Such right to use includes the right to modify such SOFTWARE PRODUCT and to prepare derivative works based on such SOFTWARE PRODUCT, provided the resulting materials are treated hereunder as part of the original SOFTWARE PRODUCT.

Under SCO's theory, if any code created by a Unix licensee ever touches Unix, SCO owns that code from then on, and can deny its creator the right to make use of it for any other purpose.

SCO's legal theory fails, because they ignore the fact that if a work doesn't contain some portion of SCO's copyrighted code, it is not a derived work. This is especially glaring on slide 20, in which SCO claims ownership of JFS, IBM's Journaling File System. The version of JFS used in Linux was originally developed for the OS/2 operating system, and was later ported to Linux. It doesn't share code with the JFS implementation in System V. SCO's claims fail in a similar manner for the other products they mention: RCU or Read Copy Update, software that keeps processors in a multi-processor system from interfering with each other, was developed by Sequent, a company later purchased by IBM. Sequent developed RCU under Dynix, a Unix-derived operating system. They later removed RCU from Dynix - separating it from any code owned by SCO - and added it to Linux. Similarly, SGI's XFS, the eXtent FileSystem, was separated from IRIX, a Unix-derived operating system, and ported to Linux.

SCO's contention is that copyrighted software can never be separated, that any code created by a Unix licensee that ever touches SCO Unix or is even loosely based on Unix is entirely SCO's from that moment on, and can never be used for another purpose by its creator without authorization from SCO. SCO's contention goes against any reasonable understanding of the boundaries of copyright and trade-secret law. It's unlikely that it would survive in a courtroom.

SCO's responses to this document are "We own Unix and would know what it looks like," and "It's his word against ours." I'm not, however, asking you to rely on my word. I've presented you with links to the evidence, most of which is available at web sites not under my control. Please examine it and make your own conclusion.

Bruce Perens
