Cross-Referencing Linux

Browse the code

News

(This section is really rather misnamed.)

2005-01-16

LXR has changed hosting providers. We'd like to thank the Department of Mathematics, University of Oslo, for their longstanding support of LXR and great service. From now on, hosting will be provided by Linpro AS.

01082001

Development has now moved to SourceForge. See the Development section below for more information.

Motivation

The Linux Cross-Reference project is the testbed application of a general hypertext cross-referencing tool. (Or the other way around.)

The main goal of the project is to create a versatile cross-referencing tool for relatively large code repositories. The project is based on stock web technology, so the codeview client may be chosen from the full range of available web browsers. On the server side, the prototype implementation is based on an Apache web server, but any Unix-based web server with CGI script capability should do nicely. (The prototype implementation is running on a dual Pentium Pro Linux box.)

The main feature of the indexer is of course the ability to jump easily to the declaration of any global identifier. Indeed, all references to global identifiers are indexed as well. Quick access to function declarations, data (type) definitions and preprocessor macros makes code browsing just that tad more convenient. An at-a-glance overview of, for example, which code areas will be affected by changing a function or type definition should also come in useful during development and debugging.

Other bits of hypertextual sugar, such as e-mail and include file links, are provided as well, but are on the whole, well, sugar. Some minimal visual markup is also done. (Style sheets are being considered as a way to do this in the future.)

Technicalities

The index generator is written in Perl and relies heavily on Perl's regular expression facilities. The algorithm used is very brute force and extremely sloppy. The rationale behind the sloppiness is that too little information renders the database useless, while too much information simply means the users have to think and navigate at the same time.
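As a rough illustration of what "brute force and sloppy" can mean in practice, here is a minimal sketch in Python. It is not the LXR indexer, which is written in Perl and whose actual expressions are not reproduced here. The patterns, the function name index_source and the sample file name are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Deliberately loose, hypothetical patterns. They make no attempt to
# understand C; they only recognise text that looks like a definition.
DEFINITION_PATTERNS = {
    "macro": re.compile(r"^[ \t]*#[ \t]*define[ \t]+([A-Za-z_]\w*)", re.M),
    "struct": re.compile(r"\bstruct[ \t]+([A-Za-z_]\w*)\s*\{"),
    "function": re.compile(
        r"^\w[\w \t\*]*?\b([A-Za-z_]\w*)[ \t]*\([^;{}()]*\)\s*\{", re.M
    ),
}
IDENT = re.compile(r"[A-Za-z_]\w*")


def index_source(filename, text):
    """Return (definitions, references) for one file, found by regex alone."""
    definitions = {}
    for kind, pattern in DEFINITION_PATTERNS.items():
        for match in pattern.finditer(text):
            # Line number = newlines before the captured name, plus one.
            line = text.count("\n", 0, match.start(1)) + 1
            definitions.setdefault(match.group(1), (kind, filename, line))

    # Every token that textually equals a defined name counts as a reference.
    references = defaultdict(list)
    for lineno, line in enumerate(text.splitlines(), 1):
        for name in IDENT.findall(line):
            if name in definitions:
                references[name].append((filename, lineno))
    return definitions, references


SAMPLE = """\
#define MAX_TASKS 64
struct task { int pid; };
int schedule(struct task *t)
{
    return t->pid % MAX_TASKS;
}
"""

definitions, references = index_source("sched.c", SAMPLE)
```

The sketch errs on the side of recording too much: an "if (x) {" starting in column one would be indexed as a function, for instance. That is the trade-off described above, where extra entries cost the reader some thought but missing entries make the index useless.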

The Linux source code, with which the project has initially been linked, presents the indexer with some very tough obstacles. Specifically, the heavy use of preprocessor macros makes the parsing a virtual nightmare. We want to index the information in the preprocessor directives as well as the actual C code, so we have to parse both at once, which leads to no end of trouble. (Strict parsing is right out.) Still, we're pretty satisfied with what the indexer manages to get out of it.

There's also the question of actually broken code. We want to reasonably index all code portions, even if some of them are not entirely syntactically valid. This is another reason for the sloppiness.

There are obviously disadvantages to this approach. No scope checking is done, and the most annoying effect of this is mistaking local identifiers for references to global ones with the same name. This particular problem (and others) can only be solved by doing (almost) full parsing. We are currently looking into the feasibility of combining this with the fuzzy way indexing is done today.
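The scope problem is easy to reproduce with a small Python sketch. This is an illustration only, not LXR code; the function scope_blind_references and the C sample are made up for the purpose. A matcher that compares tokens purely by name cannot tell a local variable from the global it shadows.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")


def scope_blind_references(source, global_names):
    """Return (line, name) for every token that textually equals a global name."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), 1):
        for name in IDENT.findall(line):
            if name in global_names:
                hits.append((lineno, name))
    return hits


SAMPLE = """\
int count;
void reset(void) { count = 0; }
int sum(const int *v, int n)
{
    int count = 0;            /* local, shadows the global */
    while (n--) count += *v++;
    return count;
}
"""

hits = scope_blind_references(SAMPLE, {"count"})
```

Here lines 1 and 2 really do concern the global, while the hits on lines 5 to 7 belong to the local variable inside sum and are false positives. Telling the two apart requires knowing where the function body begins and ends, which is the (almost) full parsing mentioned above.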

An identifier is a macro, typedef, struct, enum, union, function, function prototype or variable. For the Linux source code, between 50000 and 60000 identifiers are collected. The individual source files are formatted on the fly and presented with clickable identifiers.
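Formatting a file on the fly could work along these lines. This is a hedged Python sketch, not the actual LXR markup code; the function markup_line and the "ident?i=" link format are assumptions chosen for the example. Each line is scanned for identifier-like tokens, and those found in the index are wrapped in a link while everything else is HTML-escaped.

```python
import html
import re

IDENT = re.compile(r"[A-Za-z_]\w*")


def markup_line(line, known_identifiers, ident_url="ident?i="):
    """Return one source line as HTML, with indexed identifiers made clickable."""
    out = []
    pos = 0
    for match in IDENT.finditer(line):
        # Escape the text between tokens (operators, whitespace, punctuation).
        out.append(html.escape(line[pos:match.start()]))
        name = match.group(0)
        if name in known_identifiers:
            out.append('<a href="%s%s">%s</a>' % (ident_url, name, name))
        else:
            # Tokens contain only word characters, so no escaping is needed.
            out.append(name)
        pos = match.end()
    out.append(html.escape(line[pos:]))
    return "".join(out)


rendered = markup_line("if (a < MAX_TASKS)", {"MAX_TASKS"})
# rendered == 'if (a &lt; <a href="ident?i=MAX_TASKS">MAX_TASKS</a>)'
```

Because the markup is generated per request from the identifier index, the source files themselves never need to be stored in an annotated form.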

It is possible to search among the identifiers and the entire kernel source text. The free-text search is implemented using Glimpse, so all the capabilities of Glimpse are available. The regular expression search capabilities are especially useful.

Availability

The source code for the LXR engine is of course available. It is released under the GNU Copyleft license. Version 0.3 can now be downloaded, and you can use it to index your own projects. Version 0.3 includes C++ support and a much nicer diff markup than before. Please tell us if you have trouble with the installation. Also, be aware that the documentation is still rather incomplete. Jim Greer has been kind enough to write some more comprehensive installation instructions, so look at those first if you run into problems.

Development

The development of LXR has now moved to SourceForge. The new development page has more information.

The mailing lists on SourceForge are now official. We urge all subscribers of the old list to subscribe to the SourceForge lists.

Other Applications

LXR is written in a modular way, and our goal was to make it possible to use it for indexing any project. Take a look at the Mozilla Cross Reference and the FreeBSD Cross Reference.

ToDo

Oh, lots. Smarter parsing and better generalization, for starters. Doing something fun with CVS is on the todo-list as well. And of course better documentation.

Acknowledgements

Thanks to:

the Department of Mathematics at the University of Oslo for letting us run LXR on their machines.
Troll Tech for the DNS namespace.
Larry Ewing for the penguin.

Resources

Other resources the authors have found useful:

Contacting the authors

We would very much like to receive feedback on this project. If you find it useful or have suggestions on how to make improvements, feel free to send us e-mail. We hope that this will be a useful tool, both for experienced developers and beginners wanting to explore the Linux source code.

If you are interested in general discussions with other LXR users, feel free to subscribe to the mailing lists at SourceForge.

Arne Georg Gleditsch and Per Kristian Gjermshus <lxr@linux.no>
