objfs (Community Group observability.objfs)

The System Object Filesystem

The system object filesystem (objfs) is mounted at /system/object on every Solaris system. Its purpose is to expose the symbol table and CTF data for all objects currently loaded in the kernel, with the primary consumer being libdtrace. Why is this so important for DTrace? To quote from the original ARC case:

The dtrace_kernel privilege was created to allow unprivileged users to trace
kernel activity with DTrace. Unfortunately, DTrace still has to use /dev/kmem
to get type and symbol information. This has the side effect of requiring all
privileges, so a user with the dtrace_kernel privilege cannot use D scripts that
rely on CTF data or kernel symbols. This also prevents a 32-bit DTrace consumer
from running on a 64-bit system, and is one of the barriers to DTrace in a local
zone (since we can't include /dev/kmem in a local zone for obvious reasons).

Now that DTrace uses /system/object, the dtrace_kernel privilege actually behaves like it's supposed to, and you have the added benefit of being able to run your favorite ELF tools (nm(1), elfdump(1), etc.) on the resulting files.

Finding the source

The source is located at usr/src/uts/common/fs/objfs. The following table provides a brief description of each file:

File	Description
usr/src/uts/common/fs/objfs/objfs_vfs.c	VFS and kernel module interfaces
usr/src/uts/common/fs/objfs/objfs_root.c	The root directory
usr/src/uts/common/fs/objfs/objfs_odir.c	Per-object directory
usr/src/uts/common/fs/objfs/objfs_data.c	The 'object' data file
usr/src/uts/common/fs/objfs/objfs_common.c	Miscellaneous shared routines

Understanding the source

The object filesystem is built on top of GFS, a framework for constructing pseudo filesystems. See David Powell's blog post for a description of GFS, why it was developed, and how it's used.

At the heart of objfs is the 'object' data file. The rest of the source code exists solely to present this data in the appropriate manner. The first question you might ask is: why have '/system/object/module/object', with nothing else in the directory? Isn't that redundant? The reason is simple extensibility. When /proc was first developed, there was a single file for each process, which quickly wore out its welcome. We think there may be other uses for this filesystem out there, and we don't want to rule out any future enhancements.
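As a small illustration of the layout described above, the following sketch walks a root directory and collects the 'object' file for each per-object directory. The function name loaded_object_files is hypothetical, and the code assumes only the '/system/object/module/object' layout; it is not part of objfs or libdtrace.

```python
import os


def loaded_object_files(root="/system/object"):
    """Map each per-object directory under root to its 'object' file path.

    Hypothetical helper: assumes the <root>/<module>/object layout and
    skips any directory that does not contain an 'object' file.
    """
    result = {}
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name, "object")
        if os.path.isfile(path):
            result[name] = path
    return result
```

Because the helper looks only for the 'object' entry, it would keep working if other files were later added to the per-object directories.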

The choice of ELF files follows a similar trend. The primary reason is that the existing tools (most notably libctf) already have interfaces for working with ELF files. On top of that, the ELF format is by nature extensible, so we can add information within the ELF object file, in addition to creating new entries in the per-object directory. You might notice there is no text or data associated with the object file. The data is self-explanatory: exposing it is a clear security risk. The choice not to export text is more subtle. Module text is modified after being loaded, so if we chose to export it, we would be revealing privileged information (such as which DTrace probes are enabled).

The source code is laid out around the data_sections table, which describes how to construct each section. The seemingly endless set of macros expands into a structure that the sect_*() family of functions can then access. The bulk of the processing is done in objfs_data_read(). We check the following in order: the ELF header, the section headers, and finally the section data itself.
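The idea of serving a read by checking consecutive regions in order can be sketched as follows. This is a simplified model, not the objfs implementation: read_image, from_bytes, and the (size, producer) region representation are all hypothetical names chosen for illustration, and the real code is kernel C built around the data_sections table.

```python
def from_bytes(buf):
    """Wrap a bytes object as a (size, producer) region (illustrative helper)."""
    return (len(buf), lambda start, n: buf[start:start + n])


def read_image(regions, offset, length):
    """Return up to `length` bytes at `offset` from consecutive regions.

    `regions` is an ordered list of (size, producer) pairs, for example the
    ELF header, then the section headers, then the section data. A producer
    is called as producer(start, n) and returns n bytes beginning at the
    region-relative offset start, so content can be generated on demand.
    """
    out = bytearray()
    base = 0  # file offset at which the current region begins
    for size, producer in regions:
        end = base + size
        if length > 0 and base <= offset < end:
            start = offset - base
            n = min(length, size - start)
            out += producer(start, n)
            offset += n   # continue the read in the next region
            length -= n
        base = end
    return bytes(out)
```

For example, with regions built from b"EHDR", b"SHDRS", and b"DATA", a 5-byte read at offset 2 spans the first two regions, and a read starting past the end of the last region returns no bytes.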

Using the filesystem

The /system/object interface is not public, nor is it a stable interface. If you find yourself needing to use the filesystem, let us know so that we can determine whether it makes sense to stabilize the interface. The best way to examine the data is through standard libraries, such as libelf(3LIB) or libctf.
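For quick inspection where libelf is not at hand, the sketch below lists the section names of an ELF image using only the standard ELF header and section header layouts. The function name elf_section_names is hypothetical, the path in the trailing comment assumes a module directory named genunix, and the code does no validation beyond the magic number; for anything serious, libelf or libctf remains the better choice.

```python
import struct


def elf_section_names(image):
    """Return the section names found in an ELF image given as bytes."""
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    is64 = image[4] == 2                   # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    order = "<" if image[5] == 1 else ">"  # EI_DATA: 1 = little-endian
    # Fields that follow the 16-byte e_ident array in the ELF header.
    ehdr_fmt = order + ("HHIQQQIHHHHHH" if is64 else "HHIIIIIHHHHHH")
    # One section header entry.
    shdr_fmt = order + ("IIQQQQIIQQ" if is64 else "10I")

    ehdr = struct.unpack_from(ehdr_fmt, image, 16)
    shoff, shentsize, shnum, shstrndx = ehdr[5], ehdr[10], ehdr[11], ehdr[12]
    if shnum == 0:
        return []

    shdrs = [struct.unpack_from(shdr_fmt, image, shoff + i * shentsize)
             for i in range(shnum)]

    # sh_offset and sh_size are fields 4 and 5 in both layouts.
    str_off, str_size = shdrs[shstrndx][4], shdrs[shstrndx][5]
    strtab = image[str_off:str_off + str_size]

    names = []
    for sh in shdrs:
        start = sh[0]                      # sh_name: offset into strtab
        stop = strtab.index(b"\0", start)  # names are NUL-terminated
        names.append(strtab[start:stop].decode("ascii"))
    return names


# Possible use on a Solaris system (path assumed for illustration):
#   with open("/system/object/genunix/object", "rb") as f:
#       print(elf_section_names(f.read()))
```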

Created by admin on 2009/10/26 12:08

Last modified by ptribble on 2009/10/29 20:19
