
1. GLEP Header

GLEP: pending
Title: Portage MacOS/BSD enhancements: stage 1
Version: $Revision: 1.2 $
Last-Modified: $Date: 2003/07/30 18:28:10 $
Author: Daniel Robbins <drobbins@gentoo.org>
Status: Draft
Content-Type: text/xml
Created: 07-Jul-2003
Post-History: 07-Jul-2003

2. Portage MacOS/BSD enhancements: stage 1

Proposal goals 

The goal of this proposal is to outline changes to be made to Portage to conveniently and seamlessly support MacOS and BSD operating systems.

Currently (as of 07 Jul 2003), Portage does run on MacOS and BSD. However, it lacks convenience and maintainability features, as well as accepted guidelines and standards for maintaining ebuilds that run seamlessly on a variety of platforms. This document outlines enhancements that will allow Gentoo to begin to officially support MacOS and BSD. It covers all known relevant areas except cross-platform dependency coherency issues and "augmented environment" handling, which will be addressed in a future proposal.

Profile-specific compression settings 

Portage currently uses gzip compression for man pages, GNU info documents, and auxiliary documentation. This approach is not compatible with MacOS X, where man pages are expected to be uncompressed. It also forces a specific solution upon the user, which is contrary to the Gentoo philosophy. In addition, it is inefficient: man pages, GNU info pages, and auxiliary documentation are compressed once with gzip and then re-compressed with bzip2 when a .tbz2 binary package is created, which unnecessarily increases the size of the package.

To address these issues, compression of man pages, GNU info documentation, and auxiliary documentation should be configurable on a per-profile basis, with allowances for local and environment-based overrides of default settings. To do this, several new variables can be added to Portage to control compression settings. They could work as follows:

Code Listing 2.1

e_compress="man:none info:gzip doc:bzip2"
e_compress_doc="\.html \.txt \.ME"

Above, the e_compress setting tells Portage that man pages should not be compressed, info pages should be gzip compressed, and documentation should be bzip2 compressed. The e_compress_doc variable specifies which files are considered to be documentation. It holds a space-separated list of regular expressions, which are matched against the basenames of all documentation files. A match means that the file should be compressed using the method given for doc in the e_compress variable (bzip2 in this example).
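The parsing described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names parse_e_compress and doc_should_compress are hypothetical and are not part of Portage, and the proposal does not say whether a pattern must match the whole basename or only part of it, so the sketch assumes a partial match is enough.

```python
import os
import re

def parse_e_compress(value):
    """Turn 'man:none info:gzip doc:bzip2' into {'man': 'none', ...}.

    Hypothetical helper; the name is not taken from Portage.
    """
    settings = {}
    for token in value.split():
        category, _, method = token.partition(":")
        settings[category] = method
    return settings

def doc_should_compress(path, e_compress_doc):
    """Return True if the basename of path matches any expression.

    Assumption: a match anywhere in the basename counts (re.search).
    """
    basename = os.path.basename(path)
    return any(re.search(pattern, basename)
               for pattern in e_compress_doc.split())

# The values from the code listing above.
settings = parse_e_compress("man:none info:gzip doc:bzip2")
patterns = r"\.html \.txt \.ME"
```

With these values, settings["man"] is "none", and a file such as /usr/share/doc/foo/README.txt is selected for compression while an image in the same directory is not.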

Here is how this functionality should be implemented. After the ebuild install phase completes, a new phase called prep will be run. During this phase, Portage will ensure that all man pages, GNU info documents, documentation, and potentially other future categories of files are in an uncompressed format. Any files that are found to be compressed will be automatically uncompressed.
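A minimal sketch of what the prep phase might do for one directory tree is shown below. It assumes that compressed files can be recognized by a .gz or .bz2 suffix and that the caller passes in a directory such as the man page tree inside the image; the function name prep_uncompress and the DECOMPRESSORS table are hypothetical, not existing Portage code.

```python
import bz2
import gzip
import os
import shutil

# Hypothetical table: file suffix -> module whose open() reads that format.
DECOMPRESSORS = {".gz": gzip, ".bz2": bz2}

def prep_uncompress(root):
    """Replace each .gz/.bz2 file under root with its uncompressed contents.

    Returns the list of uncompressed paths that were written.
    """
    uncompressed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            stem, suffix = os.path.splitext(name)
            module = DECOMPRESSORS.get(suffix)
            if module is None:
                continue  # already uncompressed, leave it alone
            source = os.path.join(dirpath, name)
            target = os.path.join(dirpath, stem)
            with module.open(source, "rb") as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            os.remove(source)
            uncompressed.append(target)
    return uncompressed
```

A real implementation would also need to handle symlinks and name collisions (for example, foo.1 and foo.1.gz both present), which this sketch ignores.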

Then, the merge phase will be modified so that before files are copied to the native filesystem, the local e_compress and e_compress_doc settings will be applied to the appropriate files in the image/ directory. After this process completes, these files will be moved into place and recorded in the Portage package database.
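The merge-time step could look roughly like the following sketch, which compresses a single file in the image directory according to a method name taken from e_compress. The function apply_compression and the COMPRESSORS table are hypothetical names, and the choice of .gz and .bz2 suffixes is an assumption.

```python
import bz2
import gzip
import os
import shutil

# Hypothetical table: e_compress method name -> (module, file suffix).
COMPRESSORS = {"gzip": (gzip, ".gz"), "bzip2": (bz2, ".bz2")}

def apply_compression(path, method):
    """Compress path in place using method and return the resulting path.

    A method of "none" leaves the file untouched.
    """
    if method == "none":
        return path
    module, suffix = COMPRESSORS[method]
    target = path + suffix
    with open(path, "rb") as src, module.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(path)
    return target
```

Because the file name changes when a file is compressed, the package database would have to record the path returned by this step rather than the original one.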

This solution will be transparent for .tbz2 binary packages. Using this implementation, man pages, GNU info documents, and documentation will be stored in binary .tbz2 archives in an uncompressed format, allowing the files to be compressed efficiently due to the single pass of bzip2 compression applied to the package data. When a user installs a binary .tbz2 package on their local system, the documentation, man pages and GNU info documents will then be selectively compressed based on the user's profile and/or local settings. This will allow our ebuilds to adapt to the needs and preferences of specific profiles and/or users.

Note: The e_compress and e_compress_doc variables use a new suggested Portage configuration file variable convention. The variable names themselves are all lower-case. The e_ prefix is used for general control variables of any kind that are not fully-qualified paths. Under this new convention, the p_ prefix is intended to be used for variables that are fully-qualified paths.

New "macos" keyword 

To properly support MacOS X, a new keyword is needed for use in the ACCEPT_KEYWORDS, KEYWORDS and dependency variables, as well as any ARCH tests. This selected keyword should be used as a basis for the names of all relevant infrastructure, such as IRC channels, Web page document filenames, archive names, and mailing lists. This will help to avoid confusion within the Gentoo community.

The best short keyword to use to refer to Gentoo running on MacOS X is macos. osx isn't appropriate because it refers to the operating system version, but not its official name. darwin is not appropriate because the effort to support Gentoo running on MacOS X (a commercial non-free operating system, like Solaris) is fundamentally different from the effort to get Gentoo running on Darwin (a non-commercial free operating system, like NetBSD). mac is not appropriate because it refers to Apple hardware (which could be running Linux, MacOS X, OpenDarwin or something else) rather than the operating system itself. By using the macos keyword, we can refer specifically to the effort to support Gentoo on MacOS X and future versions of MacOS. We can thus make MacOS-specific masking and dependency decisions, and have IRC channels and mailing lists devoted to MacOS-specific issues. This would not be possible if we were to use a darwin or mac keyword. Because of the selection of the macos keyword, the official project name for the "Gentoo on MacOS" effort should be "Gentoo/MacOS." This provides an easily-understood name for the effort.

Seamless support of variant filesystem hierarchies 

Note: Per Grant Goodyear's comment, we should ensure that pathspec handles not only platform-specific variations in paths, but also application and version-specific variations in paths, particularly in relation to .tbz2 consistency issues and finding a replacement for has_version and best_version in ebuild.sh.

Supporting MacOS X raises two challenges in relation to supporting variant paths in Gentoo. For one, MacOS X uses a BSD-like filesystem hierarchy, with some Apple-specific extensions. Unlike Gentoo Linux, MacOS X is not FHS compliant. In addition, Gentoo/MacOS is an "augmentation" of an existing commercial operating system. Unlike Gentoo Linux, Gentoo/MacOS needs to co-exist with an existing filesystem tree. To fit in with existing MacOS X conventions, it is recommended that Gentoo/MacOS packages install into the /opt/gentoo tree.

This raises several challenges: first, how should Portage adjust to path structures that are only somewhat similar to the FHS standard? Second, how should Portage adjust to path structures that may reside within a sub-tree like /opt/gentoo? Third, how do we add such flexibility without peppering ebuilds with hard-coded platform-specific paths amid conditional statements, which would cause our ebuilds to become unmaintainable?

A general solution is needed to address these issues. This section documents such a solution, called "pathspec."

Pathspec has been designed to support the self-similar nature of filesystem trees in an elegant way. The term "self-similar" is borrowed from mathematics, where it is often used in relation to fractals. Wolfgang E. Lorenz provides an excellent definition of self-similarity in Fractals and Fractal Architecture:

"Fractals are always self-similar, at least in some general sense - what does that mean? That means that on analysis of a certain structure will bring up the same basic elements on different scales. For example, details of a certain coastline look like larger parts of the whole curve; the characteristic - the irregularity - of this natural form remains the same from scale to scale. In this way fractals can also be described in terms of a hierarchy of self-similar components - e.g. trees and branches or town-, district- and local-centers."

By recognizing that filesystem trees have self-similar characteristics, we gain an advantage in documenting their structure. Rather than documenting every detail of a filesystem, we can document the irregularity, and then specify how this irregularity manifests itself on different levels of the filesystem tree.

For example, on an FHS system, the /usr, /usr/local and /opt trees are very similar. In fact, if we document the structure of /usr, then we have also documented the structure of /usr/local. If we document the structure of /usr/local, then we are not far away from having a definition of the structure of /usr/X11R6. All these sub-trees have a very similar internal structure.

Pathspec is efficient because it does not require sub-tree definitions to be repeated unnecessarily. Similar structures can be defined based on already-defined structures by taking advantage of OOP concepts.
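The pathspec design is not specified in this document, so the following Python sketch only illustrates the general idea of deriving one sub-tree definition from another with inheritance. Every name in it (SubTree, UsrLocal, X11R6, layout, path) is hypothetical, and the directory layout shown is a simplified assumption rather than a proposed specification.

```python
import posixpath

class SubTree:
    """Hypothetical description of one filesystem sub-tree."""
    prefix = "/usr"
    # Simplified, assumed mapping from a kind of file to its directory.
    layout = {"bin": "bin", "lib": "lib",
              "man": "share/man", "doc": "share/doc"}

    def path(self, kind):
        """Return the full directory for a kind of file in this sub-tree."""
        return posixpath.join(self.prefix, self.layout[kind])

class UsrLocal(SubTree):
    # Same internal structure as /usr; only the prefix differs.
    prefix = "/usr/local"

class X11R6(SubTree):
    # Mostly the same structure; the one irregularity is an override.
    prefix = "/usr/X11R6"
    layout = {**SubTree.layout, "man": "man"}
```

Here UsrLocal repeats nothing from SubTree except the prefix, and X11R6 records only the single entry that differs, which is the kind of economy the paragraph above describes.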

Note: This part isn't done yet.

Intra-tree cross-platform compatibility 

As of June 2003, Portage itself (ebuild, emerge, etc.) can generally run under MacOS X and BSD. However, some ebuilds and eclasses currently contain Linux-specific conventions, particularly in how auxiliary programs like xargs, find and tar are called. These variations can cause an ebuild to execute correctly in a GNU environment but not in a BSD environment, or vice-versa.

The general strategy to address these issues should be as follows. First, an emphasis should be placed on writing shell code that is truly cross-platform in nature. Second, when there is no suitable cross-platform code, Portage should provide a general framework that allows ebuilds to adapt easily to situations where variant calls are needed. Here is how Portage addresses the situation currently. This is code from version 1.326 of portage.py:

Code Listing 2.2

ostype=os.uname()[0]
if ostype=="Linux":
        userland="GNU"
        import missingos
        lchown=missingos.lchown
        os.environ["XARGS"]="xargs -r"
elif ostype=="Darwin":
        userland="BSD"
        lchown=os.chown
        os.environ["XARGS"]="xargs"     
else:
        print "Operating system \""+ostype+"\" currently unsupported.  Exiting." 
        sys.exit(1)
        
os.environ["USERLAND"]=userland

This code ensures that all ebuilds have a USERLAND and an XARGS variable defined in their environment. The USERLAND variable allows ebuilds to adapt to situations where a different type of call is needed for each userland environment. The XARGS variable allows ebuilds to call $XARGS rather than their native xargs command. By default, BSD xargs will not execute the specified command if an empty list is provided, but GNU xargs will. GNU xargs requires an -r option to mimic the generally preferable BSD behavior, while under BSD, the -r option is not recognized and produces an error condition. The proper cross-platform way to call xargs is as follows:

Code Listing 2.3

cat foo | $XARGS ls

Here are some suggested improvements to the system described in this section. First, it would be best to move to the new Portage variable naming convention by renaming USERLAND to e_userland and XARGS to e_xargs. This will help to avoid potential namespace clashes with Makefiles and other build scripts. Second, the above code can be modified to use the Python variable sys.platform to determine the current operating system platform. Third, this system can be extended to support new cross-platform executable calling conventions (such as the one for xargs) as they are identified.
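As a rough sketch of the first two suggestions, the detection could be written as below. This is not the actual Portage code: the function name detect_userland is hypothetical, the sketch uses current Python syntax rather than that of the listing above, and treating any sys.platform value containing "bsd" as a BSD userland is an assumption that goes beyond the Darwin-only check in the existing code.

```python
import os
import sys

def detect_userland(platform=sys.platform):
    """Return (e_userland, e_xargs) for a sys.platform string.

    Hypothetical helper; raises RuntimeError on unknown platforms.
    """
    if platform.startswith("linux"):
        # GNU xargs needs -r to skip the command on empty input.
        return "GNU", "xargs -r"
    if platform == "darwin" or "bsd" in platform:
        # Assumption: all BSD variants behave like Darwin here.
        return "BSD", "xargs"
    raise RuntimeError(
        'Operating system "%s" currently unsupported.' % platform)

def export_settings(env=os.environ, platform=sys.platform):
    """Store the detected values under the proposed e_ names."""
    env["e_userland"], env["e_xargs"] = detect_userland(platform)
```

Passing the platform and environment as parameters keeps the logic testable on any host; a caller would normally just invoke export_settings() with its defaults.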

As our cross-platform support evolves, it can be expanded to support cross-platform user and group creation (using either adduser, netinfo or something else), as well as other cross-platform issues. By starting this cross-platform effort, we are beginning the process of standardizing Portage interfaces and conventions so that they can work seamlessly on a variety of platforms. We can expect this trend to continue in other areas of Portage as well.

This solution will help ensure that inter-platform compatibility issues are addressed in a consistent and maintainable fashion, and allow ebuilds to begin to be made BSD-userland compliant.



Updated: 7 Jul 2003
Author: Daniel Robbins
Summary: Portage changes necessary for full MacOS/BSD environment support

Copyright 2001-2004 Gentoo Technologies, Inc. Questions, Comments, Corrections? Email www@gentoo.org.