Reply: “Top 10 Problems with RPM”

A reply to Claudio Matsuoka’s clueless advocacy at

top-ten-problems-in-rpm


1 - Berkeley database backend. Dpkg does a much faster and at least equally reliable job with plaintext files, which don’t get corrupt as often, don’t need periodic reconstruction and are human-readable. The SQL database backend is a huge step in the wrong direction.

Blaming Berkeley DB for rpm performance problems is like blaming Iraq for Bush.

Berkeley DB is one of the highest performing, most widely deployed, “best-of-breed” database implementations in the world. Period. RPM is not alone in using Berkeley DB: perl, python, sendmail, subversion, and many, many other projects have chosen it. Those projects are not all misguided; if they were, they would all have chosen flat file databases, as apt and dpkg have.

Claiming dpkg is faster without any attempt at measurement is akin to asking one to believe Rumsfeld’s lies.

Flat files have their uses, but the goal in RPM of using Berkeley DB (or any DB) is performance, nothing else. No flat file implementation will beat hashing and b-tree indices for the same content retrieval in any but trivial cases. Human readable is only necessary when humans need to edit files to “fix” something, which is perhaps at odds with the claim of uncorrupted reliability of dpkg and apt flat files.
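A minimal sketch of the retrieval difference, assuming a made-up tab-separated record format and invented package names. It is not a benchmark of Berkeley DB, rpm, or dpkg; it only shows why an index avoids walking every record:

```python
# Hypothetical package records; the names and versions are made up.
RECORDS = [f"pkg{i}\t1.{i}\n" for i in range(10000)]

def lookup_flat(lines, name):
    """Flat file: scan record by record until the key matches."""
    for line in lines:
        key, value = line.rstrip("\n").split("\t")
        if key == name:
            return value
    return None

def build_index(lines):
    """Hash index: one pass to build, then each lookup is a single probe on average."""
    index = {}
    for line in lines:
        key, value = line.rstrip("\n").split("\t")
        index[key] = value
    return index

INDEX = build_index(RECORDS)

def lookup_indexed(index, name):
    """Indexed retrieval: no scan over the records."""
    return index.get(name)
```

Both functions return the same answer; the flat scan just touches up to every record to get there. Wrap the two calls in `timeit` on your own data before arguing about which is faster.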

Next time try adding --stats to any rpm command and study the numbers a bit. Try to come up with a comparable operation with dpkg/apt before baldly spreading FUD about Berkeley DB thickly and clumsily.

The sqlite3 database backend was added because Berkeley DB licensing prevents certain vendor(s) from distributing RPM, and because other cross-platform vendor(s) wished not to spend time implementing locking schemes on odd-ball embedded architectures. So the sqlite3 implementation is there to satisfy RPM customer requests. Perhaps satisfying customer requests is not as important a goal for dpkg and apt as it is for RPM.

I will happily use the highest performing implementation in rpm, whether that is flat files, Berkeley DB, sqlite3, whatever. At the moment, Berkeley DB is the highest performing database implementation I am aware of.

2 - Installation of new package before removal of the previous version. This adds unnecessary complexity and leads to a non-intuitive sequence in execution of pre- and post- install/uninstall scriptlets, and can create problems only solvable using triggers. Triggers shouldn’t exist, they only solve problems caused by design problems and policy flaws.

Installation before removal is absolutely necessary to ensure that the window where shared libraries are not available is minimized. Removal of library symlinks before reinstalling will affect all programs that are started while the upgrade is in process. Minimizing that window is a requirement for upgrading software on live production machines.
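The ordering can be sketched as follows, assuming a POSIX filesystem. This is not rpm's code; the `libfoo` file names and the `.tmp` suffix are invented for illustration. The new library goes in first, the soname symlink is switched with a single rename, and only then is the old file removed:

```python
import os
import tempfile

def swap_symlink(link_path, new_target):
    """Point link_path at new_target without a moment where link_path is missing."""
    tmp_path = link_path + ".tmp"  # hypothetical temporary name
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)
    os.symlink(new_target, tmp_path)
    # rename(2) replaces the old symlink atomically on POSIX systems.
    os.replace(tmp_path, link_path)

def demo():
    with tempfile.TemporaryDirectory() as d:
        old = os.path.join(d, "libfoo.so.1.0")
        new = os.path.join(d, "libfoo.so.1.1")
        link = os.path.join(d, "libfoo.so.1")
        open(old, "w").close()
        os.symlink("libfoo.so.1.0", link)
        open(new, "w").close()                # 1. install the new version
        swap_symlink(link, "libfoo.so.1.1")   # 2. switch the soname symlink
        os.remove(old)                        # 3. remove the old version last
        return os.readlink(link), os.path.exists(link)
```

With remove-before-install, steps 3 and 1 trade places, and anything that tries to load `libfoo.so.1` in between fails.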

Perhaps dpkg and apt do not run as often as RPM does in “production” environments, where the size of the window where a soname is unavailable determines whether upgrades can be attempted on live systems. Large upgrades with remove-before-install can/will cause programs executed during the upgrade to fail randomly, and larger installs will increase the probability of random failures.

Confusing the use with the concept of triggers in order to argue that triggers should not exist is silly at best. In the real world of already released packages — some with flaws — triggers are necessary to retrofit fixes for packages that cannot otherwise be changed.

Sure, Debian packaging policy is better at preventing packaging flaws that must be corrected by use of triggers. OTOH, RPM based distros are released far more often than Debian is released, and packaging mistakes are a fact of life for all distros.

Naive intuitions are often wrong. Catering to intuition rather than reality is, well, wrong.

3 - Network awareness features. It increases size and complexity of the binary, and tries to perform a task that should belong to an external utility such as Apt-get or Smart. Recent versions try to contact PGP keyservers and block execution when it fails.

There is nothing whatsoever forcing anyone to use rpm networking capabilities to download and/or install packages. Size and complexity are bogus metrics, socket programming has been around since the early 1980’s, and certainly isn’t hard or complex.

RPM has had an FTP client (one of the first in a Linux tool, nice and small) since 1997. RPM was also one of the first HTTP 1.1 clients in Linux, in RHL 5.2. Users expect features in RPM to be maintained.

All distro vendors were warned that rpm-4.4.1 was going to have default key retrieval enabled. A posting describing how to disable it was sent to <rpm-devel@lists.dulug.duke.edu>. The goal is/was to increase security by always checking signed package contents; that can only be done (imho) by automating pubkey retrieval (or otherwise installing keys beforehand). Stronger package integrity checks increase the reliability of package management.

So the pubkey retrieval from key server facility is C-O-N-F-I-G-U-R-A-B-L-E, and not attempted if/when pubkeys are properly imported locally.

4 - Obtuse macro expansion and comment handling in specfiles. Macros expand inside comments, and line breaks cause unexpected behavior.

All macro languages exhibit context-peculiar behavior to some degree. For one example, m4 macros expand where found, and require dnl as an m4-specific comment lead-in. The expansion of $Foo$ rcs/cvs strings is another example that can/will cause damage to the file that contains the $Foo$ token, depending on what multiline construction is needed for the file. Is the difference between use of “#” and “dnl” comments in m4 any different than the need to add a 2nd ‘%’ character to prevent rpm macro expansion in comments?

Furthermore, the simple rule

Macros expand everywhere they are found.

is a brutally efficient solution to the lack of a well defined enumeration of all possible contexts in which macros can expand. Without identifying all expansion contexts, other rules are doomed to fail someone’s expectations.
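A toy expander makes the rule concrete. This is a sketch under stated assumptions, not rpm's macro engine: the macro table is made up, only the `%{name}` form is handled, and leaving unknown macros untouched is a choice made for this example.

```python
import re

# Hypothetical macro table for illustration.
MACROS = {"name": "foo", "version": "1.0"}

# Either an escaped percent (%%) or a %{macro} reference.
_TOKEN = re.compile(r"%%|%\{(\w+)\}")

def expand(text, macros=MACROS):
    """Expand %{name} wherever it appears, comments included; %% yields a literal %."""
    def repl(match):
        if match.group(0) == "%%":
            return "%"
        # Unknown macros are left as written (an assumption of this sketch).
        return macros.get(match.group(1), match.group(0))
    return _TOKEN.sub(repl, text)
```

The expander has no idea what a comment is, so `expand("# built from %{name}-%{version}")` expands just like any other line, while `expand("# use %%{name} here")` comes out with a literal `%{name}`. One rule, no context table to get wrong.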

5 - Absence of logical OR in requirements forces the developer to always regenerate all alternative packages to provide virtual packages.

I see no basis for the claim that developers are forced to regenerate all alternative packages. A matching Provides: to satisfy a Requires: can be added to any virtual package, and additional Requires: (or other dependencies) can be added to the virtual package to ensure that the semantic intent is preserved.

RPM has alternation of Provides: rather than Requires:. This is very well known, and has been thoroughly discussed on (at least) the LSB packaging mailing list several years ago.
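Alternation on the Provides: side can be sketched like this. The package set and the `mail-transport-agent` token are invented for illustration, and the resolver is a toy, not rpmlib:

```python
# Hypothetical packages; the names and the virtual token are made up.
PACKAGES = {
    "sendmail": {"provides": ["sendmail", "mail-transport-agent"]},
    "postfix": {"provides": ["postfix", "mail-transport-agent"]},
    "mutt": {"provides": ["mutt"], "requires": ["mail-transport-agent"]},
}

def providers(token, packages=PACKAGES):
    """Every package whose Provides: list contains the token."""
    return sorted(name for name, pkg in packages.items() if token in pkg["provides"])

def unresolved(name, installed, packages=PACKAGES):
    """Requires: tokens of `name` that no installed package provides."""
    provided = {t for p in installed for t in packages[p]["provides"]}
    return [t for t in packages[name].get("requires", []) if t not in provided]
```

The single Requires: in `mutt` is satisfied by whichever alternative is installed, so the "OR" lives in the set of providers rather than in the requirement.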

Without a concrete and specific example of a flaw, I cannot respond further.

6 - Incomplete timestamp format in specfile changelogs. Standard date(1) format or other providing time of change is needed.

Yes, quite annoying. However, a fix introduces instant legacy incompatibility in any spec file that attempts to use the fix, and that problem is insoluble without forcing users to upgrade their version of rpm.

The timestamp syntax has been consistently broken since 1997. Is it *really* that hard to edit date(1) output in a spec file %changelog?
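For anyone who would rather not edit by hand, a small sketch that formats a date in the weekday, month, day, year shape commonly seen in %changelog entries. The function name is made up, and the output assumes the default C locale:

```python
from datetime import date

def changelog_date(d):
    """Format a date as 'Wed Jan 04 2006', with no time of day."""
    return d.strftime("%a %b %d %Y")
```

For example, `changelog_date(date(2006, 1, 4))` gives `Wed Jan 04 2006`.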

7 - File dependencies are treated in a special way and are not regular virtual packages (a better design would make packages relate only to other packages, real or virtual). They increase complexity of dependency resolution and promote sloppy practices in software packaging.

RPM computes a “contains” relationship to map all dependency tokens (package, soname, file, foo(bar), …) to the package that resolves the dependency. The cost of the mapping is quite modest using any standard technique (rpm uses bsearch because of portability) to associate a key with a value. Add --stats to any rpm command to measure the overhead for your specific benchmark.

The only “special way” that file dependencies are different than, say, package name dependencies, is that dependency tokens that start with ‘/’ are resolved using file paths rather than provided dependency arrays. The file paths also have a different index than other provided tokens that needs to be addressed by depsolvers using rpmlib.
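A sketch of that dispatch, assuming two made-up sorted indices and using Python's `bisect` as a stand-in for bsearch. The index contents and function names are hypothetical; this is not rpmlib's API:

```python
from bisect import bisect_left

# Hypothetical sorted indices of (token, package) pairs.
PROVIDES_INDEX = sorted([
    ("bash", "bash"),
    ("libc.so.6", "glibc"),
    ("perl(strict)", "perl"),
])
FILE_INDEX = sorted([
    ("/bin/sh", "bash"),
    ("/usr/bin/perl", "perl"),
])

def _bsearch(index, key):
    """Binary search a sorted list of (key, package) pairs."""
    i = bisect_left(index, (key,))
    if i < len(index) and index[i][0] == key:
        return index[i][1]
    return None

def resolve(token):
    """Tokens starting with '/' use the file path index; all others use provides."""
    index = FILE_INDEX if token.startswith("/") else PROVIDES_INDEX
    return _bsearch(index, token)
```

Here `resolve("/bin/sh")` and `resolve("libc.so.6")` go through the same lookup code; the leading '/' only selects which index is searched.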

The additional complexity introduced by the mapping from dependency token to package has immediate benefits in permitting simpler automation for extracting dependencies, and more precise specification of the dependency context, e.g. a dependency is needed to, say, run a script interpreter during install.

If anything, package managers need more, not fewer, contextual hints like ‘/’ (and the quite similar foo(bar) namespace markers) in order to more precisely identify the resolution context, and (incidentally) to assist packagers and applications with hints why the dependency is necessary. Increasingly, probe dependencies attached to syntactical sugar like “rpmlib(foo)” are going to be needed to detect intrinsically run-time conditions like “This system has booted with selinux enabled and nptl disabled.”

OTOH, the mapping of dependency tokens (including file paths) could be done when the package is built, so that only package name dependencies are included in the package, and the world of rpm packaging would not change a bit. There has been no reason to attempt build time mapping; perhaps I shall add it to rpm so that I do not have to listen to the droning mechanical noises of the Debian Borg regarding the use of file dependencies in RPM ever again.

I see no basis for your claim that file dependencies lead to “sloppy … packaging”. In fact, automating dependency extraction with rule based scripting using non-package-name dependency tokens leads to better, not sloppier, packaging, as human packagers often make mistakes that are harder to detect and correct than a faulty rule based script.

8 - Problematic handling of simple situations such as replacing directories with symlinks. Bad habit of stat()ing all mounted filesystems prior to installation (at least in earlier versions).

Last I checked tar/cpio have exactly the same problem as rpm unpacking a symlink path onto a directory. Perhaps current versions of tar/cpio have “fixed” the behavior.

FWIW, rpm-4.0 in RHL 7.0 had a means to replace directories with a symlink. The implementation was scrapped because of a clunky, hardwired syntax for the functionality that did not permit testing exceptional conditions, like ENOSPC or EIO on a system call. Recent rpm has embedded lua, and a pre-everything scriptlet, which permits symlinks to replace directories in *.rpm packaging, but perhaps the implementation is not sufficiently transparent for your tastes.


9 - Non-intuitive (or plain broken) algorithm to compare package versions (example: 1a > 1B) . Epoch zero is considered newer than no epoch at all.


The rpmvercmp algorithm is not broken because the comparison is well defined, has the properties needed for package management, and is widely and usefully deployed.

Whether the algorithm is intuitive is entirely in the eye of the beholder.

The “1a > 1B” comparison is due to strcmp(3) from glibc. Blaming rpm for glibc behavior is like blaming the atomic bomb for the cold war. Certainly internationalization has led to some surprising and unexpected behavior from all unix utilities.

Comparisons between 0 and undefined (because missing) epoch need to be defined somehow. The number of packages that choose to add Epoch: 0 explicitly used to be vanishingly small until Fedora pedants decided to change the world by demanding (wrongly!) that Epoch: 0 be added to all packages.

There are other well known non-intuitive flaws in rpmvercmp like

a) 1.0000000000000000002 > 1.1 (because 2 > 1 in the segmented digit string compare)
b) 1_1 = 1.1 (because all non-alphabetic, non-numeric, characters are treated equivalently)

and there are further problems with mixed mode comparisons between alpha and digit strings,
that are perhaps more likely to be encountered than the examples that you give.
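The behaviors above can be reproduced with a simplified sketch of a segmented compare. This is not the rpmvercmp source: it assumes ASCII input, ignores epochs and releases, and the rule that a numeric segment beats an alphabetic one is stated here as an assumption of the sketch.

```python
import re

# A segment is a run of digits or a run of letters; everything else separates.
_SEGMENT = re.compile(r"[0-9]+|[A-Za-z]+")

def vercmp_sketch(a, b):
    """Return 1, 0, or -1, comparing two version strings segment by segment."""
    segs_a, segs_b = _SEGMENT.findall(a), _SEGMENT.findall(b)
    for x, y in zip(segs_a, segs_b):
        if x.isdigit() and y.isdigit():
            x, y = int(x), int(y)  # numeric compare, leading zeros ignored
        elif x.isdigit() != y.isdigit():
            # Mixed mode: the numeric segment wins (assumption of this sketch).
            return 1 if x.isdigit() else -1
        if x != y:
            # Alphabetic segments compare byte-wise, like strcmp(3).
            return 1 if x > y else -1
    # All shared segments are equal: the string with more segments wins.
    return (len(segs_a) > len(segs_b)) - (len(segs_a) < len(segs_b))
```

With this sketch, `vercmp_sketch("1.0000000000000000002", "1.1")` is 1, `vercmp_sketch("1_1", "1.1")` is 0, and `vercmp_sketch("1a", "1B")` is 1 because lowercase sorts after uppercase in byte order.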

The rpmvercmp algorithm is not ever going to change; the release engineering necessary to deploy a change to version comparison in RPM is cost prohibitive because of the gazillions of *.rpm packages that have been built successfully over the years.

But yes, the rpmvercmp comparison is quite unpleasant from an intellectual POV, and just happens to “work” sufficiently well for package management purposes.

10 - No provisions for interactive configuration scripts or human inspection/approval of new configuration files, and no concept of post-transaction configuration.

Configuration management is unrelated to package management. Batch mode unattended installs were a primary design goal of rpm that has proven successful and popular. You might as well fault emacs for not behaving like vi.


