Free Source Project Management
Posted 4 Nov 2000 at 22:01 UTC by rlk 
Project management and engineering is a largely neglected aspect of free
source development. Monty R. Manley addressed this issue in a recent
article on linuxprogramming.com, but I have reservations about his
particular recommendations. Nevertheless, it's an issue we need to
tackle.
Introduction
Project management and engineering is a largely neglected aspect of
free source development. Monty R. Manley addressed this issue in his
article Managing
Projects the Open Source Way. He proposes a much more formal
style of software development for free source (my term for the union
of free software and open source) projects, along the lines of
traditional commercial development. I absolutely agree about the need
for better engineering practices. What I think we need to do, though,
is recognize how free source operates, extract the best practices from
it, and from that try to find engineering practices that mesh well
with the culture.
Manley's article is very thought-provoking, and prodded me to give
some thought to issues that have been at the back of my mind for a
while. As a free source project lead myself (for gimp-print), I've had to
face a lot of these issues. In my professional career, I've
frequently been both a developer and release engineer, and I've
developed some insights from this experience.
What I'd like to do is explore some of the issues Manley raises,
analyze how they apply to the free source community, and come up with
some suggestions, or at least points for future work.
The Waterfall Model and Formal Methodology
For many years, the accepted methodology for software development was
what's often called the "waterfall" model: starting from a carefully
done requirements analysis, the team proceeds to do a functional
specification (architecture), then a high level design, then a
detailed design, and only then does coding commence. After coding
comes testing, and only then is the software released. This seems
logical; one shouldn't start building something until one understands
what's being constructed. By constructing software in this
disciplined fashion, the engineer knows exactly what needs to be done
at each step along the way. Many people have suggested methodologies
loosely based around this model; there are various commercial tools
available that supposedly make these various steps more mechanical and
therefore less likely to go wrong.
As I understand Manley's thesis, it is a weakness of free source
development that the early steps (requirements and design analysis)
are largely neglected. I disagree with this on two counts:
- The waterfall model has serious weaknesses.
- Free source development does in fact perform requirements
analysis, just in a different manner.
Let's start with the weaknesses in the waterfall model itself.
Manley asks: "How can you write a program if you don't know what it
is supposed to do?"
That's a good question. How can one build anything without knowing
what it's supposed to do? Nobody in their right mind would dream of
building a bridge without knowing what it connects, the traffic load
it is to bear, the underlying geology, and so forth. Why should
software be any different?
In many cases it shouldn't be. The software team supporting the
Space Shuttle is famous (as documented by Richard Feynman) for
delivering bug-free software on time without working 80-hour weeks.
Programmers writing the embedded code to control a medical monitor
have to deliver under equally rigorous conditions, and do so
routinely. In such cases, well-disciplined teams do follow this kind
of model: they are given a set of requirements, and carefully go
through repeated design iterations and reviews before committing a
single line of code to disk. The coding is in many cases a largely
mechanical process, and bugs in even early builds are a serious
concern, because they indicate that something went wrong early on.
But is this model really efficient, or even appropriate, for free
source software, or even for most commercial (as opposed to dedicated)
software? I believe that it's neither efficient nor appropriate much
of the time.
The examples I gave are examples of mission-critical applications,
where the software is just one component of a larger deliverable on
which people's lives depend. It is a slow, painstaking process that
trades off innovation for reliability. I hope that if I ever need to
be hooked up to a medical monitor that the programmer emphasized
reliability and correctness over the latest whiz-bang graphics. Most
software that most people interact with directly is not
mission-critical in that way; it only needs to be good enough to get
the job done effectively.
That isn't to say that I agree with the tradeoffs that, say,
Microsoft makes; they have taken this to extremes even to the low
levels of the operating system, so that the base is not robust. The
point, though, is that most computer users benefit from having
additional functionality at the expense of ultimate perfection; the
perfect is the enemy of the good, and the hard part is deciding what
is "good enough". It simply isn't necessary to do a perfect
architecture in many cases, for example. On the other hand, not
giving it enough attention means problems down the road. The more
other software will rely on this package, the more essential solidity
is.
The Weakness of Traditional Requirements Analysis
The other part of the problem is that the user base often
doesn't know what the program is supposed to do. The space
shuttle mission specialists do know exactly what they need, and
they've learned over the years how to express it. Users of a word
processor, for example, know that they want to edit and format text,
but only at a general level. It simply isn't possible to gather
comprehensive requirements before starting work. It's easy for a user
to say "Oh, I really don't like having to indent each paragraph" if
the editor doesn't do that automatically; it may be hard for a user to
think of that in a vacuum. A user not suitably trained may not know
how to express a requirement that he or she actually does understand.
In this situation, the familiar GIGO (garbage in, garbage out)
principle comes into play: if the initial requirements are useless,
then any functional spec written from those requirements is equally
useless. Following the waterfall model may yield a perfect white
elephant.
What I've sometimes observed is that in order to follow the rules,
a programmer will indeed write a functional spec and design, but only
after writing and debugging the code to the point of doing something
useful. Sometimes this is winked at; sometimes it's blamed for delays
and other problems, and sometimes management has no clue what's going
on. Sometimes, in my view, it's a perfectly rational response to a
perfectly rational requirement.
Why is that? Well, let's get back to the issue of incomplete or
incorrect requirements. The fact that the requirements are faulty is
only detected when the prototype (which is what it really is) runs
well enough so that the prospective user actually sees that it isn't
doing the right thing. As long as the requirements can be corrected
at low enough cost, this iterative process can work.
The demand for requirements, architecture, and design documentation
on the part of management is actually rational, even if the way it's
handled often isn't. Even after-the-fact design documentation tells
the next generation of programmers what's going on, and usually a lot
more effectively than documentation written ahead of time that isn't
updated to reflect reality. The requirements document at least
demonstrates what problem the software ultimately purports to solve.
In order for requirements to be corrected cheaply, the programmer
has to be close to the end user. That's a lot easier in the free
source world than in the commercial world. The free source programmer
doesn't feel compelled to hide what she's doing for fear that her
competitors will steal a march on her, or to withhold functionality
now so that it can be sold as an upgrade later.
However, for this to work well, it must be easy for the end user to
find and try new packages. There have been steps taken in this
direction; the GNU configure system provides a common way to build
packages, for example. However, it's still very difficult and
time-consuming for someone to download and build a package to try it
out, but then discard it (and restore the system to its prior state,
with no loss of data or system stability) if it doesn't work out.
It's also often not obvious how to give feedback. None of the
existing packaging systems really support this.
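To make this concrete, one low-tech way to try out an autoconf-style
package without risking the rest of the system is to build it into a
throwaway prefix and then simply delete that prefix when done. The
sketch below (in Python, with a made-up source path) illustrates the
idea; it is not a description of any existing packaging tool.

    # Sketch: build a configure/make package into a disposable prefix
    # so it can be tried out and then removed without disturbing the
    # rest of the system.

    import os
    import shutil
    import subprocess
    import tempfile

    def try_package(source_dir):
        prefix = tempfile.mkdtemp(prefix="trial-install-")
        try:
            subprocess.check_call(["./configure", "--prefix=" + prefix],
                                  cwd=source_dir)
            subprocess.check_call(["make"], cwd=source_dir)
            subprocess.check_call(["make", "install"], cwd=source_dir)
            print("Installed into", prefix)
            print("Binaries are under", os.path.join(prefix, "bin"))
            # ... experiment with the installed package here ...
        finally:
            # Discarding the trial is just removing the prefix; nothing
            # else on the system was touched.
            shutil.rmtree(prefix)

    if __name__ == "__main__":
        try_package("/path/to/unpacked/source")   # hypothetical path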
Creating, Copying, and Cloning
It's often said that free source is a lot better at copying and
extending than innovating from scratch. I'm not entirely convinced
this is true; some of the most innovative software is the product of
an open development environment: the ARPAnet, UNIX (yes, UNIX started
in a largely free environment), the Lisp Machine, and the very notion
of a windowing system (at PARC). Furthermore, there really isn't all
that much innovation from scratch, particularly in the software world;
most projects build on existing work, in both commercial and free
software. I agree that it's probably harder to start from scratch on
an initial, very complex goal with no prior example and without the
funding required to support a team of developers working full time for
an extended period, but such projects are few and far between
anywhere. The examples I gave of truly innovative free projects all
received substantial outside funding.
But is this really a strike against free source development? I
think not. It's just as satisfying for the user in the end to have a
better way of doing something familiar as it is to have something
truly new and innovative. Linux is a derivative of UNIX, but with a
new implementation and the chance to do things better (in many ways,
it's much lighter in weight than commercial UNIX products). The GIMP
started out life as another image editor aspiring to be Adobe
Photoshop; it has since evolved in its own direction and in many ways
is more powerful than Photoshop.
One of the major goals of object-oriented programming, in fact, is
to permit the creation of components that can be used as building
blocks to produce something greater. What a lot of people fail to
recognize is that the groundwork for this was laid by the likes of
Aho, Weinberger, and Kernighan (yes, the authors of awk), and the
other tool builders of the early UNIX days. Using a simple shared
data representation (whitespace-delimited fields in lines of ASCII
text), they built up a set of tools that could be glued together (with
shell scripts) to produce much more powerful tools. Of course, that's
not true object oriented programming, but it embodies much of the
spirit of reuse.
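A toy example may help show why this matters. The sketch below is in
Python rather than awk or shell (the names and data are invented), but
it follows the same recipe: every stage reads and writes lines of
whitespace-delimited fields, so the stages can be chained in any
combination, much like a shell pipeline.

    # Small composable filters in the old UNIX tool-building style:
    # every stage consumes and produces lines of whitespace-delimited
    # fields, so stages can be glued together freely.

    def select(lines, column, value):
        """Keep lines whose given column equals value (grep/awk-style)."""
        for line in lines:
            fields = line.split()
            if len(fields) > column and fields[column] == value:
                yield line

    def project(lines, columns):
        """Keep only the requested columns (cut-style)."""
        for line in lines:
            fields = line.split()
            yield " ".join(fields[c] for c in columns)

    def total(lines, column):
        """Sum a numeric column (like awk '{s += $N} END {print s}')."""
        return sum(float(line.split()[column]) for line in lines)

    # Invented input: "user package hours" records.
    records = [
        "alice gimp-print 4",
        "bob   parted     2",
        "alice gimp-print 3",
    ]

    # The stages chain together like a pipeline.
    print(list(project(select(records, 0, "alice"), [1, 2])))
    # ['gimp-print 4', 'gimp-print 3']
    print(total(select(records, 0, "alice"), 2))
    # 7.0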
This kind of flexible tool building and reuse is a characteristic
of software that distinguishes it from other engineering endeavors.
The capital cost of tooling up to produce a new hardware widget, even
a minor variation on an existing one, is very high. Machinery must be
rebuilt, new dies must be cast, new containers designed, and so forth.
A new production run must be started. If the new widget turns out to
be defective or incorrect, expensive materials must be scrapped, and
precious production time is lost. In software, in contrast, the
production cost for a unit item is essentially zero; the design and
implementation cost is the only cost. This encourages free
experimentation and creative use of existing components.
In any event, the free sharing of code that characterizes the very
core of free source is a tremendous strength of this form of software
development. Indeed, if there's a weakness, it's that there is so
much out there that nobody can keep track of it, and finding it is a
challenge. This suggests an urgent need to catalog and index the
variety of free source out there, so that people who might like to use
it can find it more easily.
This also suggests why patents and overly-strict interpretations of
copyright are potentially so devastating to software, because they
inhibit this free interchange of ideas and methods. Patents are
intended to encourage innovation by granting the original creator a
limited monopoly on use of the innovation. But software does not need
grand innovations so much as wider and more clever use of existing
methods, and so patents actually serve to inhibit the kind of
innovation that software needs.
Release Early, Release Often, Release Never?
Manley also takes issue with the "release early and often" concept.
He specifically argues that pre-alpha (defined as feature- or
API-incomplete) software should generally never live outside of the
development team. He also claims that "release early and often"
stands formal release methodology on its head. I disagree with both
these points -- not only do I believe that development software should
be available broadly (and I'll give some concrete examples of why
shortly), but also, if done correctly, it is not at all at odds with
good release methodology, which I endorse.
First of all, let's understand what "release early and often" is
intended to accomplish. The goal of this methodology is to allow
prospective users the chance to experiment with something and report
problems, or request new features, or add new features themselves and
contribute them back. However, simply releasing early and often
doesn't guarantee that this will happen; if the project quality as a
whole is poor, or it doesn't do anything that anyone cares about, this
will not happen. If the releases are so frequent that users get
frustrated, and they don't offer enough, they will also fail in their
purpose.
Large companies frequently do internal releases periodically
(anywhere
from nightly to monthly), and encourage employees who wish to be on
the cutting edge to try these builds. This is a form of "release
early and often", even if the audience is smaller. So this really
isn't something unique to free source, although fully public release
is.
Let's take another look at the goal of release early and often: to
allow the user to experiment with something new. I discussed above
an iterative model of requirements analysis: give the user a prototype
and refine it based on feedback. Doesn't that sound familiar? By
releasing frequently, we give the user an opportunity to offer
feedback.
To be effective, though, this must be done well. Simply cutting a
new
release every night and telling everyone to upgrade isn't going to
work; users have no idea what to expect and will spend all their time
upgrading, quickly growing tired of the exercise. Users who want to
do that should use the development repository. To be useful, a
release must:
- Contain sufficient new functionality or bug fixes to be
worth the effort.
- Be spaced sufficiently far apart to allow the user time to work
with the latest release.
- Be sufficiently functional so that the user can get work done
(quality).
This doesn't mean that each release must be a fully polished product,
but it does mean that it must have the characteristics of a good
product: it must be coherent, worth the upgrade effort, and clean.
Note that I didn't say complete -- we already understand why it's hard
to create a complete product without understanding what the user
needs.
There's another benefit to all this: it forces developers to
continuously test their work, since it's never too far from
visibility. None of this is incompatible with a good release
methodology, which addresses all of these things.
Keeping a project too close to the development team -- not allowing
outsiders to use it until very late in the game -- means cutting off
the ability to find evolving requirements. Here are two sets of
contrasting examples, involving very similar projects, one released
publicly and the other one not:
Emacs
Emacs (the text editor) -- there are currently two versions derived
from the GNU Emacs base, the continuing GNU Emacs and XEmacs, which
split off in the early 1990's. GNU Emacs is developed by the Free
Software Foundation, and XEmacs is developed by a broader team. The
split occurred for reasons involving code copyright, but the
development methodologies have been very different.
GNU Emacs is developed by a small team within the Free Software
Foundation. Currently, version 21 has been in development for several
years; version 20.7 is the current release.
XEmacs is developed in a public manner, with a stable and a
development branch, and a public web
site. The current stable branch is 21.1.11, and the current
development branch is 21.2.35.
XEmacs has many visible differences (including embedded images and
proportional fonts), and the internal architecture is now also very
different; it's much more refined, with things such as characters,
events, and keymaps being first-class objects. Source:
http://www.xemacs.org/About/XEmacsVsGNUemacs.html.
Note that this is somewhat out of date, but many of the differences
persist.
GCC
GCC (the C compiler) -- the Free Software Foundation developed GCC in
a similar fashion, with only targeted releases being made available
externally. Development stalled out around 1996, and with no
visibility into the project, nothing happened. A few years later, a
team based at Cygnus Solutions took the existing code, added some
outstanding patches, reworked some things, and released EGCS (the
Experimental GNU Compiler System). This was developed in an open
fashion, and made rapid progress. Eventually the split was healed,
with the FSF transferring stewardship of GCC to the EGCS team.
In summary, the advantages of the public development model are:
- Better accountability -- people outside of the project can see
into it, and the development team feels more pressure to keep things
moving along.
- More early testing.
- More attraction for prospective developers, and so a broader
developer base.
Manley is quite correct, though, to note the distinction between
development (or pre-alpha), alpha, beta, release candidate, and
release. Let's take a closer look at what the user base typically is:
- Development releases are used by people who enjoy living on the
edge, or who are going to form the core user base of the new feature
and who really need to see what's going to be happening to offer early
feedback. These people might not be interested in contributing code
(developing), but sometimes these users do become developers. People
who use development releases have a responsibility to understand the
hazards. Developers have a responsibility to make these hazards
clear.
- Alpha releases are for people whose needs are less urgent, but
who want to see something that more or less looks like the final
product. Early adopters should experiment with alpha releases, and be
prepared for problems, but the development team needs to start
exercising more discipline.
- Beta releases are for mainstream/early adopters, who reasonably
expect good functionality and reasonable polish, and who want to help
test the ultimate product. The development team should exercise
strong discipline at this point.
- Release candidates should be very close to the final release.
The development team should stand behind release candidates as though
they are final releases.
- Final releases should be a product that the project team is
comfortable with anyone in the target audience using.
As long as it's clearly understood by all parties -- developers and
users -- what's expected at each step, there shouldn't be any
problems. The hard part -- and this seems to be as difficult for
commercial developers as for free source developers -- seems to be
exercising the appropriate discipline at each step. Usually the
problem is exercising too little discipline from alpha forward, but
exercising too much discipline too early runs the risk of the project
growing tedious and not maintaining forward progress. The release
engineer needs to understand the process. This is one place where
there's no substitute for an iron fist.
In my experience, it's usually around the transition from alpha to
beta that projects, both commercial and free, start to lose their way,
and beta is often handled poorly. Beta is usually entered too early
(with the project not complete enough), and the project is not willing
to do enough beta releases (and thereby spend enough time) to ensure a
clean product. Beta really should be feature complete. If testing
reveals too many problems (either deep bugs or clearly missed
requirements), the beta should be withdrawn, and the project should be
slipped appropriately. If that's not acceptable, it should be
understood that the release will be flawed.
An interesting development is the rise of Linux distributions. The
organizations producing these distributions are essentially system
integrators, and the good ones take an active role in monitoring
development and integrating packages cleanly into their
distributions. This is an interesting model, and it may have
implications for free software development. The distributions could
be a very useful source of feedback to the development projects, and
if distributions were to arrive at a common set of standards and
practices for developers to follow, and publish close dates for
integration into their individual distributions, it would help guide
developers. Perhaps commercial Linux distributions, which have
revenue, could perform as a service some of the less pleasant tasks
that a lot of free source projects don't tend to do internally.
Maintenance
The first major release is always easy. There are no pre-existing
expectations; the initial code base was small and easy to work with,
and everyone's excited to have their project out there for the first
time. The second one is hard. People are burned out; figuring out
where to go next is harder (all of the obvious good things were done
the first time); the code base is more complex; people feel cocky from
having done it once; and the team doesn't have the experience yet to
know what happens next.
This is not unique to free source projects. I've seen exactly the
same thing happen in closed source projects. I've been involved twice
with major new software products, and both times I've seen a good
first release and confusion in the second. This has been documented,
in the context of business rather than technology, in Crossing the
Chasm by Geoffrey A. Moore (with a foreword by Regis McKenna). While
I'm not convinced that the book has all the answers, it does at least
document the problem. About the only solution is perseverance and
organization. I'm facing this right now with gimp-print; our first
release (4.0) is very successful thus far, but figuring out what comes
next is much harder.
Where Do We Go From Here?
In the spirit of stimulating further thought, here are some
concluding thoughts and recommendations.
- The pure waterfall model seldom works well in commercial
projects, and it's likely to be even harder to apply in free
projects. However, there are useful lessons to be drawn, and even
performing some of the steps out of order carries benefits.
- "Release early and often" performs a lot of the functions for
free source that regular internal releases do for closed source. If
handled correctly, it is of great benefit to the project, and there
are actually representative case histories that demonstrate this.
Project leads should emphasize frequent convergence to allow clean
(not necessarily complete, and not necessarily bug-free, but usable)
frequent releases. Rather than being poor practice, it actually
forces good engineering discipline.
- The free software model has certain unique strengths, such as
the freedom to share code, that are very much in accord with
contemporary development practices. However, in order to share code
effectively, it must be of a certain quality and functionality.
There's also so much code out there that nobody knows where it is. If
we can devise a system to index all of the code, so that people can
take more components off the shelf and use them with relatively little
modification, we can leverage this to great effect. While it's often
said that free source is weak at de novo innovation, perhaps the real
answer is that free source is particularly strong at synthesis, since
there are no strategic business reasons for avoiding use of somebody
else's code.
- Good release engineering is good release engineering, free
source or otherwise. A lot of that is just good self discipline.
Engineering is 90% common sense applied to 10% specialized knowledge.
- The Linux distributions and other free source-related vendors
could offer more services to the free source community. While this
would carry costs for the vendors, it would also benefit them by
improving the overall quality and functionality level of free source
software.
- Getting good feedback from users is hard. What's the best way
to do it? A web-based form, a mailing list, or a feedback tool built
in to the application? If the latter, can we come up with a common
mechanism for that purpose?
- It's always easier doing the first release than the second.
The first release is very exciting, and usually has the biggest jump
in functionality. How do we get past this barrier?
- Free source (particularly free software) developers are usually
volunteers. How do we motivate them (ourselves, really) without
pushing too hard? What kinds of organizational structures work best?
- What developer tools do people really need, and how can we
minimize the spinup time? SourceForge is trying to address this, but
it's not perfect. Analyzing what works and what doesn't work with
SourceForge could go a long way toward improving it.
- Perhaps we need some kind of free source engineering summit,
like the printing and database summits?
Hi all,
Interesting article (i.e. I agree with lots of it :-P)
I'm the GNU Parted
maintainer (wrote
~70% of the
code). Here are some reflections:
- Parted is a fairly small program (~25000 lines, ATM), with me as
the only permanently active hacker, but with lots of hackers
sporadically interested and making useful contributions. (E.g.: ext2
support, PC98 support, help with Mac support, user interface issues, etc.)
- Quality control is very important for Parted, because people's data
is at stake! Therefore, Parted has regression tests, and assertions
compiled into MAINSTREAM releases, etc.
- Parted is certainly "release early, release often". Like Linux, we
fork stable and development versions (porting fixes, etc. between
versions), and before converting a development version to a stable
version, go through a long 1.X.0-preY phase. This preY phase usually
lasts ~1-2 months. Like rlk said, important missing features or design
flaws are often found at this point. I think these issues (as opposed
to mere bugs) should be dealt with at this point, rather than keeping to
a strict "discipline" of applying bug fixes only. I have always dealt
with them, because it seems Wrong to have bad code going out to The
Masses (in a "stable" release).
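As an aside, for readers unfamiliar with that naming convention, here
is a trivial sketch (in Python, with invented version strings) of the
distinction being drawn between a 1.X.0-preY candidate and a finished
stable release; Parted's actual release process is of course not
driven by a script like this.

    # Sketch: telling a "1.X.0-preY" pre-release apart from a finished
    # release, as described above.  Version strings are invented.

    import re

    PRE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-pre(\d+)$")
    FINAL = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")

    def classify(version):
        if PRE.match(version):
            return "pre-release: still shaking out problems before stable"
        if FINAL.match(version):
            return "final release on a stable branch"
        return "unrecognized version string"

    for v in ["1.4.0-pre3", "1.4.0", "1.5.0"]:
        print(v, "->", classify(v))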
In fact, I sometimes fix design flaws in STABLE versions - particularly
if the design flaw makes the code difficult to understand/debug. This
might involve rewriting 100s of lines of code. Some people might think
this is insane - but I think it leads to better reliability (by the next
stable revision - I still go through a short preY phase of ~ 1 week) in
the next version.
So, my point here is: a "release engineer" should think about the impact
the changes will have on the users - not simply follow "discipline".
- Parted gets quite a bit of user feedback :-) I put encouraging
messages in the documentation:
Feel free to ask for help on this list - just check that your question
isn't answered here first. If you don't understand the documentation,
please tell us, so we can explain it better. General philosophy is: if
you need to ask for help, then something needs to be fixed so you (and
others) don't need to ask for help.
Also, as I said earlier, Parted has lots of assertions. If one
fails, an error message comes up:
You found a bug in GNU Parted. Please email a bug report to
bug-parted@gnu.org containing the version (<VERSION>), and the
following message: <ASSERTION-DESCRIPTION>
I've probably received ~50 bug reports this way (just did "grep | wc"
on my mail).
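Parted does this in C, but the shape of the technique is easy to
sketch. The fragment below (Python, with an invented version string
and assertion; only the bug-report address comes from the message
quoted above) shows an assertion helper that turns an internal
inconsistency into a ready-made bug report rather than a silent crash:

    # Sketch of the "assertion as bug-report generator" idea described
    # above.  Parted implements this in C; this only shows the shape of
    # the technique, not Parted's actual code.

    import sys

    VERSION = "1.4.0"   # would normally come from the build system

    def report_assert(condition, description):
        """Check an internal invariant; on failure, tell the user what
        to mail to the developers, then bail out."""
        if condition:
            return
        sys.stderr.write(
            "You found a bug in GNU Parted.  Please email a bug report\n"
            "to bug-parted@gnu.org containing the version (%s) and the\n"
            "following message: %s\n" % (VERSION, description))
        sys.exit(1)

    # Invented example of an invariant check inside some operation:
    start, end = 2048, 1024   # deliberately inconsistent
    report_assert(start <= end, "partition start is after partition end")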
- Parted, being a small project, hasn't really needed any special
tools for this kind of thing. OTOH, it could probably benefit from
something like Aegis
(which is GPL, BTW).
- "Motivating" people isn't really an issue, is it? If some people in
a project want good quality control, etc., then these issues will be
discussed, and dealt with (if not, then there are more fundamental
problems...?)
Andrew Clausen
Counterpoint, posted 5 Nov 2000 at 17:43 UTC by mrorganic » (Journeyer)
As the author of the original article, I felt I should post a response
to this article. However, my philosophy is encapsulated in my
article, so I will not restate it here; rather, I want to address the
idea that OSS developers -- particularly Linux developers -- are somehow
exempt from the pitfalls and problems encountered by other
development models. This simply isn't true -- in fact, the OSS
development model presents even more problems due to the highly
distributed nature of the development teams.
I also vigorously dispute the claim that the "traditional" method does
not work. When the formal method is followed, it works quite well.
It's just that the software industry as a whole doesn't do a good job
of following the model.
My concern is that the Linux development teams tend to disparage
the formal model because it is, in their minds, a "closed source"
way of doing things and is therefore bad. Nothing could be further
from the truth: good engineering practices remain good, whatever
the philosophy of the developers. ESR's paper "The Cathedral
and the Bazaar" is often used as a defense of this method, but it's
worth noting that even ESR took Linus to the virtual woodshed for
using his "inbox as a patch-queue".
As for the objection to the traditional method of gathering
requirements, I can only say that if you can't gather good
requirements, you're not asking the right questions. You cannot --
absolutely -- write good software without knowing to a pretty
detailed degree what the software is supposed to do and how it is
supposed to do it.
In my mind, this is why so much software on Unix tends to be
derivative rather than revolutionary. Unix programmers are
excellent implementors, but tend to have trouble innovating -- I
believe this is due to a lack of skills in the requirements and
design fields. You can't innovate if you don't know how.
And the only way to get good at it is to start doing it.
Methodologies, posted 5 Nov 2000 at 19:12 UTC by nymia » (Master)
Here are some additional links for methodologies used in project management:
1. COCOMO [1]
2. Fagan [1, 2]
3. UML [1, 2]
4. Yourdon [1]
5. Meyer [1, 2]
6. Etc [1]
I certainly don't believe that Linux and other free source development
is free of the problems faced by closed source development models,
although I can see why some of the things I said might be interpreted
that way. I do believe that free source development does have certain
advantages, and that we should understand what they are and learn how to
leverage them -- that's the central thesis of my article. I'm
certainly not a fan of the way Linus runs kernel development; I think
that without the likes of Alan Cox and Ted Ts'o, it would be in fairly
serious trouble by now.
However, the motivations behind (most) free and commercial development
are quite different. The goal of commercial development is to make as
much money as possible, which usually means reaching the broadest
audience. Most free source developers aren't looking at that end goal;
they either want to solve their own particular problem, or come up with
a "Better" way of doing something. In particular, reaching the widest
audience in the shortest time is probably about the closest thing to an
explicit non-goal that many free source projects have.
In particular, few if any free source projects start out as "I want to
do something that a lot of people will want to use; I know my basic
product space, and I'm going to do the market research needed to find
out what these people want". Linux didn't. Emacs didn't. Few free
source developers have the resources or free time to do this,
particularly early on, and I believe that rather than fighting that we
need to recognize it and learn how to most effectively utilize the
resources that people do have available.
Gimp-print, for example, simply started out as a need on my part to make
my Stylus Photo EX print under Linux. It only grew into something more
when I started seeing other people use it. At that point, it was possible
to start gathering requirements, in the form of email that people
spontaneously sent me. I did put together a fairly coherent roadmap
(which is available on gimp-print.sourceforge.net; I won't reproduce it
here), and I was quite surprised when I went back to look at it that we
actually mostly accomplished those goals from January. If things had
headed off in a different direction I wouldn't have been at all
surprised.
Clearly, I don't believe that one size fits all; different projects
demand different techniques. If most free source projects are
derivative, because that's how most free source programmers tend to
think, then by golly let's learn how to leverage that to best advantage,
and come up with specific engineering techniques that such
programmers will feel comfortable using and that will help them
produce the best software they can.
I need to read and ruminate on this in more depth before I can add
anything meaningful (if then, and if it hasn't already been added).
However, one minor quibble sprang out at me:
> free source (my term for the union of
> free software and open source)
I feel compelled to point out that 'free source' isn't a new
term;
I used it
two years ago in my free source developer survey, and probably others
used it before me. So it's not just your term,
rlk,
but one that already belongs to a community. :-)
The single biggest issue I see in all of these projects is "Version
2.0". Because there's no architecture, detailed design or requirements
analysis, development of new features is ad-hoc, and occasionally
detrimental to the overall conceptual integrity of the original design.
I've worked on a couple of projects over the years, including big ones
like XFree86. The most recent projects have been pnm2ppa and reiserfs,
which integrates into the Linux kernel.
For example, XFree86's XAA architecture, whilst zillions of times
better than what came before, simply tried to retrofit OO onto a big C
program. When Metrolink provided us with dynamic loading of modules,
this helped us use less memory and promised platform-independent driver
modules for any processor architecture. That never happened, because
drivers still require too much information about the platform they came
from. When it came time for multihead, it was a major drama because
there was no forethought about it in the original server design. When it came time
to squeeze in direct 2D access for games like CivCTP or similar, there
were a few false starts before DRI came along. When it came time to
make 3D work, there were a few false starts, and now even as a
developer with one of the best 3D cards in the current marketplace, I
find it difficult to get 3D working. My mum would never make it happen.
All I'm getting at is that the further a project without an architecture
moves on, the harder it gets to maintain and improve the project
during "2.0". This is true of pnm2ppa 1.0, of reiserfs (4.0 is basically
going to be a rewrite from the ground up), and of XFree86 today.
The conclusion I draw from the lack of serious up-front design in free
software projects is that they work better without it.
The key to getting work done in this way is to start with a minimal
feature set then gradually add to it, preferably maintaining a
functional system at all times. Note that having a small feature set
does not imply low quality software - it simply means a small
feature set. Working this way is much more fun, since you're coding all
the time, and very productive, since you get tremendous feedback from
having a running system.
This is not to say that you shouldn't design. Design is crucial; it
should be an integral part of the coding process. You should think about
what you're doing, why you're doing it, and whether it could be done
better every time you write a line of code, not just when you first
start out.
Any open source project which isn't fun to work on will fail. Anything
which makes it less fun to work on the codebase introduces serious risk
of technical implosion.
I've noticed a similarity between working on Open Source code and
maintenance coding in the commercial world. It really is an exercise
in psychology - not programming. The more people have worked on the
code, the harder it becomes. Hehe...guess you could say it becomes mob
psychology :-)
Coders tend to impose some 'character' on a program almost
automatically. It may be top down, or bottom up...but it is consistent.
Some sort of naming convention may be used. Stub functions may even be
provided because you _know_ someday you're going to need to go back and
implement feature such-and-such. As you become more familiar with the
code, you begin to pull information from it other than simple program
logic. The author's quirks, sense of humor, etc - all become apparent.
This 'feel' enables you to make intuitive guesses as to where problems
lie, or what things to modify for a particular enhancement. You're able
to tell _why_ something was done a certain way, and what the authors
were thinking when they created a constant PI with a value of 22 :-P
The code is released and modified by the Community, each member with
his/her own unique style and idea of how things should be done. In an
ideal world, each contributor's work would match the 'feel' of the
program. In the worst case, modifications are made, naming conventions
come and go, and structure begins to disintegrate. Along the way, the
'feel' has been lost. This isn't to say the program doesn't work,
merely that the 'hidden' information the code conveyed has been lost.
This isn't really something that can be regulated via style guidelines
or standards. It's merely a reflection of different coders thinking in
different ways. The key is to be aware of it, and to try to adapt your
style to it. If the entire program uses buckets starting with "Yearly_",
then creating an "Annual_Salary" bucket is probably a bad idea.
Remember, the program should look like a whole entity...not a
collection of parts :-)
Maybe it's just my age (just over 33, not so old!), but many comments
written by Linux programmers strike me less as engineering arguments
than touchie-feelie religious babblings of the sort you hear on late-
night television. "It's how the programmer feels," they
say. "It's about the culture. We're different!"
You may enjoy the act of programming a great deal -- I know I do. You
may get a great deal of personal satisfaction out of it. You may
consider it an art form. But at the end of the day, programming is an
engineering exercise, and is bound by engineering rules. You may not
like it, but that's how it is, and all the posturing and phrasemaking
won't change it.
The key to robust computer software, as in most other hard sciences, is
rigor. Unfortunately, rigor seems to be the one thing many Linux
programmers fear -- to be rigorous means having to do all that
necessary but unfun stuff like documentation, debugging, and careful
design. It's not so much that every software program must be perfect;
it's that programmers who do not cultivate good development habits
early carry their bad habits with them into other projects.
The big trap for Linux developers is to fall prey to the idea that good
software will somehow magically "just happen" if that software is open-
sourced. Open-source can lead to good software, but not
inevitably so.
mrorganic -
You seem to be missing the point people keep trying to make. No one has
suggested that it wouldn't be beneficial, all other things being equal,
to have some sort of standard engineering practice (what that was,
people might disagree on). But what you are proposing is totally
unrealistic. Free software is mostly developed by people for fun.
That's right, fun. And requirements documents aren't fun.
Sitting around talking about release engineering isn't fun. And
certainly spending years designing before you start coding isn't fun.
And if it's not fun, people won't do it. Most people in the free
software community don't do this out of the desire to make world-class
software for its own sake. What you are suggesting is that volunteers
act like they were getting paid, even though they aren't. It's just not
going to happen.
More concretely, what about the example rlk gave of
gimp-print? He started it to fix a
problem he had.
Are you seriously suggesting that he should have instead drawn up
lengthy and detailed requirements documents? If he had to do that to
start, would gimp-print exist today? I doubt it.
It may well be that your ideas are the best way to deliver software to a
spec on a deadline, when you are being paid. But as none of those
conditions apply, why are you trying to apply the techniques?
I'll sum up so we can all move on to other things:
Saying that Linux programmers will only work if the work is fun pretty
much guarantees that most of the software on Linux will be crap. Not
all of it: many engineers are careful, talented, and rigorous enough to
overcome the problems inherent in OSS development. But programmers who
don't want to commit to doing the hard stuff because it isn't fun are
going to keep producing derivative, badly-thought-out, and buggy
software until the heat-death of the universe.
This is bad because an entire generation of programmers are learning
horrible habits. Sure, they may be working for fun now, but
eventually they will probably want to get paid for what they do, and
what happens then? They've never learned how to do rigorous software
engineering, and will have a tough time unlearning all the bad habits
they picked up early on.
To me, the issue isn't "fun" (although I do mostly enjoy writing
software). It's craftsmanship. It's taking pride in doing
something right, even if that something is relatively small or
trivial. It's not just about "liking" programming, but loving it
enough to do it as well as you can every single time. Just because a
piece of software isn't mission-critical doesn't mean it shouldn't be
as well done as I can do it. Ultimately my software speaks of
me and the value I place on what I do. If I don't care much
about it and treat it as a lark, then my work will reflect that.
It's like the advice your parents gave you: if you're not going to do
it right, don't do it at all.
Res ipsa loquitur.
The latest IEEE Spectrum had a report about space station code. According to
Spectrum, the NASA software methodologies have turned out questionable code for the space station. Due to aggressive code freezes
and the like, there are hundreds of "SPNs" (station program notes) telling the crew how to work around bugs in the control computer
system.
This isn't to say they should host the project on SourceForge. :-)
It's just evidence suggesting that perhaps the conventional software
development methodology isn't so sewn up and impregnable that it's no
longer possible to propose improvements, even when it's used where it
is most at home.
mrorganic,
I think you're entirely missing my point, and that of
samth. It's no more true that Linux programming
must all be fun than it is that all programming must
be done according to strict methodology. However, there are a lot of
"informal" programmers in the free source space (with apologies to
RoUS, I hadn't heard the term used before, but I'm not
surprised that it was), and telling them that they must either use
formal methodology or give up programming altogether means that a lot of
useful stuff simply won't get written at all.
For what it's worth, I've been a professional software engineer for
about 15 years (2 years as an undergrad at Project Athena, which was
unquestionably professional level work, and 13 years since). I've seen
good engineering and bad engineering. I'm generally a stickler for
quality myself; I'm rather annoyed that gimp-print went out with a
couple of nasty bugs which didn't get fixed until 4.0.2 (at least one of
these should have been caught; the other one should have also, but it's
a bit closer). I've been a release engineer plenty of times (on
projects up to maybe 500 Kloc, about 15x the size of gimp-print), and I
generally espouse the 2x4 method of release engineering (read: I'm a big
guy carrying a 3'-long piece of 2x4, and if I walk into your office when a
bug shows up late in the process, you're really motivated to fix it and,
more importantly, to not have me walk into your office again). I can't
claim to have used a lot of formal engineering methodologies
(the kinds of things that Ed Yourdon and friends like to write about),
but I have done my share of formal architecture and design work. For
big, complex projects, this stuff really is necessary. However, save
for the flagship projects, in the free source world it's overkill.
There are examples of well-engineered free source projects, such as
KDE. There are others that IMHO need a bit more control. However, for
the vast majority of stuff on freshmeat (for example), it isn't
necessary. As a project grows larger, more design work is likely to
become necessary. One could look at it as though a lot of wasted work
has been done and the earlier junk has largely poisoned the source; one
could also look at it as a
potential learning experience. Certainly 4.1 is going to be somewhat
painful for gimp-print because we didn't anticipate what we would need
down the road. On the other hand, I think if we tried to anticipate
what we need (real CMYK, color management, and such), it would never
have gotten off the ground. We've learned a lot from our "prototype",
as it were, and in the process a lot of people have access to useful
software.
The point that I've been trying to make, and that just about everyone
seems to have missed, is that those of us who do understand good
engineering principles should try to come up with a streamlined
approach that's easier for inexperienced programmers to apply and that
will assist them in creating better software. Maybe that's pie in the
sky, but I believe
that there are some relatively simple principles that people could apply
that would help them create better software. That's what I'd like to
get out of this discussion.
If anyone stumbles across this article, I thought I would place a
pointer to a HOWTO I wrote on the subject. As a HOWTO, it is geared
toward a bit of a different audience and has a very different tone,
thrust and content. I actually used this article as a source while
writing the HOWTO
(and I mention it in the bibliography which I'd also love feedback on if
anyone has other recommendations for things to include in it).
The HOWTO is hosted by the LDP as the Software-Proj-Mmgt-HOWTO (I think) or
it's available from a project homepage
that I've put up for the project. I hope this is helpful to some.