Case study: Midgard framework in action
Nov 25th 2002 -- Martin Langhoff from CWA New Media has posted the following case study on using the Midgard Content Management Framework in a large-scale project.
A client of CWA New Media has recently launched a site that we developed using Midgard as the underlying framework. The project involved three companies, responsible for the back end, front end and hosting. These three teams, plus a sizable team put together by the client, worked for over a year from prototype to launch date.
The site is a high-profile government portal with an extensive search system that is driven by an extension of the Dublin Core metadata standard, called NZGLS. Additional search services are provided by the Autonomy search engine. A custom protocol, very similar to XML-RPC, glues the front end to the back end.
Our team was in charge of developing the front end, which included some CMS functionality. We decided to use Midgard as a framework, and we also provided a simplified CMS interface based on 'OldAdmin'. The team consisted of a project manager, one art director/designer, one designer, one HTML developer, one architect/programmer and four programmers. Not everybody was working on it all the time, but for a good 5 months we had a core team of 6 working full time on it.
After the initial pilot stage, where we did not use Midgard at all, a lot of
time was spent in setting up infrastructure to support the project. The
core issues addressed were:
The hosting is handled by a different company. This meant
we had to provide install scripts and instructions that worked perfectly to
ensure smooth deployment. If deployment was problematic, it would be hard to sort out responsibilities with the hosting company.
A makefile was put together that would compile Apache/PHP/Midgard from source, configure them, load
our application, and configure it as well. It was later enhanced to
handle upgrades while preserving user data.
Our development servers were Debian 2.2 (Potato), and target servers were RedHat 7.2. As a result, the install and upgrade process was tested on both platforms, and we documented the
package dependencies for both in detail as well.
While this project was being developed, Midgard was
in flux. A new PHP version had been released that addressed important security bugs,
but the released Midgard did not compile against it. The automated install was used to
test different Apache/PHP versions and compile flags against
Midgard nightly snapshots.
We went through three stages. We used release 1.4.2 for a while. As the deadline was looming, we settled on a "known good" Midgard CVS snapshot. Thanks to luck (and Torben!), an official Midgard release happened before the project was due.
To address the versioning issues, developers were
using different Apache/PHP/Midgard combinations. They had to be able to
compile their own set and run it, without having root privileges, and
inside their home directories.
To achieve this, we had to
patch some of the configure and shell scripts so that they (a) link to
the private libraries and not the system-wide libraries and (b) agree
to run as non-root. We could not achieve (a) completely: if there was a
system-wide libmidgard.so, we could not prevent midgard-php4 and others
from linking to it. The solution was to remove system-wide PHP and
Midgard installs on the development server.
Midgard's install process needs to be enhanced on this front. However, fixing the issues mentioned is nontrivial, as Midgard depends largely on PHP's configure and install process, which is complex and unstable. Our attempts to come up with good patches to the existing mkall and configure.in scripts failed.
We developed a series of scripts to help
integrate repligard's exported XML files with CVS.
After exporting, they:
- order objects on their "id" parameter (GUID).
- change objects' "changed" property to 0
- fix newlines using mac2unix and dos2unix (we have a true multiplatform environment)
- change the values of "name" and "port" elements of "host" objects to a standard string
On import, they:
- change objects' "changed" property to the current timestamp
- change the values of "name" and "port" elements of "host" objects to match the local configuration
In a nutshell, things that are volatile or particular to an installation or
working copy are not allowed to go into CVS.
These issues are apparently addressed by YAMP as well. We are analyzing whether to integrate YAMP into our process.
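The export-side cleanup described above could be sketched roughly as follows. This is a minimal illustration in Python, not the scripts used on the project. The XML layout is an assumption: a root element whose children are objects carrying "id" and "changed" attributes, with "host" objects holding "name" and "port" child elements. The function name normalize_export and the placeholder strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def normalize_export(xml_text, host_name="HOSTNAME", host_port="PORT"):
    """Return a version of an exported XML document that is safe to commit.

    Assumed (hypothetical) layout: every child of the root is one object
    with "id" and "changed" attributes; "host" objects contain "name" and
    "port" child elements.
    """
    # Normalize DOS (\r\n) and old Mac (\r) line endings to Unix newlines.
    xml_text = xml_text.replace("\r\n", "\n").replace("\r", "\n")
    root = ET.fromstring(xml_text)

    for obj in root:
        # The modification stamp is volatile, so pin it to 0.
        if "changed" in obj.attrib:
            obj.set("changed", "0")
        # Host name and port are specific to one installation.
        if obj.tag == "host":
            for tag, value in (("name", host_name), ("port", host_port)):
                node = obj.find(tag)
                if node is not None:
                    node.text = value

    # A stable order by id keeps diffs between two exports small.
    root[:] = sorted(root, key=lambda obj: obj.get("id", ""))
    return ET.tostring(root, encoding="unicode")
```

The import side would do the reverse for the two installation-specific pieces: set "changed" to the current timestamp and write the local host name and port back in.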
One aspect that we found to be critical is the choice of how to organize your repligard exports. In this project we tried to separate in different files:
- "pages" and "pageelements"
- "styles" and "styleelements"
- "groups" and "persons"
- "topics" and "articles"
- "snippets" and "snippetdirs"
This approach was not completely successful. Using repligard it is hard to segregate objects in this way, and we could not find a way to modify the repligard.xml file so that it would link objects we want to segregate without including them in the exported file.
We are now experimenting with an approach consisting of a 'per-host' aggregate of the styles, styleelements, pages and pageelements, segregated from a 'content' file, that contains a skeleton of the 'topics' and 'articles' the client will maintain. As there are no explicit links between topics/articles and pages/styles, we hope this model works better, while still allowing us to 'upgrade' the code on a live system without overwriting the content the client has developed.
Repligard is an excellent tool. Making it more customizable -- by enriching the syntax available or rewriting it in PHP or Perl -- is certainly an important step to raise its value as 'glue'.
Benefits of the infrastructure
Once this infrastructure was in place, we used it to support other
aspects of the project. Our CVS repository would export out a nightly
snapshot, and a script on a public server would run the whole install
process. The "daily build" approach is a powerful tool in project management, giving daily feedback to all the parties involved. In our case, the client could either use the daily build from our server, or have the nightly snapshot set up by the hosting company.
The daily build flushed the previous day's database, and removed the code
completely. We could have run them all in parallel instead, had we
As the project matured, we added regression tests using Perl's
HTTP::WebTest module and added the regression tests to the daily build process. Thus, regressions in the core application were tested daily, and the results available to the team.
The nightly release/daily build included a Changelog that was easily accessible, so all parties involved could monitor the evolution of the project. We were also running a shared Bugzilla where bugs, major features and requests for enhancements were tracked. Commit messages would usually include
and a link to the relevant bugzilla page.
Mix and match
We used Midgard as a framework to provide certain services. However, a
significant portion of the application was not based on Midgard, although it did use the templating system.
Keeping track of critical code inside
Midgard Snippets and trying to edit it through a web-based interface was
considered risky and impractical.
Instead, we decided we would develop the code as regular files, following
Midgard's sensible approach of separating business logic
('code-init') from display logic ('content').
For each case (URL) where we decided to develop outside Midgard, we built a
set of three files, which were accessible directly through Apache with no Midgard
intervention. The files were called code-init.php, content.php and
wrapper.php. The wrapper file would (a) do anything we were doing in
code-global, like include some libraries, (b) include code-init.php (c)
include content.php -- thus emulating what Midgard's page object would do.
We split the team working on non-Midgard application logic
from the team working on Midgard and the CMS. The first team retained the ability to work in
their editors and debug a complex application without having to deal
with the complexity overhead of Midgard. Indeed, they still drop back to their thin
wrappers to develop and fix.
Once both sides were mature, the integration was straightforward.
However, it did happen that the team working on Midgard took
longer than expected to be ready to integrate. Thanks to this approach, the delay had no impact on the other team.
The team working on Midgard used the web-based interfaces
"Old Admin" and Asgard. These are advanced users who are efficient in their
preferred editors, yet these tools forced them to use impractical and
time-consuming <textfields>. We are now exploring using PHPmole.
Probably due to the use of <textfields>, the codebase of the available administration interfaces has a horrible coding style. Patching the administration interfaces elegantly is impossible. We are worried about the potential nightmare of porting our patches to an 'upgraded' admin.
Given that one of Midgard's killer applications is to provide custom administration interfaces, I would
personally like to cost-justify applying PEAR standards to the administration interfaces. That would dramatically reduce the cost and risk of maintaining a 'client branch' and merging it with the 'vendor branch'.
Other changes that would be welcome in the administration interfaces are not
requiring register_globals, and running clean of warnings.
Working with Midgard
The setup for this project consists of two virtual hosts, each mapping to a database: one hosts the
'staging' environment and the other the 'live' environment. The live
environment is 'read-only': no changes are made to the database other than repligard imports.
Changes are made on the staging database and are replicated to live. We
only replicate to live those articles that have been approved after they were
last edited (approved > edited). This is accomplished by running a Perl
script that fudges the repligard table before running the repligard
The repligard export goes to a temp file that is then imported into the live database. This is all run from a crontab,
and output goes to a log.
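The selection rule (approved > edited) can be illustrated with a small Python sketch. It is only an illustration of the rule, not the Perl script mentioned above; the dictionary layout, the integer timestamps and the function name select_for_live are all hypothetical.

```python
def select_for_live(articles):
    """Return the articles whose approval is newer than their last edit."""
    ready = []
    for article in articles:
        approved = article.get("approved")
        # Articles that were never approved, or that were edited again
        # after their approval, stay on staging only.
        if approved is not None and approved > article["edited"]:
            ready.append(article)
    return ready


# Hypothetical staging content; timestamps are plain integers for brevity.
staging = [
    {"name": "welcome", "edited": 100, "approved": 150},
    {"name": "draft", "edited": 200, "approved": None},
    {"name": "reworked", "edited": 300, "approved": 250},
]

print([a["name"] for a in select_for_live(staging)])
```

Only "welcome" qualifies here: "draft" was never approved, and "reworked" was edited after its last approval.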
We found that if the import fails for any reason, it is hard to re-run
it properly to restore the live DB. Object deletions can be replayed safely. However, objects edited after the failed replication, but not yet approved, won't be replicated, as that 'correct' version is no
longer in the database. Note that this problem scenario is particular to our implementation of the QA process, and is not due to Midgard's infrastructure.
We also found that, due to a minor bug, object deletions were not being
replicated because their entry in the repligard table was wrong. A patch
was available on Midgard's mailing list and the Mantis bugtracker, and was thus applied
as part of the install process. This made it apparent that not many Midgard
sites are using staging/live replication.
For certain restricted groups of objects, namely articles belonging to
certain topics, we have set up Version Control using NemeinRCS. Editors
can see the article's log, retrieve old versions and roll back.
Granularity in VC is a key issue. We have implemented commits on
every approval, as opposed to commits on every form submission, but this is something that must be assessed for each project.
It has to be noted, though, that using RCS means your data storage is
now the MySQL database plus the RCS files. Replicating the database
(either with MySQL utilities, repligard or plain old cp) is no longer
enough. This has implications for backup/restore as well. Plan to
change your backup and replication scripts to match the new situation.
In the end the client has found that the customized interface we have
provided for content management does not completely meet their needs. They have moved on to importing data from an RSS source they manage.
Early on we decided to avoid building URLs using article IDs or names, as they were too volatile.
This was initially regarded as unnecessarily
paranoid. However, halfway through the project the client indicated they
would run several installations transparently, switching around them
(and/or load-balancing) as required, and cross replicating data as they
saw fit. Linking by GUID has saved the day more than once since then, even though we use it
only for articles. Topic and page names seem to be far less volatile, and
the client is aware that they will break links by changing them.
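Why GUID-based links survive cross-replication can be shown with a small Python sketch. Everything in it is hypothetical (the URL pattern, the field names and the GUID value); it only illustrates that local IDs differ between installations while the GUID stays the same.

```python
def article_url(guid):
    """Build a link from the GUID, never from the local numeric ID."""
    return "/article/" + guid


def find_article(url, articles):
    """Resolve a GUID-based URL against one installation's articles."""
    guid = url.rsplit("/", 1)[-1]
    for article in articles:
        if article["guid"] == guid:
            return article
    return None


# The same article replicated to two installations: the local "id" differs,
# the GUID does not.
site_a = [{"id": 12, "guid": "guid-aaa", "title": "About us"}]
site_b = [{"id": 97, "guid": "guid-aaa", "title": "About us"}]

link = article_url("guid-aaa")
print(find_article(link, site_a)["id"], find_article(link, site_b)["id"])
```

A link built from the local ID on one installation would point at the wrong article, or at nothing, on the other.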
The website hosts content in English and Maori. The Maori language uses vowels with
macrons, and UTF-8 was our choice of encoding to deal with them. Part of the challenge was
ensuring proper storage, retrieval, version control and display of data
with high Unicode characters.
Midgard's secret weapon seems to be its pan-European user base:
Midgard's 'Russian' mode is UTF-8. After setting it, Midgard's internal calls to entities() or
its C equivalent were Unicode-clean. We only had to ensure that our own calls to PHP's htmlentities() had the
correct parameters for UTF-8.
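The reason the charset passed to an entity-escaping call matters can be shown with a short Python analogue. This is not the PHP call itself; escape_markup is a hypothetical helper that decodes stored bytes with a stated charset and then escapes the markup characters.

```python
import html


def escape_markup(raw, charset):
    """Decode stored bytes with the given charset, then escape markup."""
    return html.escape(raw.decode(charset))


# "Maori" spelled with a macron on the a (U+0101), stored as UTF-8 bytes.
stored = "M\u0101ori & English".encode("utf-8")

# Read with the right charset, the macron vowel survives untouched.
good = escape_markup(stored, "utf-8")

# Read as Latin-1, the two bytes of the macron vowel become two unrelated
# characters, so the word is mangled before it even reaches the browser.
bad = escape_markup(stored, "latin-1")

print(good)
print(len(good), len(bad))
```

The same bytes yield a correct string in one case and a longer, corrupted one in the other, which is why every escaping call has to agree on UTF-8.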
Repligard and related tools worked
perfectly with Unicode, and CVS and RCS kept track of code and data with consistency.
Performance & Reliability
We have found performance and reliability to be excellent. Midgard is
based on a fast and reliable infrastructure, and does not disappoint. We
unfortunately do not have performance measurements, although we know
our client has commissioned aggressive stress tests on the setup.
Informal feedback indicates performance was good. We would certainly
know if it was not.
We have found Midgard to be a reliable platform for
development, and are now exploring new tools to make it more efficient
and practical. PHPMole, YAMP and MidCOM are on our list.
And yes, we are definitely using it again, and again. Life is too short to reinvent the framework, and Midgard is already an outstanding framework.
But there are still some things to do...
The infrastructure work we had to do up front points to a
framework that needs its tools to mature a bit. I am tempted to think
that it is a matter of a better, more comprehensive install approach,
as the tools themselves work without hitches.
It should be simple to install the whole
toolbox, or at least to identify the core tools. The Midgard site could do
a lot more to help new people discover not only how to install Midgard, but
also "what is the toolbox that efficient Midgard users/teams use?" and "how
do the tools fit together?"
Having all the tools on a single site would help enormously. The Midgard beginner has to find his way through many sites, each devoted to a single tool, with few links across them and almost no 'big picture' documents
showing how it all fits together. The mailing lists hold heaps of information and insights for those who are patient, but I think the community would benefit a lot from bringing those insights into the light.
Last, but not least, the community around Midgard is its greatest
asset. The likes of Emiliano, Torben, Henri, Piotras, Sergei and many
others make things possible. Thanks for all the help, patience and code.