
Case study: Midgard framework in action

Nov 25th 2002 -- Martin Langhoff from CWA New Media has posted the following case study on using the Midgard Content Management Framework in a large-scale project.

A client of CWA New Media has recently taken live a site we developed using Midgard as the underlying framework. The project overall involved three companies, responsible for the back-end, the front-end and hosting. These three teams, plus a sizable team put together by the client, worked for over a year from prototype to launch date.

The site is a high-profile government portal with an extensive search system driven by an extension of the Dublin Core metadata standard called NZGLS. Additional search services are provided by the Autonomy search engine. A custom protocol, very similar to XML-RPC, glues the front end to the back end.

Our team was in charge of developing the front end, which included some CMS functionality. We decided to use Midgard as a framework, and we also provided a simplified CMS interface based on 'OldAdmin'. The team comprised a project manager, an art director/designer, a designer, an HTML developer, an architect/programmer and four programmers. Not everybody was working on it all the time, but for a good five months we had a core team of six working full time on it.

Infrastructure

After the initial pilot stage, where we did not use Midgard at all, a lot of time was spent in setting up infrastructure to support the project. The core issues addressed were:

Install

The hosting is handled by a different company. This meant we had to provide install scripts and instructions that worked perfectly to ensure smooth deployment. If deployment was problematic, it would be hard to sort out responsibilities with the hosting company.

A makefile was put together that would compile Apache/PHP/Midgard from source, configure them, load our application, and configure it as well. It was later enhanced to handle upgrades while preserving user data.
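The makefile itself is not reproduced in this article. A minimal shell sketch of what such a build driver does might look like the following; the component names, configure flags and directory layout are illustrative, not the real script:

```shell
#!/bin/sh
# Illustrative sketch of a one-shot "compile the whole stack" driver.
# Package names and flags are assumptions, not the project's real makefile.
PREFIX="${PREFIX:-/usr/local/midgard}"
SRC="${SRC:-$PWD/midgard-src}"

build_stack() {
    # Build each component into the same prefix, in dependency order.
    for pkg in apache php4 midgard-lib midgard-php midgard-apache; do
        if [ -d "$SRC/$pkg" ]; then
            ( cd "$SRC/$pkg" && ./configure --prefix="$PREFIX" \
                && make && make install )
        else
            # Outside a prepared build host, just report what is missing.
            echo "skip: $pkg (no source tree under $SRC)"
        fi
    done
}

build_stack
```

The upgrade-safe variant described above would add a step that dumps and restores user data around the reinstall.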

Our development servers ran Debian 2.2 ('Potato'), and the target servers RedHat 7.2. As a result, the install and upgrade process was tested on both platforms, and we documented the package dependencies in detail as well.

Midgard version

While this project was being developed, Midgard was in flux. A new PHP version had been released that addressed important security bugs, but the released Midgard did not compile against it. The automated install was used to test different Apache/PHP versions and compile flags against Midgard nightly snapshots.

We went through three stages. We used release 1.4.2 for a while. As the deadline was looming, we settled on a "known good" Midgard CVS snapshot. Thanks to luck (and Torben!), an official Midgard release happened before the project was due.

Private sandboxes

To address the versioning issues, developers were using different Apache/PHP/Midgard combinations. They had to be able to compile their own set and run it, without having root privileges, and inside their home directories.

To achieve this, we had to patch some of the configure and shell scripts so that they (a) link against the private libraries rather than the system-wide ones and (b) allow running as non-root. We could not achieve (a) completely: if a system-wide libmidgard.so was present, we could not prevent midgard-php4 and the other components from linking against it. The solution was to remove the system-wide PHP and Midgard installs on the development server.
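The shape of such a per-developer sandbox can be sketched as follows; the directory layout and port number are illustrative assumptions:

```shell
#!/bin/sh
# Illustrative per-developer sandbox: private prefix, private libraries,
# unprivileged port. Paths and the port number are assumptions.
SANDBOX="${SANDBOX:-${HOME:-/tmp}/midgard-sandbox}"
mkdir -p "$SANDBOX/bin" "$SANDBOX/lib" "$SANDBOX/etc" "$SANDBOX/logs"

# Put the private libmidgard ahead of any system-wide copy at run time.
LD_LIBRARY_PATH="$SANDBOX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH

# A non-root Apache must listen above port 1024.
HTTPD_PORT=8080
echo "would start: $SANDBOX/bin/httpd -f $SANDBOX/etc/httpd.conf (Listen $HTTPD_PORT)"
```

The hard part, as noted above, is not the runtime layout but getting the configure scripts to honour it at link time.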

Midgard's install process needs to be enhanced on this front. However, fixing the issues mentioned is nontrivial, as Midgard depends largely on PHP's configure and install process, which is complex and unstable. Our attempts to come up with good patches to the existing mkall and configure.in scripts failed.

CVS integration

We developed a series of scripts to help integrate repligard's exported XML files with CVS.

After exporting, the scripts:

  • unzip the export
  • sort objects by their "id" parameter (GUID)
  • change objects' "changed" property to 0
  • fix newlines using mac2unix and dos2unix (we have a truly multiplatform environment)
  • change the values of the "name" and "port" elements of "host" objects to a standard string

On import, they:

  • change objects' "changed" property to the current timestamp
  • change the values of "name" and "port" elements of "host" objects to match the local configuration

In a nutshell, things that are volatile or particular to an installation or working copy are not allowed to go into CVS.
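The export-side normalisation can be sketched with sed. The element names below follow the rules above, but the real repligard XML layout may differ, and the full scripts also sorted objects by GUID and fixed newlines:

```shell
#!/bin/sh
# Illustrative export-side normalisation. Caveat: this blanket substitution
# hits every <name> element; the real scripts restricted it to host objects.
normalize_export() {
    # $1: an exported (already unzipped) XML file, rewritten in place
    sed -e 's|<changed>[^<]*</changed>|<changed>0</changed>|g' \
        -e 's|<name>[^<]*</name>|<name>LOCAL-HOST</name>|g' \
        -e 's|<port>[^<]*</port>|<port>80</port>|g' \
        "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Demo on a fabricated one-line host object:
demo=$(mktemp)
printf '%s\n' \
  '<host><name>dev3.example</name><port>8081</port><changed>1038182400</changed></host>' \
  > "$demo"
normalize_export "$demo"
cat "$demo"
```

The import side does the reverse: it stamps "changed" with the current time and rewrites the host name/port to match the local configuration.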

These issues are apparently addressed by YAMP as well. We are analyzing whether to integrate YAMP into our process.

One aspect that we found to be critical is the choice of how to organize your repligard exports. In this project we tried to separate objects into different files:

  • "pages" and "pageelements"
  • "styles" and "styleelements"
  • "groups" and "persons"
  • "topics" and "articles"
  • "snippets" and "snippetdirs"
  • "hosts"
  • "sitegroups"

This approach was not completely successful. With repligard it is hard to segregate objects this way, and we could not find a way to modify the repligard.xml file so that it would link to objects we wanted segregated without including them in the exported file.

We are now experimenting with an approach consisting of a 'per-host' aggregate of the styles, styleelements, pages and pageelements, segregated from a 'content' file, that contains a skeleton of the 'topics' and 'articles' the client will maintain. As there are no explicit links between topics/articles and pages/styles, we hope this model works better, while still allowing us to 'upgrade' the code on a live system without overwriting the content the client has developed.

Repligard is an excellent tool. Making it more customizable -- by enriching the syntax available or rewriting it in PHP or Perl -- is certainly an important step to raise its value as 'glue'.

Benefits of the infrastructure

Once this infrastructure was in place, we used it to support other aspects of the project. Our CVS repository would export a nightly snapshot, and a script on a public server would run the whole install process. The "daily build" approach is a powerful project-management tool, giving daily feedback to all the parties involved. In our case, the client could either use the daily build from our server, or have the nightly snapshot set up by the hosting company.

The daily build flushed the previous day's database and removed the code completely. We could have kept each day's build running in parallel instead, had we feared regressions.

As the project matured, we wrote regression tests using Perl's HTTP::WebTest module and added them to the daily build process. Thus, regressions in the core application were tested for daily, and the results made available to the team.
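A nightly-build driver of this kind boils down to running each stage and recording its outcome where the whole team can see it. In this sketch the echo commands are stubs standing in for the real cvs/make/HTTP::WebTest invocations, which need a full environment:

```shell
#!/bin/sh
# Illustrative nightly-build driver; stage commands are stubs.
LOG="${LOG:-$(mktemp)}"
: > "$LOG"

step() {
    # Run one build stage and record OK/FAIL in the shared log.
    name="$1"; shift
    if "$@" >/dev/null 2>&1; then
        echo "OK   $name" >> "$LOG"
    else
        echo "FAIL $name" >> "$LOG"
    fi
}

step export  echo "cvs export -r HEAD portal"          # stub
step install echo "make fresh-install"                 # stub
step webtest echo "perl run-webtests.pl"               # stub for HTTP::WebTest

cat "$LOG"
```

Publishing this log next to the Changelog is what gives every party the daily feedback mentioned above.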

The nightly release/daily build included a Changelog that was easily accessible, so all parties involved could monitor the evolution of the project. We also ran a shared Bugzilla where bugs, major features and requests for enhancements were tracked. Commit messages would usually include a link to the relevant Bugzilla page.

Mix and match

We used Midgard as a framework to provide certain services. However, a significant portion of the application was not based on Midgard, although it did use the templating system.

Keeping track of critical code inside Midgard Snippets and trying to edit it through a web-based interface was considered risky and impractical.

Instead, we decided we would develop the code as regular files, following Midgard's sensible approach of separating business logic ('code-init') from display logic ('content').

For each case (URL) where we decided to develop outside Midgard, we built a set of three files, accessible directly through Apache with no Midgard intervention. The files were called code-init.php, content.php and wrapper.php. The wrapper file would (a) do anything we were doing in code-global, such as including some libraries, (b) include code-init.php, and (c) include content.php -- thus emulating what Midgard's page object would do.
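A scaffold for one such URL could be generated as below. The PHP bodies are illustrative stubs (the library path is an assumption), but the three-file layout and the include order match the description above:

```shell
#!/bin/sh
# Scaffold one per-URL triplet: wrapper.php, code-init.php, content.php.
APPDIR="${APPDIR:-$(mktemp -d)}"

cat > "$APPDIR/wrapper.php" <<'EOF'
<?php
// (a) what code-global would do: shared libraries, config
require 'lib/common.php';      // illustrative path, an assumption
// (b) business logic, then (c) display logic -- Midgard's own split
require 'code-init.php';
require 'content.php';
?>
EOF

: > "$APPDIR/code-init.php"    # business logic goes here
: > "$APPDIR/content.php"      # display logic goes here
ls "$APPDIR"
```

Because the wrapper reproduces the page object's behaviour, the same code-init/content pair can later be moved into Midgard unchanged.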

We split the team working on non-Midgard application logic from the team working on Midgard and the CMS. The first team retained the ability to work in their editors and debug a complex application without having to deal with the complexity overhead of Midgard. Indeed, they still drop back to their thin wrappers to develop and fix.

Once both sides were mature, the integration was straightforward. However, it did happen that the team working on Midgard took longer than expected to be ready to integrate. Thanks to this approach, the delay had no impact on the other team.

The team working on Midgard used the web-based interfaces "Old Admin" and Asgard. Being advanced users who are efficient in their preferred editors, they found that these tools forced them into impractical and time-consuming <textfields>. We are now exploring PHPmole.

Probably due to the use of <textfields>, the codebase of the available administration interfaces has a horrible coding style. Patching the administration interfaces elegantly is impossible, and we are worried about the potential nightmare of porting our patches to an 'upgraded' admin.

Given that one of Midgard's killer applications is providing custom administration interfaces, I would personally like to cost-justify applying PEAR standards to the administration interfaces. That would dramatically reduce the cost and risk of maintaining a 'client branch' and merging it with the 'vendor branch'.

Other changes that would be welcome on the administration interfaces have to do with not requiring REGISTER_GLOBALS, and being clean of warnings.

Working with Midgard

The setup for this project consists of two virtualhosts, each mapping to a database: one hosts the 'staging' environment and the other the 'live' environment. The live environment is 'read-only', no changes are made to the database other than repligard imports.

Changes are made on the staging database and are replicated to live. We only replicate to live those articles that have been approved after they were last edited (approved > edited). This is accomplished by running a Perl script that fudges the repligard table before running the repligard export.

The repligard export goes to a temporary file that is then imported into the live database. This is all run from a crontab, and output goes to a log.
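The cron driver around that export/import pair might look like this sketch. Here `repligard_cmd` is a stand-in for the real, site-specific repligard invocations (whose arguments are not shown in this article), and keeping the temp file on failure is one way to ease the manual re-runs discussed below:

```shell
#!/bin/sh
# Illustrative staging->live push driver; repligard_cmd is a stub standing
# in for the real repligard invocations.
REPL_LOG="${REPL_LOG:-$(mktemp)}"

# Stub so the sketch can run outside the real environment:
repligard_cmd() { echo "(stub repligard $*)"; }

run_push() {
    tmp=$(mktemp) || return 1
    if repligard_cmd export > "$tmp" 2>>"$REPL_LOG" &&
       repligard_cmd import < "$tmp" 2>>"$REPL_LOG"
    then
        echo "push ok: $(date)" >> "$REPL_LOG"
        rm -f "$tmp"
    else
        # Keep the exported XML around: re-running a failed import by hand
        # is much easier when the file still exists.
        echo "push FAILED, export kept in $tmp" >> "$REPL_LOG"
        return 1
    fi
}

run_push
```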

We found that if the import fails for any reason, it is hard to re-run it properly to restore the live DB. Object deletions can be replayed safely. However, objects edited after the failed replication, but not yet approved, won't be replicated, as the 'correct' version is no longer in the database. Note that this problem scenario is particular to our implementation of the QA process, and not due to Midgard's infrastructure.

We also found that, due to a minor bug, object deletions were not being replicated because their entry in the repligard table was wrong. A patch was available on Midgard's mailing list and the Mantis bug tracker, and was applied as part of the install process. This made it apparent that not many Midgard sites are using staging/live replication.

For certain restricted groups of objects, namely articles belonging to certain topics, we have set up Version Control using NemeinRCS. Editors can see the article's log, retrieve old versions and roll back. Granularity in VC is a key issue. We have implemented commits on every approval, as opposed to commits on every form submission, but this is something that must be assessed for each project.

It has to be noted, though, that using RCS means your data storage is now the MySQL database plus the RCS files. Replicating the database (whether with MySQL utilities, repligard or plain old cp) is no longer enough. This has implications for backup/restore as well: plan to change your backup and replication scripts to match the new situation.
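An adjusted backup script would cover both halves. In this sketch the paths, database name and RCS location are assumptions, and the database dump is skipped gracefully when no server is reachable:

```shell
#!/bin/sh
# Illustrative two-part backup: MySQL dump plus the RCS ,v files.
BACKUP_DIR="${BACKUP_DIR:-$(mktemp -d)}"
RCS_DIR="${RCS_DIR:-$(mktemp -d)}"
: > "$RCS_DIR/article-42,v"    # stand-in for a real RCS history file

stamp=$(date +%Y%m%d)

# 1. Database half (DB name 'midgard' is an assumption).
if command -v mysqldump >/dev/null 2>&1; then
    mysqldump midgard > "$BACKUP_DIR/midgard-$stamp.sql" 2>/dev/null \
        || echo "mysqldump failed (no server?); DB half skipped"
fi

# 2. RCS half -- the part a pre-RCS backup script forgets.
tar cf "$BACKUP_DIR/rcs-$stamp.tar" -C "$RCS_DIR" .
ls "$BACKUP_DIR"
```

Restore must likewise replay both halves, or articles will roll back to versions their RCS logs no longer describe.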

In the end the client has found that the customized interface we have provided for content management does not completely meet their needs. They have moved on to importing data from an RSS source they manage.

Early on we decided to avoid building URLs using article IDs or names, as they were too volatile. This was initially regarded as unnecessarily paranoid. However, halfway through the project the client indicated they would run several installations transparently, switching between them (and/or load-balancing) as required, and cross-replicating data as they saw fit. Linking using GUIDs has since saved the day more than once, even though we use it just for articles. Topic and page names seem to be far less volatile, and the client is aware that changing them will break links.

The website hosts content in English and Maori. The Maori language uses vowels with macrons, and Unicode UTF-8 was our choice of encoding to deal with it. Part of the challenge was ensuring proper storage, retrieval, version control and display of data containing high Unicode characters.

Midgard's secret weapon seems to be its pan-European user base: Midgard's 'Russian' mode is UTF-8. After setting it, Midgard's internal calls to entities() and its C equivalent were Unicode-clean. We only had to ensure that our own calls to PHP's entities() had the correct parameters for UTF-8.

Repligard and related tools worked perfectly with Unicode, and CVS and RCS kept track of code and data with consistency.

Performance & Reliability

We have found performance and reliability to be excellent. Midgard is based on a fast and reliable infrastructure, and does not disappoint. We unfortunately do not have performance measurements, although we know our client has commissioned aggressive stress tests on the setup.

Informal feedback indicates performance was good. We would certainly know if it was not.

Conclusion

We have found Midgard to be a reliable platform for development, and are now exploring new tools to make it more efficient and practical. PHPMole, YAMP and MidCOM are on our list.

And yes, we are definitely using it again, and again. Life is too short to reinvent the framework, and Midgard is already an outstanding framework.

But there are still some things to do...

The infrastructure work we had to do up front points to a framework whose tools need to mature a bit. I am tempted to think it is a matter of a better, more comprehensive install approach, as the tools themselves work without hitches.

It should be simple to install the whole toolbox, or at least to identify the core tools. The Midgard site could do a lot more to help new people discover not only how to install Midgard, but also "what is the toolbox that efficient Midgard users/teams use?" and "how do the tools fit together?"

Having all the tools on a single site would help enormously. The Midgard beginner has to find his way through many sites, each devoted to a single tool, with few links across them and almost no 'big picture' documents showing how it all fits together. The mailing lists hold heaps of information and insights for those who are patient, but I think the community would benefit a lot from bringing those insights into the light.

Last, but not least, the community around Midgard is its greatest asset. The likes of Emiliano, Torben, Henri, Piotras, Sergei and many others make things possible. Thanks for all the help, patience and code.
