MARC import/export plugins for GNU EPrints

After a few days of work, I’ve finally published the MARC import/export plugins for the GNU EPrints3 archiving software. Thanks go to Ailé Filippi, who wrote the MARC-to-EPrints and EPrints-to-MARC mappings!

Writing software that uses MARC records is not easy. There are two mainstream MARC flavours, MARC21 and UNIMARC, so the software must let the collection administrator specify the correspondence between EPrints metadata and MARC fields. This applies to both MARC input and output, so developers should aim for a centralized means of configuration (Koha, for example, does this in dedicated DB tables).

The other problem is that libraries put separators in MARC fields so that the data retrieved through public catalogues shows up in the right format. Therefore, when working with MARC, developers should strip those characters on import and add them back on export. It helps if you have a librarian as your girlfriend.
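As a rough illustration of the trimming and re-adding described above, here is a minimal sketch in Python. The helper names and the set of separator characters are assumptions made for this example; they are not taken from the plugins. Stripping a trailing period this bluntly would also damage values that legitimately end in an abbreviation, so real code needs per-field rules.

```python
# Hypothetical helpers, not part of the EPrints plugins.
# TRAILING is an assumed set of cataloguing separators that may end a subfield.
TRAILING = " /:;,.="


def strip_separators(value):
    """Remove trailing separator characters and surrounding whitespace (import side)."""
    return value.rstrip(TRAILING).strip()


def add_separator(value, separator):
    """Append the separator a catalogue expects after this subfield (export side)."""
    return f"{strip_separators(value)} {separator}"


print(strip_separators("Programming Perl /"))
print(add_separator("Programming Perl", "/"))
```

The import side cleans the value before it is stored as metadata; the export side puts the separator back so the record still displays correctly in a catalogue.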

Finally, MARC mappings are never one-to-one, since MARC has repeatable fields and subfields, and values duplicated throughout the record. This means that, in Perl, you need to handle data mapping using hashes of array references or hashes of hash references. That’s some beautiful code.
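To make the shape of that structure concrete, here is a small sketch in Python, where a dictionary of lists plays the role of a Perl hash of array references. The mapping table and the target field names are hypothetical, chosen only for the example; the tags are ordinary MARC21 ones (245 title, 100 and 700 personal names, 650 subject).

```python
from collections import defaultdict

# Hypothetical mapping: (MARC tag, subfield code) -> metadata field name.
MARC_TO_EPRINTS = {
    ("245", "a"): "title",
    ("100", "a"): "creators",
    ("700", "a"): "creators",  # a repeatable field feeding the same target
    ("650", "a"): "subjects",
}


def marc_to_eprints(fields):
    """Collect (tag, code, value) triples into a dict of lists.

    Repeated fields append to the same list; values duplicated elsewhere
    in the record are stored once; unmapped fields are ignored.
    """
    record = defaultdict(list)
    for tag, code, value in fields:
        name = MARC_TO_EPRINTS.get((tag, code))
        if name is not None and value not in record[name]:
            record[name].append(value)
    return dict(record)


fields = [
    ("245", "a", "Programming Perl"),
    ("100", "a", "Wall, Larry"),
    ("700", "a", "Christiansen, Tom"),
    ("700", "a", "Christiansen, Tom"),  # duplicated value in the record
    ("020", "a", "0596000278"),         # no mapping defined, so it is skipped
]
print(marc_to_eprints(fields))
```

Because two different tags can map to the same target and one tag can occur many times, every target holds a list even when it usually has a single value.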

The MARC import/export plugins for GNU EPrints3 were developed with these considerations in mind:

  • Write as little data-processing code as possible; that is, avoid handling punctuation and special characters field by field, since this would require additional logic.
  • Reuse code. For example, all of the plugins’ MARC-XML support relies on the MARC::File::XML module from CPAN to work with traditional MARC21 records.
  • Centralize configuration. Administrators can use marc.pl to specify new mappings, and even write their own field-processing routines.

I also experimented with combining the (already written) Dublin Core import/export plugins for GNU EPrints 3 with the Library of Congress crosswalks between unqualified Dublin Core and MARC21, in both directions. I could even use the already written CPAN module for that!

dsamon: monitoring Debian Security Advisories

dsamon is a script (in Perl, Of Course) that contacts the Debian Security RSS feed and compares it against a previous state (or generates the initial state if there is no previous one :) If there are new advisories, it prints the download links of the binary packages for a particular architecture. It is ideal for applying security updates to custom repositories or to Debian-based, Debian-compatible distributions. It can be run from cron, with the links passed to wget for convenient downloading. All the modules it uses are available in Debian, including HTML::LinkExtractor, which is maintained by a Venezuelan.
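The compare-against-previous-state idea can be sketched in a few lines. This is Python rather than the script's Perl, and it is not dsamon's code: the function name, the JSON state file and the decision to report nothing on the first run are all assumptions made for the illustration. Fetching and parsing the RSS feed is left out, so the function simply receives a list of advisory identifiers.

```python
import json
from pathlib import Path


def new_advisories(current_ids, state_file):
    """Return the advisories not seen on a previous run and update the state.

    Assumption: on the very first run (no state file yet) the initial state
    is recorded and nothing is reported.
    """
    path = Path(state_file)
    seen = set(json.loads(path.read_text())) if path.exists() else None

    # Remember everything seen so far, including advisories that have
    # already dropped out of the feed.
    known = set(current_ids) | (seen or set())
    path.write_text(json.dumps(sorted(known)))

    if seen is None:
        return []
    return [advisory for advisory in current_ids if advisory not in seen]
```

Run from cron, a script built this way would print only what is new since the last run, which is the kind of output that can be handed to wget.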

Assorted stuff

WebGUI and DateTime::Cron::Simple

If you use WebGUI and are concerned about the removal of DateTime::Cron::Simple (due to license, or more precisely, non-license issues promptly noticed by Ernesto Hernández-Novich), you can use DateTime::Cron::Parser, which has a compatible API. Since other people were also interested in rewriting the module, an official discussion is under way on the DateTime mailing list, and therefore this module does not aim to be an official replacement. Some people point out that DateTime::Event::Cron might do what’s needed here with a little tweaking and patching. Choose what you need.

MiniDebConf in Venezuela

If you are subscribed to debian-devel-announce, you might have read about the Venezuelan MiniDebConf, taking place from October 14th in Maturín. While this is not the first event we have organized (we’ve celebrated Debian Day nation-wide two years in a row), we’re pretty excited about having international visitors join us in a joint effort to promote collaboration with Debian. If you’re interested in following the event, you can take a look at the wiki page. Every comment and contribution is greatly appreciated.

More on the records problem

This is the timing for the test file generator, written in Perl.

real    0m0.053s
user    0m0.041s
sys     0m0.009s

-rw-r--r-- 1 jose jose 477733 2006-05-31 07:41 archivo.txt
jose @ maxwell /tmp $  wc -l archivo.txt
56334 archivo.txt

This is Milton’s solution, in Perl. Note that I added a space, because in the original e-mail the file format did not use tabs but very carefully counted spaces. We could also have used \s.

real    0m0.151s
user    0m0.129s
sys     0m0.017s

This is my solution. Its output, however, does not contain extra blank lines as he says. It is clean.

real    0m0.188s
user    0m0.168s
sys     0m0.011s

Ernesto Hernández-Novich gave a Perl solution that takes advantage of the concept of records, and these are its timings:

real    0m0.065s
user    0m0.045s
sys     0m0.013s

De Sousa’s bash solution does not work for me, for some reason. Yesterday it did work (WTF) with a smaller file. Its running times, however, are longer. The other bash solution, by Javier García, does not work for me either (could you get the timings yourselves?). De Sousa’s C solution also leaves blank spaces in the file:

real    0m0.087s
user    0m0.054s
sys     0m0.009s