UnixReview.com
June 2005

Regular Expressions: Giving New Life to Legacy Code Using SWIG

by Miki Tebeka and Cameron Laird

"Regular Expressions" has written several times before about the Simplified Wrapper and Interface Generator (SWIG) tool for connecting program segments written in different languages. This month, we'll look in detail at an example of how SWIG breathes new life into legacy code.

Motivation

Scene One

You convince your boss that programming in Python (or Perl, Tcl, LISP, Java) will make you and your team much more productive. "But what about all the code we already have that's in C? Do you expect us to rewrite years of work?" You know his instincts are right, but still smile and say "SWIG".

Scene Two

Or your client wonders, "Frankly, we can't afford to come to you every time the simulator needs a new feature. How do we add functionality without the high costs?" Again, you smile and say "SWIG".

Scene Three

"Told you!" says your co-worker. "Python (or Perl, Tcl, LISP, Java) is not fast enough. It was good for most of the code, but this part needs to be faster." You smile and say "SWIG".

All of the scenes above are from our own experience; SWIG easily solved them all.

What is SWIG?

SWIG is a bridge between the worlds of C/C++ and higher-level languages. The current 1.3.25 release conveniently exposes code written in C/C++ to Allegro Common Lisp, Chicken Scheme, C#, Guile Scheme, Java, Modula3, MzScheme, Ocaml, Perl, PHP, Pike, Python, Ruby, S-Expressions, Tcl and XML. There is also preliminary work on extending SWIG to a few other languages, including Lua. Notice that each of these higher-level languages has its own "native API" for calling C. Using those APIs directly requires language-specific knowledge, though, and several of them are cumbersome or confusing.

Using SWIG

Background

Scene two, above, nicely exemplifies use of SWIG. The employer of this month's lead columnist, Zoran Corporation, had a legacy hardware simulator written in C and C++. Every time a simulator user wanted another feature — dumping memory to disk, for example — she had to ask us to implement this feature. Zoran "re-engineered" this process by:

  • Exposing the core functionality of the simulator to Python;
  • Embedding a copy of Python in the simulator; and
  • Documenting Python as an "extension language" for the simulator.

These changes gave users far more freedom and flexibility, for at this point, they could write such simple scripts as:

    def memdump(filename, start, end):
        '''Dump memory to disk'''
        fo = open(filename, "wt")
        for address in range(start, end + 1):
            fo.write("0x%04X: 0x%04X\n" % (address, sim.mem_read(address)))
        fo.close()

to manage their own needs.
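
A script like this can be tried out even when the simulator is not at hand. What follows is only a minimal sketch, not Zoran's code: FakeSim is a hypothetical stand-in that offers nothing but mem_read and mem_write, and this variant of memdump takes the simulator object and an open file as parameters so that it runs without the real sim module.

```python
import io


class FakeSim:
    """Hypothetical stand-in for the sim module; unset memory cells read as 0."""

    def __init__(self):
        self._mem = {}

    def mem_write(self, address, value):
        self._mem[address] = value

    def mem_read(self, address):
        return self._mem.get(address, 0)


def memdump(sim, fo, start, end):
    """Write one "address: value" line to fo for each cell in [start, end]."""
    for address in range(start, end + 1):
        fo.write("0x%04X: 0x%04X\n" % (address, sim.mem_read(address)))


fake = FakeSim()
fake.mem_write(0x10, 0xBEEF)
out = io.StringIO()
memdump(fake, out, 0x0F, 0x11)
print(out.getvalue(), end="")
```

Passing the simulator in as an argument is a design choice made for this sketch; a script running inside the embedded interpreter would presumably use the sim module directly, as the original does.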

The Simulator

Let's assume our simulator has the following interface:

    #ifndef SIM_H
    #define SIM_H
    /* The real source is far more complicated.  What follows is
        an instructive model based on the real one. */
    
    /** Initialize the simulator */
    extern int sim_initialize();
    /** Terminate the simulator */
    extern int sim_terminate();
    /** Execute one step */
    extern int sim_step();
    
    /** Registers */
    enum {
        REG_PC, /**< Program counter */
        REG_FLAGS, /**< Flags */
        REG_GENERAL1, /**< General register 1 */
        REG_GENRAL2, /**< General register 2 */
        NUM_REGISTERS /**< Number of registers */
    };
    
    /** Write to register
      @param reg Register to write to
      @param value Value to write
    */
    extern void reg_write(int reg, int value);
    
    /** Read from register
      @param reg Register to read from
      @return Value of the register
    */
    extern int reg_read(int reg);
    
    /** Write value to memory
      @param address Address to write to
      @param value Value to write
    */
    extern void mem_write(int address, int value);
    
    /** Read from memory
      @param address Address to read from
      @return Memory value at address
    */
    extern int mem_read(int address);
    
    /** Instruction data type */
    typedef struct {
        long code; /**< Instruction code */
        char *disassembly; /**< Disassembly string */
        int reg1; /**< First instruction register, -1 for none */
        int reg2; /**< Second instruction register, -1 for none */
    } inst_t;
    
    /** Get instruction
      @param address Program address
      @return Instruction at address (e.g. get_inst(reg_read(REG_PC)) will
              return current instruction)
    */
    extern inst_t *
    get_inst(int address);
    
    #endif /* SIM_H */

Without SWIG, the only way to enhance the functionality this interface represents is for the development team to code in C and deliver a new executable to the customer. Let's see how SWIG changes that approach.

Writing the Interface File

To use SWIG, a developer constructs an "interface file", which specifies exposed C functions. The syntax of this specification is much like that of C.

The initial interface file for the simulator is:

    %module sim
    %{
    #include "sim.h"
    %}
    
    /* sim exported functions here */
    %include "sim.h"

When first starting with SWIG, it's simplest to write the function declarations in a .i file "by hand". For production use, Zoran relies on makefiles and %include to keep our .i and .h sources synchronized.

Most SWIG examples, including those of the standard tutorial, duplicate each function declaration, with an extern prefix, in the interface file. However, it's much easier to modify your .h file to follow this convention and just use SWIG's %include directive. This way you don't have to synchronize the .i and .h files by hand.

Note that extern is crucial when writing function or variable declarations.

Creating Wrappers

The next step in use of SWIG is generation of language-specific "glue" between the legacy C source and our high-level language. For our choice of Python,

    swig -python sim.i

accomplishes this by creating sim_wrap.c.
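
In a build script, this step is usually repeated only when the interface has changed. The sketch below is one hedged way to express that in Python; swig_command and is_stale are names invented for this illustration, and the wrapper file names are simply the ones this column reports (sim_wrap.c for C, sim_wrap.cxx when swig is given -c++).

```python
import os


def swig_command(interface, cplusplus=False):
    """Return (argv, wrapper): the swig command line and the file it should create."""
    stem = os.path.splitext(interface)[0]
    argv = ["swig", "-python"]
    if cplusplus:
        argv.append("-c++")  # C++ mode; the wrapper then gets a .cxx suffix
    argv.append(interface)
    wrapper = stem + ("_wrap.cxx" if cplusplus else "_wrap.c")
    return argv, wrapper


def is_stale(wrapper, sources):
    """True if wrapper is missing or older than any of the sources."""
    if not os.path.exists(wrapper):
        return True
    built = os.path.getmtime(wrapper)
    return any(os.path.getmtime(src) > built for src in sources)


argv, wrapper = swig_command("sim.i")
print(" ".join(argv), "->", wrapper)
```

A caller could hand argv to subprocess.check_call whenever is_stale(wrapper, ["sim.i", "sim.h"]) is true; the sketch stops short of running swig so that it works on a machine without SWIG installed.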

Building

Python's standard tool for managing development and maintenance of extensions is distutils. Along with all the other intelligence it captures, distutils knows about SWIG and its interface files in particular. This means that all that's left to build a distribution-ready extension is to write a simple setup.py:

    # Setup script for sim C extension
    
    from distutils.core import setup, Extension
    
    sim = Extension(
        "_sim", # Name of output library
        ["sim.i"], # SWIG file
        libraries = ["sim"], # Link with simulator library
        library_dirs = ["."], # Where to find the simulator library
    )
    
    setup(ext_modules = [sim])

and run:

    python setup.py build_ext -i

This builds the _sim.pyd Python extension and sim.py wrapper. The -i (--inplace) option directs build_ext to create _sim.pyd in the source directory, where Zoran needs it, rather than in the platform-specific build subdirectory that is the default.

Playing with the Extension

With the extension properly built, its use requires only a simple import sim in Python source to access the module:

    Python 2.4.1 (#65, Mar 30 2005, 09:13:57) [MSC v.1310 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sim
    >>> # The dir() built-in reports on a module's attributes.
    ...
    >>> dir(sim)
    ['NUM_REGISTERS', 'REG_FLAGS', 'REG_GENERAL1', 'REG_GENRAL2', 'REG_PC',             \
        '__builtins__', '__doc__', '__file__', '__name__', '_newclass', '_object',      \
        '_sim', '_swig_getattr', '_swig_setattr', '_swig_setattr_nondynamic',           \
        'get_inst', 'inst_t', 'inst_tPtr', 'mem_read', 'mem_write', 'reg_read',         \
        'reg_write', 'sim_initialize', 'sim_step', 'sim_terminate']
    >>> sim.mem_write(10, 100)
    >>> sim.mem_read(10)
    100
    >>> inst = sim.get_inst(8)
    >>> inst.code
    0
    >>> inst.disassembly
    'nop'
    >>> sim.REG_FLAGS
    1
    >>>

SWIG introduces a few "infrastructure" attributes (_swig_setattr and so on) that are beyond the scope of this introduction. The exciting part, though, is to see our legacy C-coded functions, such as mem_write() and mem_read(), show up in perfect order as recognized Python functions.
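
Because the enum constants arrive as ordinary module attributes, a script can discover them with dir() instead of hard-coding them. Here is a minimal sketch of that idea. It runs against a hypothetical stand-in built with types.SimpleNamespace rather than the real extension, the register values are made up, and it assumes that reg_read returns the register's value.

```python
import types

# Hypothetical stand-in for the wrapped module, holding only what this sketch needs.
_regs = {0: 0x0100, 1: 0x0002, 2: 0, 3: 7}
sim = types.SimpleNamespace(
    REG_PC=0, REG_FLAGS=1, REG_GENERAL1=2, REG_GENRAL2=3, NUM_REGISTERS=4,
    reg_read=lambda reg: _regs[reg],  # assumed to return the register's value
)


def register_names(module):
    """Map register number -> name for every REG_* attribute of module."""
    return {getattr(module, name): name
            for name in dir(module) if name.startswith("REG_")}


def dump_registers(module):
    """Return "NAME = 0xVALUE" lines, ordered by register number."""
    names = register_names(module)
    return ["%-12s = 0x%04X" % (names[reg], module.reg_read(reg))
            for reg in sorted(names)]


for line in dump_registers(sim):
    print(line)
```

NUM_REGISTERS is skipped because it does not start with REG_, so only the four real registers are listed.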

Advanced Features

SWIG does far more than just wrap a few C entry points. Especially important for the simulator are error handling and renaming.

Error Handling

In C++, an uncaught exception terminates an application. We need, therefore, to wrap our legacy object code to "transfer" these C++ exceptions to Python. SWIG's %exception makes this easy. We need only add the following to sim.i before including the simulator functions.

    %exception {
        try {
            $action
        }
        /* Catch by reference to avoid copying the exception object */
        catch (SimError &e) {
            e.print();
            PyErr_SetString(PyExc_RuntimeError, "Simulation Error");
            return NULL;
        }
        catch (...) {
            PyErr_SetString(PyExc_RuntimeError, "Unspecified Error");
            return NULL;
        }
    }

Note that in order to create C++ wrapping, you need to provide the -c++ switch to swig, as in:

    swig -python -c++ sim.i

which will produce sim_wrap.cxx and sim.py.
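
On the Python side, the effect of this %exception block is that a C++ exception shows up as an ordinary RuntimeError that a script can catch. The sketch below illustrates the calling pattern only; failing_step is a hypothetical stand-in for a wrapped function whose C++ code threw SimError, and run_steps is a helper invented for this example.

```python
def failing_step():
    """Hypothetical stand-in for sim.step() when the C++ code throws SimError."""
    raise RuntimeError("Simulation Error")


def run_steps(step, count):
    """Call step() up to count times; return (steps completed, error message or None)."""
    for done in range(count):
        try:
            step()
        except RuntimeError as err:
            # The message is the string the wrapper passed to PyErr_SetString.
            return done, str(err)
    return count, None


print(run_steps(failing_step, 5))
print(run_steps(lambda: 0, 5))
```

Stopping at the first error is a choice made for the sketch; a real script might prefer to log the failure and continue.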

Renaming

Sometimes function names in C/C++ collide with reserved names in Python. SWIG's %rename feature manages such conflicts. It is also handy for tidying names. Here's how, for instance, we drop the sim_ prefix, which is redundant inside the sim namespace Python automatically provides on importation of the sim module. Add the following before the function definitions:

    %rename(initialize) sim_initialize;
    %rename(terminate) sim_terminate;
    %rename(step) sim_step;

Rebuilding with these additions gives:

    Python 2.4.1 (#65, Mar 30 2005, 09:13:57) [MSC v.1310 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sim
    >>> dir(sim)
    ['NUM_REGISTERS', 'REG_FLAGS', 'REG_GENERAL1', 'REG_GENRAL2', 'REG_PC',         \
        '__builtins__', '__doc__', '__file__', '__name__', '_newclass',             \
        '_object', '_sim', '_swig_getattr', '_swig_setattr',                        \
        '_swig_setattr_nondynamic', 'get_inst', 'initialize', 'inst_t',             \
        'inst_tPtr', 'mem_read', 'mem_write', 'reg_read', 'reg_write',              \
        'step', 'terminate']
    >>>
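
When a library has many prefixed functions, the %rename lines themselves can be generated rather than typed. The following is a small sketch under that assumption; rename_directives is a hypothetical helper, not part of SWIG, and the list of function names is supplied by hand here rather than parsed from sim.h.

```python
def rename_directives(c_names, prefix):
    """Return one %rename line for each name in c_names that starts with prefix."""
    lines = []
    for name in c_names:
        # Skip names without the prefix, and a name that is nothing but the prefix.
        if name.startswith(prefix) and len(name) > len(prefix):
            lines.append("%%rename(%s) %s;" % (name[len(prefix):], name))
    return lines


functions = ["sim_initialize", "sim_terminate", "sim_step", "mem_read"]
for line in rename_directives(functions, "sim_"):
    print(line)
```

The output reproduces the three directives shown above and leaves mem_read alone, since it carries no sim_ prefix.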

Conclusion

SWIG is a great utility for using legacy code written in C/C++ (that is, "stuff that already works") from a high-level language. This column has shown how easy and rewarding it is to start with SWIG. Our thanks to David Beazley, SWIG's originator, for his timely creation, as well as to SWIG's other contributors. It's important to know, moreover, that SWIG has advanced features that make it excellent for a variety of production and one-time uses beyond the range mentioned in this column. Download your own copy of SWIG, and you'll soon see how inviting it is.

Miki Tebeka is a tool developer for Zoran Corporation. Cameron Laird, vice president of Phaseit, Inc., has co-authored "Regular Expressions" for seven years.
