Regular Expressions: Programming Down to the Silicon
by Cameron Laird and Kathryn Soraiz
Assembly language can be high-level, too. That's what Randall Hyde promotes, and we think he's mostly right.
All about x86
Just this fall, No Starch Press published Hyde's massive The Art of Assembly Language. While the title, the author's background with a variety of processor architectures, and the advertisement on the back page, "The Most Comprehensive Guide to Assembly Language", all suggest an encyclopedic approach, in fact this book focuses strictly on 32-bit x86 coding. "Comprehensive" here means that the author does a good job of illustrating general principles of assembly-language practice with a detailed examination of the concrete case of an assembler he's written over the past decade for x86. The book originated as an electronic edition, which he's refined over and over.
Why should we care, though? "Regular Expressions" is about "high-level" languages; our focus is on abstraction, portability, and project-level software engineering. What place does assembly language take in that?
A significant one, we answer. It's a good time to be working in the abstract assembly languages of the virtual machines (VMs) that now commonly implement high-level languages: Perl 6 is innovating in several directions with its Parrot VM; PHP, Python, Tcl, and other languages are all adding VM introspection abilities; and .NET has legitimized cross-language VM deployment. Moreover, as the most interesting computing moves away from the desktop to mobile, embedded devices, there's frequently a need to program hardware that existing languages support only to a primitive extent. We plan for "Regular Expressions" to revisit assembly-language topics during 2004.
Even restricting attention to x86 leaves at least a couple of important connections to our usual topics. In this column, we regularly advocate a "dual-level" approach to programming: one that exploits two different implementation languages to gain advantages from each. One exciting way to get the most out of hardware, particularly in the field of high-performance computing (HPC), is to develop algorithms and architecture in a high-level language, but manually code time-critical segments in assembly language. In at least some situations, this approach beats pure C++ or Fortran codings in both execution speed and development time. The final chapter of Hyde's book is one he titles "Mixed-Language Programming".
Moreover, Hyde demonstrates that assembly itself can be "high-level". That's the title of his system, in fact: HLA (High Level Assembler). By this, he means that "HLA has one of the most powerful macro processing facilities of any computer language processing system."
HLA still leaves programmers to their own devices for memory management. We see this as a critical limit to usability, and one of the reasons it's best to combine HLA with a scripting language for development of substantial end-user applications.
A final reason for our affection for HLA is its "light-weightness". It takes only a few minutes to download and unpack a current binary HLA distribution. It takes only a few minutes more to write and run your first HLA program. Although the package works quite nicely under Linux and other x86 Unix, the first chapters of The Art of Assembly Language concentrate on Windows. Here's what it takes to begin under Linux:
- Download a current binary HLA distribution.
- Unpack it.
- Set a few environment variables:
- Write a simple HLA program source:
cat > my_program.hla << HERE
stdout.put("Hello, world of HAL.", nl);
- Compile the source into an executable:
- Execute the resulting program:
When invoked this way, HLA also generates a low-level source, my_program.asm, with a few dozen lines of "boilerplate" surrounding the specific content of this program:
push offset L1000_str__hla_
call STDOUT_PUTS /* puts */
call STDOUT_NEWLN /* newln */
Hyde's macro system is the strongest feature of HLA. The Art of Assembly Language devotes all of Chapter 10 to its use and benefits. Macros simplify, for example, management of unrolled loops:
?i := 0;
#while(i < 10)
stdout.put("my_array[", i, "] = ", my_array[i * 4], nl);
?i := i + 1;
#endwhile
expands to a sequence of ten stdout.put invocations, one for each value of i from 0 through 9.
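To make the unrolling concrete, here is a rough Python model of what the compile-time loop above does. It is only a sketch: HLA performs this expansion itself while assembling, and the function name expand_unrolled_puts and its parameters are our own inventions for illustration. We also assume that the constant expressions (i and i * 4) are folded to plain numbers in the generated statements.

```python
# Hypothetical model of an HLA compile-time #while loop; not part of HLA.
def expand_unrolled_puts(count=10, element_size=4, array_name="my_array"):
    """Return the statements that unrolling the loop `count` times would emit."""
    statements = []
    i = 0                       # plays the role of the compile-time variable ?i
    while i < count:            # plays the role of #while(i < 10)
        statements.append(
            f'stdout.put("{array_name}[", {i}, "] = ", '
            f'{array_name}[{i * element_size}], nl);'
        )
        i += 1                  # plays the role of ?i := i + 1;
    return statements


if __name__ == "__main__":
    # Print the unrolled sequence, one statement per line.
    for statement in expand_unrolled_puts():
        print(statement)
```

The point of the model is that the loop runs while the source is being processed, not while the program executes: the resulting program contains no loop counter, comparison, or branch, only the ten statements laid end to end.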
Our own work with HLA is still experimental. We've already found a few cases, though, in which readable HLA source executes almost twice as fast as the corresponding C-coded programs. That's easily enough benefit to make us glad to have HLA in our toolbox.
Next month, we'll present another assembly-language approach. Rather than targeting the most popular computing platform, however, it's aimed at unusual and specialized architectures. Thanks for your patience while "Regular Expressions" was on vacation during the past few months; we're looking forward to sharing the stories that have piled up on our desks during this interval.
Kathryn and Cameron run their own consultancy, Phaseit, Inc., specializing in high-reliability and high-performance applications managed by high-level languages. Join them each month as their "Regular Expressions" column explores issues and opportunities that arise in practical application development with scripting languages.