Copyright Dr John Maddock 1998-2002 all rights reserved.


DFA: deterministic finite state automaton.





NFA: non-deterministic finite state automaton.





The best known DFA regular expression library is by Henry Spencer, at
ftp.zoo.toronto.edu/pub/





Matching back references are almost certainly NP-complete, and therefore almost certainly require exponential time to match in the worst case.





More STL references and the STL port can be found
here.

 

Regex++

A portable template regular expression library.

Boost-version-1.30.0

Regex++ provides a set of portable template classes and functions for compiling, matching and searching for regular expressions.

  • Supports the full regular expression syntax, including back references, sub-expression matching, character classes, repeat expressions and word/line boundaries. This version supports the full POSIX syntax including multi-character collating elements and equivalence classes, as well as some of Perl's features like non-greedy repeats, forward lookahead asserts, and non-marking parenthesis.
  • Provides full Unicode support if your run-time library or operating system supports it. Under Win32, uses native Windows NT Unicode support, and degrades gracefully under Win9x to provide more limited wide character support.
  • Full run-time localisation support, either via your run time library (and setlocale) or natively under Win32.
  • Highly customisable via a traits class, define your own traits class to change almost any behaviour of the underlying regular expression syntax. For example custom traits classes can define their own character classes, or handle case conversion and character comparisons according to their own rules.
  • Provides a POSIX compatible C API for those who prefer to use it, in both narrow and wide character versions.
  • Provides a high level class for simplified usage, for those who don't need the power of the lower level template classes.
  • Provides a full API reference, and regular expression syntax reference.
  • Template based match/search algorithms can search any bi-directional iterator type, allowing seamless searching of non-contiguous data, as well as more traditional searches of const char* or const wchar_t* types.
  • Supports search and replace operations via a format string that transforms the results of a match into a new string. This can be used to transform whole strings in one go in the manner of Perl.
  • Supports perl-style splitting of matches.
  • Integrates with your existing STL implementation: The library will automatically recognise the Hewlett-Packard, Silicon Graphics, Rogue Wave, and Microsoft STL versions, or you can use it independently of any STL if you prefer (this applies to the 2.x version only - the 3.x version requires a reasonably compliant C++ standard library).
  • Speed, this library uses branch prediction techniques to speed up algorithm performance, for example the time takes to match the expression "^([0-9]+)(\-| |$)(.*)$" against the string "100- this is a line of ftp response which contains a message string" are: BSD implementation 450 micro seconds, GNU implementation 271 micro seconds, regex++ 127 micro seconds (Pentium P90, Win32 console app under MS Windows 95).
  • Demo apps: The library comes with three demo applications, a version of grep, a timer program to time the efficiency of any regular expressions you write, and a regression test program to verify the correctness of the underlying algorithms. The regression test program is by far most important of these; if by far the least interesting.
  • Directly supported compilers: Borland C++ Builder 4 5 and 6, gcc 2.95 (Cygwin, Linux and BSD), and Microsoft Visual C++ 6 and 7. It is also reported to build with Sun Workshop 6.1, Kai C++, SGI Irix C++, Compaq true64, and HP aCC. If you have an older or less capable compiler for example Visual C++ 5 (service pack 3), Sunpro 5 or HP aCC then you may want to try version 2.25 of the library which requires fewer advanced compiler features.

You can download the latest version (based on boost-1.30.0) from here, and also view the documentation online.

Important note for upgraders: this version of the library is part of the peer-reviewed boost library, as a result the library interface has been somewhat redesigned and modernised. Check out the notes for upgraders for compatibility with version 2.x.

regexpp3.zip(1249K)

regexpp3.tar.gz(687K)

Note that the boost-dependencies have grown sufficiently complicated that these files have grown in size quite dramatically, I'll try and do better in future releases.

There is also a C++ Builder wrapper component that is based on this library - from Tropic Software East.

Version 2.25 of the library is still available for those that have less capable compilers:

regexpp2.zip(241)

regexpp2.tar.gz(189)


Regular expression libraries use a variety of differing algorithms all of which have their pro's and con's, which can make it hard to choose the best implementation for your purpose. This library uses an NFA algorithm which is dedicated to determining fast and accurate sub-expression matches, as well as providing support for features like back-references, which are hard to support using DFA algorithms.

People who should use this library:

  • Anyone who needs to use wide character strings.
  • Anyone who needs to search non-contiguous data.
  • Anyone who wants fast sub-expression matching.
  • Anyone who wants to customise the regular expression behaviour, or localise the library to a non-English locale.

People who should look to an alternative DFA based library:

  • Anyone who doesn't care about sub-expression matching, and
  • Wants to search only ANSI-C strings.

Back to home page....


Copyright Dr J. Maddock.

Last Updated: 17th May 2002.