Papers on Smart Pointers in C++ and Evolutionary Delivery are also available at this site, as is a Tiny Primer on the STL. For more information on the author, please take a look at my software and home pages.


Effective STL

David Harvey

© 1996 David Harvey. All rights reserved.
Revised 27/6/1997

The STL is the containers, iterators and algorithms component of the proposed C++ Standard Library [ANSI95]. It represents a novel application of principles which have their roots in styles of programming other than Object-orientation.

This paper formed part of a tutorial session on the STL at Object Expo Europe in 1996. By taking a simple programming problem through several elaborations it demonstrates a range of idioms and techniques, which are stated as principles for effective use of the STL.

1. Introduction

All software libraries are characterised by their shape. This shape arises from both the static, structural characteristics of the library, such as the way API functions are grouped or the inheritance relationships of class components exposed, and its dynamic properties - its modes of use, the way a developer interacts with its interfaces. Libraries come in different shapes, not to say sizes, and the shape of a library has a profound effect on its usability and ease of assimilation.

The STL is a relatively small library which achieves a remarkable degree of reuse through its basis in the principles of generic programming and its use of C++ templates. Because of this, it has a particularly clear shape. The distinction between containers, iterators and algorithms is its most striking structural feature: dynamically, the way a container delivers iterators which are then used by algorithms is a consistent and fundamental pattern of use.

However, it is still easy to misuse the STL. In this paper we will consider a simple programming task, along with successively more idiomatic solutions. As we progress, we will justify some guiding principles in the effective use of the STL.

2. Naïve use of STL components

Here is a mixed C++/pseudocode specification for a function which reads strings from a stream, sorts them, removes duplicates and prints the result to a second stream.

void listWords(istream& in, ostream& out)
{
        string s;

        while (!in.eof() && in >> s) {
                add s to some container
        }

        sort the strings in the container
        remove the duplicates

        for (each string t in container) {
                out << t; 
        }
}

For now, assume that a word is defined as a whitespace-separated string as delivered by the stream extraction operator. Later on we will consider ways of refining this definition.

Given the way this problem is expressed, we can implement this program directly, if naïvely. The STL container class vector will suffice to hold the words: applying the algorithms sort and unique provides the required result.

void listWords(istream& in, ostream& out)
{
        string s;
        vector<string> v;

        while (!in.eof() && in >> s)
                v.push_back(s);                 // (1)

        sort(v.begin(), v.end());

        vector<string>::iterator e 
                = unique(v.begin(), v.end());   // (2)

        for (vector<string>::iterator b = v.begin();
                b != e;
                b++) {
                out << *b << endl;
        }
}

At (1) the vector member function push_back() is used to add to the end of the vector. This can also be done using the insert member, which takes as a parameter an iterator identifying the position in the vector at which to place the added element:

        v.insert(v.end(), s);

This allows us to add at any position in the vector. Be aware, though, that adding anywhere other than the end implies the overhead of physically shifting all elements from the insertion point to the end to make room for the new value. For this reason, and given the choices made in this example, attempts to optimise this code by maintaining the vector in sorted order are unwise. Replace vector with list and this becomes possible - although in both cases a search over the container will be necessary to determine the correct position of insertion.

The unique algorithm has the surprising property of not changing the length of the container to which it is applied (it can hardly do this, as it has access not to the underlying container, but only to the pair of iterators it is passed). Instead, it guarantees that duplicates are removed by moving unique entries towards the beginning of the container, returning an iterator indicating the new end of the container. This can be used directly (as here, at (2)), conversely it can be passed to the erase member with the old end iterator, to truncate the container.

3. An alternative container

It is possible to avoid having to separately remove duplicates and sort the collection of strings by selecting a more appropriate container. A set is distinguished by limiting its membership to unique values: duplicates will simply not be stored. What is more (and perhaps counter-intuitively) in the STL a set is maintained in sorted order (it is a Sorted Associative Container), so both parts of our requirement are solved by using a set:

void listWords(istream& in, ostream& out)
{
        string s;
        set<string> mySet;

        while (!in.eof() && in >> s)
                mySet.insert(s);        // (1)

        for (set<string>::iterator b = mySet.begin();
                b != mySet.end();
                b++)    {
                out << *b << endl;
        }
}

This illustrates an important STL principle:

Use the right container

The flexibility of the STL can catch out the unwary programmer: just because most of the algorithm implementations work with all containers, this is not to say that they are interchangeable. The storage, retrieval and performance characteristics of all STL components are precisely defined - indeed, the library is unique in offering this level of detail in its specification and documentation. Be aware of these characteristics, and use the most appropriate combination of STL container and algorithms for the task at hand

4. Stream and iterator adaptors

Instead of extracting and inserting strings explicitly from the passed streams, we can use the stream adaptors to make the streams appear as STL containers, and the copy algorithm to manage the process of building the sorted collection. Because STL algorithms are template functions specialised on the types of the iterators returned by the underlying containers it is perfectly reasonable to use the copy algorithm to move the contents of one container to another of a different type. Our function is now reduced to a declaration and a pair of function calls:

void listWords(istream& in, ostream& out)
{
        set<string> mySet;

        copy(istream_iterator<string, ptrdiff_t>(in),
                istream_iterator<string, ptrdiff_t>(),
                inserter(mySet, mySet.begin()));

        copy(mySet.begin(),
                mySet.end(),
                ostream_iterator<string>(out, "\n"));
}

When applying an algorithm like copy we give up the responsibility for inserting each element into the destination container. Instead, we must provide an input iterator into which the element will be placed (and the iterator incremented). This means that the size and structure of the destination container must be pre-established to conform to the source, which is more than inconvenient in the present example. The solution is to use the iterator adaptors which the STL provides. The template functions inserter, back_inserter and front_inserter return instances of appropriately specialised inserter adaptor template classes, depending on the argument iterator and container. Instead of replacing elements in the underlying containers, these add elements at the specified position, forcing the container to grow as necessary. There are few circumstances where you would want to copy over a pre-built container, so remember to

Use inserter adaptors in algorithms to populate containers

The introduction of stream adaptors in this version of the function gives us the opportunity to state another STL principle:

Integrate with the STL by writing adaptors

You can make any existing class with sequence-like characteristics look like an STL collection simply by writing an iterator adaptor class. Wrap a tokeniser, a parser, a database query, a sequence of frames in an animation, a stream of MIDI data - it's really not hard. The code for the STL-provided stream adaptors is concise and self-explanatory, and is a good place to start to develop your own.

5. Transforming values

By hand-coding the iteration over the input it is clearly possible to apply transformations - such as converting strings to upper case - to each element of the input before it is placed in the container. Although we cannot do this using the STL copy algorithm, there is another template function in the library which will achieve this for us. The transform algorithm applies a transformation to each value returned from a source iterator before applying it to the destination. Once again, the source and destination containers do not need to be of the same type, and we will use this characteristic to maintain the uniqueness and sorted constraints of the output.

Because the algorithm template functions are specialised on the type of the function being applied, regular C++ function pointers can be passed. The following version of the code uses the standard library to_lower function to return a lower-case copy of the passed string argument.

void listWords(istream& in, ostream& out)
{
        set<string> mySet;

        transform(istream_iterator<string, ptrdiff_t>(in),
                istream_iterator<string, ptrdiff_t>(),
                inserter(mySet, mySet.begin()),
                to_lower);

        copy(mySet.begin(),
                mySet.end(),
                ostream_iterator<string>(out, "\n"));
}

As in the previous example, we have a function with only two statements which is doing a great deal of work. Underlying this degree of conciseness is another key to using the STL effectively:

Deal with ranges and algorithms

and its corollary

Avoid direct manipulation of iterators

It is important to have the flexibility to access the STL containers directly through their iterators, and to work with these iterators at a fine-grained level. But in many cases it is much more effective to use the algorithms provided by the library: these are optimal solutions with well-defined performance characteristics.

This technique of applying a function to each element in a collection in a single expression is characteristic of applicative programming. This has long been familiar to Lisp programmers, with mapcar and similar functions.

6. Selecting values

Just as we can transform the values in a container, we can also perform filtering operations. Specifically, the in-place algorithms remove and replace have copying versions remove_copy and replace_copy, and all four have predicate versions (remove_if, replace_if, remove_copy_if, replace_copy_if) which perform the respective action if a passed predicate function returns true for a value.

Let's try to use this to list words of four letters or more from the input stream. To achieve this we need to use remove_copy_if, to remove elements from the destination with less than four letters. We could pass a pointer to a specially-written predicate function:

bool lessThanFour(const string& arg)
{
        return arg.length < 4;
}
...
remove_copy_if(istream_iterator<string, ptrdiff_t>(in),
                        istream_iterator<string, ptrdiff_t>(),
                        inserter(mySet, mySet.begin()),
                        lessThanFour);

If, later in the program, we need to filter out strings of a different length, we will have to write another function. And as it stands, there is no way of parameterising the predicate.

However, all that the algorithm requires of its predicate argument is that it can be called as a function with an argument of the iterator's value type, and returning a bool. An appropriate function pointer will certainly qualify, but so does an object for which operator() is overloaded with the correct argument and return types. What is more, the function object can contain other members: for example, a length against which to compare the passed string.

class stringLenLT : public unary_function<string, bool> {
public:
        SCLen(int i) : limit(i) {}
        bool operator()(const string& arg) const
        {
                return arg.length() < limit;
        }
private:
        int limit;
};

void listWords(istream& in, ostream& out)
{
        set<string> mySet;

        remove_copy_if(
                        istream_iterator<string, ptrdiff_t>(in),
                        istream_iterator<string, ptrdiff_t>(),
                        inserter(mySet, mySet.begin()),
                        stringLenLT(4));

        // Copy from set direct to output stream
        copy(mySet.begin(),
                mySet.end(),
                ostream_iterator<string>(out, "\n"));
}

The STL implements numerous function class templates, and provides mechanisms for creating complex function classes and instances from simpler ones. Effective use of the STL rests on an understanding of functional abstraction as much as the traditional object view (which is data abstraction): the principle is

Abstract function as well as data

Function objects are an extremely powerful feature of the STL. They are as near as we get in C++ to the closures of languages like Lisp and Smalltalk. A closure associates a function with some of the environment in which it was created: in C++, this amounts to providing that information as constructor parameters.

7. Filter and transform

What happens if we want to transform and filter the values in a container? We could perform this in two separate steps, with an intermediate container: for performance we might consider going back to a hand-crafted loop. But rather than solve the problem just once, we can make the solution generic by providing an algorithm to encapsulate this capability. A striking characteristic of the STL algorithms is their simplicity (as many of them must work with input iterators, there is only a small number of operations applicable to their parameterised iterator classes). Using the existing transform algorithm as a model, the modification to filter a value before applying a transformation is straightforward:

template <class InputIterator,
                        class OutputIterator,
                        class UnaryOperation,
                        class UnaryPredicate>
OutputIterator transform_if(InputIterator first,
                                        InputIterator last,
                                        OutputIterator result,
                                        UnaryOperation op,
                                        UnaryPredicate test)
{
        while (first != last) {
                if (test(*first))
                        *result++ = op(*first);
                ++first;
        }
        return result;
}

The principle here is fundamental to generic programming:

Solve a problem for generic types, not specific types

It is often as easy (and sometimes easier) to develop a general (and reusable) solution to a problem as it is to tackle a single case. Proponents of HOP sometimes put it differently:

Solve a class of problems rather than a single problem

8. Objects or Values?

The STL is designed to operate on values, not objects. What does this mean? Think of the distinction between a built-in programming language type, such as integer, and a programmer-defined class used to represent a real-world entity, such as Employee. When we deal with integers in our programs, we store them as values in variables. When one variable is assigned to another, the value is copied. If the variable appears in an expression which changes its value, then the old value is replaced by a new and distinct value.

When we deal with classes such as Employee, things are different. Such objects are generally addressed by means of references stored in variables (in C++, as references or pointers). Assigning one variable to another entails sharing a reference to the same object - typically, objects are never directly copied or assigned. Operating on an object changes its state, but not its identity.

All this makes it less straightforward to use objects in connection with the STL. At first glance, using STL containers parameterised on C++ pointers might appear to offer a solution. The catch in using this approach is that a pointer is itself a value: default versions of algorithms and container classes will happily compare and assign pointers as values, so it is easy to end up with collections in haphazard orders, and an uncontrollable proliferation of pointers to the same object, dramatically increasing the risk of dangling pointer references.

In a perfect world, automatic garbage collection would solve this problem for us. This being C++, we have to manage this ourselves, so the best we can usually do is providing a wrapper class, either specific to an object class or (more commonly) a generic reference-counted smart pointer. Several implementations of this idiom have been published: one can be found in [HARV94]. Here is a fragment of code implementing an address/telephone database, using an object variable wrapper:

// ObjVar is a template wrapper which provides reference
// counting and automatic deletion for wrapped objects.
// It implements "==" and "<" which make it useable as
// a value in an STL container. 
// The wrapped entry class implements its comparison
// operators in terms of a "key" name field

class book {
public:
        typedef set<ObjVar<entry>, 
                          less<ObjVar<entry> > > setOfEntry;
        typedef setOfEntry::iterator iterator;

        void add(ObjVar<entry> aEntry);
        void remove(ObjVar<entry> aEntry);
        ObjVar<entry> find(string aName) const
        // more interface...

private:
        setOfEntry entries;
};

...

void book::add(ObjVar<entry> aEntry)
{
        entries.insert(aEntry);
}

void book::remove(ObjVar<entry> aEntry)
{
        entries.erase(aEntry);
}

ObjVar<entry> book::find(string aName) const
{
        // entry equality defined as equal names
        ObjVar<entry>temp(new entry(aName, "","",""));
        ObjVar<entry> null;

        iterator p = entries.find(temp);
        return p == entries.end() ? null : *p ;
}

Removal of an entry (in book::remove()) does not need to explicitly delete the entry: it will still be referenced, after all, by the caller in code such as the following:

extern book& theBook();  // singleton book
extern ui& theUi();      // singleton ui

...

void removeNameHandler()
{
        string s = theUi().prompt("Name to remove");
        ObjVar<entry> e = theBook().find(s);
        if (!e.isNull)
                theBook().remove(e);
        else
                theUi().alert(s + " not found");
}

The ObjVar object variable referencing the entry to remove goes out of scope at the end of this function - assuming no other object variables reference the selected entry, it is only at this point that the underlying entry object is deleted.

For classes with value semantics, it is not necessary to go to these lengths to store instances in STL containers. This means that you should:

Store values directly in STL containers

However, an efficient wrapper or smart-pointer implementation is pretty much essential for effective use of objects in STL:

Use wrappers or smart pointers to put objects into STL containers

Note that the standard library's auto_ptr template class [ANSI95] will not serve this purpose, as it enforces a strict single-reference semantics.

9. Summary

The STL is a happening phenomenon. Sadly, it hasn't finished happening yet. Although the reference implementation of the library has been available for some time [HP95], and commercial offerings are becoming commonplace (from ObjectSpace, Modena, Rogue Wave, amongst others), the poor template capabilities of most (if not all) current C++ compilers means that using these is an exercise in compromise - at least for now.

In spite of this, the STL is the most significant enhancement of C++ since the language itself was born from 'C with Classes'. It brings applicative and generic programming styles decisively into C++, taking advantage of the C++ template mechanism to make available some of the flexibility and power available to programmers in dynamic and reflexive languages such as Lisp and Smalltalk. The clarity of the STL's specification is an object lesson in library design: if you follow the principles outlined in this paper you will respect its shape and benefit from its strengths.

10. References

[ANSI95] ANSI/ISO Working Paper for Draft Proposed International Standard for Information Systems-Programming Language C++, Doc No: X3J16/95-0087 WG21/N0687, April 1995. Available in postscript and acrobat formats from ftp://research.att.com/dist/c++std/WP/

[HARV95] D. Harvey, 'A Smart Pointer implementation in C++'. Unpublished paper (1995). Available at http://web.ftech.net/~honeyg/articles/smartp.htm

[HARV96] D. Harvey, 'Template Idioms in the C++ Standard Library'. Tutorial presented at Object Technology 96 conference, Oxford, England (March 1996). An introduction to the STL formed part of the tutorial material: this is available at http://web.ftech.net/~honeyg/articles/tiny_stl.htm

[HP95] HP reference implementation of STL. Available by anonymous ftp from ftp://butler.hpl.hp.com/stl/. Also includes STL-style implementations of hashed collections, STL-related papers and documents, and sample programs from ObjectSpace's STL<Toolkit>.

[KHAN95] M. Khan, 'Mumit's STL Newbie Guide'. Notes (not just for beginners) on effective use of the STL (1995). Available at http://www.xraylith.wisc.edu/~khan/software/stl/STL.newbie.html

[KREFT96] K.Kreft, A.Langer, 'Iterators in the Standard Template Library', C++ Report, July 1996

[MUSS96] D.R.Musser, A.Saini, STL Tutorial and Reference Guide, Addison-Wesley, 1996. ISBN 0-201-63398-1. Currently the best introduction and reference to the STL.

[STROU94] B. Stroustrup, 'Making a vector fit for a standard', C++ Report, October 1994.

[VILO94] M. J. Vilot, 'An Introduction to the Standard Template Library', C++ Report 6/viii, October 1994.