The *Support Vector Machine* is a new type of learning machine
for pattern recognition and regression problems which constructs its solution
in terms of a subset of the training data, the *Support Vectors*.
This page collects some material that might be helpful for people interested
in Support Vector Machines: a bibliography, a list of
people working on Support Vectors, and a brief discussion
of Support Vectors in view-based object recognition.

*In these two books, the original "Generalized Portrait"
Algorithm for constructing separating hyperplanes with optimal margin is
described.*

*These papers use Mercer kernels to generalize from optimal hyperplanes
to nonplanar decision surfaces. This is done by nonlinearly mapping the data into
some other (possibly high-dimensional) space.*

*In this paper, the optimal margin algorithm is generalized to non-separable
problems by the introduction of slack variables in the statement of the
optimization problem.*

*This paper reports that different SV classifiers constructed by using
different kernels (polynomial, RBF, neural net) extract the same Support
Vectors. In addition, it is empirically shown that VC dimension arguments
can be used to predict the optimal degree for polynomial kernels.*

*A very good high-level introduction to Statistical Learning Theory
in the VC formulation, plus a comprehensive overview of the SV algorithm.
First description of SV machines for regression estimation.*

*High-level summary of some aspects of learning theory and SV machines
(in German).*

*Proposes the "Reduced Set Method", which speeds up SV
machines by representing the SV solution in terms of a smaller number of
vectors.*

*Application of SV classifiers to chair recognition, in a performance
comparison (SV outperforms Neural Net and Oriented Filter approach).*

*The first successful attempt to improve SV accuracy by incorporating
domain knowledge, using the "Virtual SV" method.*

*Applies the kernel method to unsupervised algorithms such as
Principal Component Analysis. This gives a principled and efficient approach
to nonlinear PCA.*

*A combination of the Reduced Set and Virtual SV methods leads to
a fast high-accuracy classifier.*

*First implementation of SV regression, and first discussion of spline
kernels.*

*Reports empirical results on SV regression.*

*Generalizes the SV approach to a wider range of cost functions, and
establishes a link between regularization operators and SV kernels.*

*Formally proves the first decomposition training algorithm for SVMs.
Application of SV learning to a large-scale Computer Vision problem (Face
Detection).*

*Formulates a decomposition algorithm for training SVMs with large
numbers of SVs.*

*Uses SV regression on chaotic time series (Mackey-Glass, Ikeda Map
and Lorenz) and compares with other techniques reported in the literature (Casdagli).*

*SV regression with epsilon-insensitive and Huber loss functions.
Experimental results on time-series prediction (Mackey-Glass and Santa
Fe competition data set D, with a new record on the latter).*

*Drop me a line if you have any additions to this list.*

Kristin Bennett (Rensselaer Polytechnic Institute, NY)

Volker Blanz (MPI für biologische Kybernetik, Tübingen)

Leon Bottou (AT&T Research, Holmdel, NJ)

Chris Burges (Bell Laboratories, Holmdel, NJ) (Bell Labs SV project)

Harris Drucker (Bell Laboratories, Holmdel, NJ)

Federico Girosi (MIT, Cambridge, MA)

Thorsten Joachims (Uni Dortmund)

Ulrich Kressel (Daimler-Benz AG, Ulm)

Klaus-Robert Müller (GMD First, Berlin)

Edgar Osuna (MIT, Cambridge, MA)

Bernhard Schölkopf (MPI für biologische Kybernetik, Tübingen)

Alex Smola (GMD First, Berlin)

Mark Stitson (Royal Holloway and Bedford College, London) (SV project)

Vladimir Vapnik (AT&T Research, Holmdel, NJ)

*Drop me a line if you want to be included in this list.*

*Learning* can be thought of as inferring regularities from a set
of training examples. Much research has been devoted to the study of various
learning algorithms which allow the extraction of these underlying regularities.
If the learning has been successful, these intrinsic regularities will
be captured in the values of some parameters of a learning machine; for
a polynomial classifier, these parameters will be the coefficients of a
polynomial, for a neural net they will be weights and biases, and for a
radial basis function classifier they will be weights and centers. This
variety of different representations, however, conceals the fact that no
matter how different the outward appearance of these algorithms is, they
all must rely on intrinsic regularities of the data. The *Support Vector
Learning Algorithm* (Boser, Guyon & Vapnik, 1992, Cortes & Vapnik,
1995) is a promising tool for studying these regularities (i.e. for studying
learning) in pattern classification:

- It allows the construction of various learning machines by the choice of different dot products. Thus the influence of the set of functions that can be implemented by a specific learning machine can be studied in a unified framework.
- It builds on results of statistical learning theory, namely on the structural risk minimization principle (Vapnik, 1979) guaranteeing high generalization ability. Thus there is reason to believe that decision rules constructed by the support vector algorithm do not reflect limitations of the learning machine (as in the case of an overfitted artificial neural network) but rather regularities of the data.
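The first point can be illustrated with a small sketch: the same SV algorithm yields qualitatively different learning machines purely through the choice of kernel. (This uses scikit-learn and a synthetic dataset, neither of which appears in the original text; it is an illustration, not the authors' implementation.)

```python
# Sketch: one SV algorithm, several learning machines, differing only in
# the kernel (dot product) used. scikit-learn and the synthetic data are
# modern stand-ins chosen for illustration.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

for kernel in ("linear", "poly", "rbf"):  # hyperplane, polynomial, RBF machine
    clf = SVC(kernel=kernel, gamma="scale").fit(X, y)
    # each machine is defined by its support vectors and their coefficients
    print(kernel, "support vectors:", len(clf.support_))
```

Changing only the `kernel` argument switches between a separating hyperplane, a polynomial classifier, and an RBF classifier, while the training procedure stays identical.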

Experimental results (Schölkopf, Burges & Vapnik, 1995) showed that for the case of handwritten digit recognition,

- in addition to polynomial classifiers, the support vector algorithm also allows the construction of radial basis function classifiers (as in the 2-dimensional example in the above picture; support vectors are marked by extra circles) and two-layer perceptrons, leading to similar performance, and
- the three different types of classifiers, obtained by choosing different kernel functions *K* (see picture below), construct their decision functions from almost identical subsets of the training set, their *Support Vector Sets* (for a 7300-digit training set, the support vector sets of the different classifiers - about 250 vectors in size - showed an overlap of more than 80%). Training the classifiers on the support vector set of another classifier yielded approximately the same performance as training on the full training set.
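The overlap experiment can be sketched in a few lines: train SV classifiers with different kernels on the same data and compare the index sets of their support vectors. (A hedged sketch with scikit-learn on small synthetic data; the original experiment used a 7300-digit training set, and the overlap figure there, over 80%, need not carry over to this toy setting.)

```python
# Sketch of the support-vector-set overlap experiment: different kernels,
# same training set, compare which training points become support vectors.
# scikit-learn and the synthetic data are assumptions for illustration.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

sv_sets = {}
for kernel in ("poly", "rbf"):
    clf = SVC(kernel=kernel, gamma="scale").fit(X, y)
    sv_sets[kernel] = set(clf.support_)  # indices of this machine's SVs

common = sv_sets["poly"] & sv_sets["rbf"]
overlap = len(common) / min(len(s) for s in sv_sets.values())
print(f"SV set overlap: {overlap:.0%}")
```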

The support vector set allows the incorporation of transformation invariance into SV classifiers, significantly improving accuracy (Schölkopf, Burges & Vapnik, 1996). Together with the "reduced set method" this yields fast high-accuracy classifiers (Burges & Schölkopf 1997).
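The "Virtual SV" idea sketched above - encode a known invariance by retraining on the support vectors together with transformed copies of them - can be illustrated roughly as follows. (Hedged sketch only: scikit-learn, the 8x8 digits dataset, and the crude wraparound shift are all illustrative choices, not the procedure of the cited papers.)

```python
# Rough sketch of the "Virtual SV" method: take the support vectors of a
# trained classifier, generate transformed ("virtual") copies encoding an
# invariance -- here a one-pixel horizontal shift of 8x8 digit images,
# with wraparound for simplicity -- and retrain on SVs plus virtual SVs.
# scikit-learn and these transformation details are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
clf = SVC(kernel="rbf", gamma="scale").fit(X, y)

# the support vectors carry the information relevant to the decision boundary
sv_X, sv_y = X[clf.support_], y[clf.support_]

# virtual examples: shift each 8x8 image one pixel to the right
shifted = np.roll(sv_X.reshape(-1, 8, 8), 1, axis=2).reshape(len(sv_X), -1)

# retrain on support vectors plus their virtual copies (labels unchanged)
virt = SVC(kernel="rbf", gamma="scale").fit(
    np.vstack([sv_X, shifted]), np.concatenate([sv_y, sv_y]))
```

The design point is that only the support vectors are transformed, which keeps the enlarged training set small; combining this with the reduced set method then recovers speed as well.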

With its principled way of extracting statistically critical examples
to represent classes, support vector learning may provide a theoretical
foundation for exemplar-based approaches to object recognition (Bülthoff
& Edelman, *Proc. Natl. Acad. Sci.* 89:60-64, 1992). We have applied
support vector machines to object recognition (Blanz et al., 1996); for
benchmarking, our set
of images of rendered chair models is available on our ftp server.
Future work will include psychophysical experiments studying the relevance
of support vectors in human visual learning.

Last modified: 10 June 1997 bs@mpik-tueb.mpg.de