The Support Vector Machine is a new type of learning machine for pattern recognition and regression problems which constructs its solution in terms of a subset of the training data, the Support Vectors. This page collects some material that might be helpful for people interested in Support Vector Machines: a bibliography, a list of people working on Support Vectors, and a brief discussion of Support Vectors in view-based object recognition.
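To illustrate what "a solution in terms of a subset of the training data" means, here is a minimal sketch of how a trained SV classifier evaluates a new point. It assumes the standard kernel expansion f(x) = sum_i alpha_i y_i k(s_i, x) + b over the Support Vectors s_i. The function names, the kernel parameters, and the numerical values of the Support Vectors, coefficients, and bias are all hypothetical, chosen only for illustration; they do not come from any of the papers listed here.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian RBF kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def poly_kernel(x, z, degree=2):
    """Polynomial kernel k(x, z) = (<x, z> + 1)^degree."""
    return (sum(a * b for a, b in zip(x, z)) + 1.0) ** degree

def decision_function(x, support_vectors, labels, alphas, bias, kernel=rbf_kernel):
    """Evaluate f(x) = sum_i alpha_i * y_i * k(s_i, x) + b.

    The sum runs over the Support Vectors only; training points that
    are not Support Vectors do not appear in the solution at all.
    """
    return sum(a * y * kernel(s, x)
               for s, y, a in zip(support_vectors, labels, alphas)) + bias

def classify(x, support_vectors, labels, alphas, bias, kernel=rbf_kernel):
    """Return the predicted class label, +1 or -1."""
    f = decision_function(x, support_vectors, labels, alphas, bias, kernel)
    return 1 if f >= 0 else -1

# Hypothetical trained model: two Support Vectors, one per class.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
labels = [-1, +1]
alphas = [1.0, 1.0]   # assumed values, not the result of real training
bias = 0.0

print(classify((0.1, 0.0), support_vectors, labels, alphas, bias))
print(classify((1.9, 2.0), support_vectors, labels, alphas, bias))
```

Swapping `rbf_kernel` for `poly_kernel` changes the shape of the decision surface without changing the form of the expansion, which is the sense in which different kernels yield different SV classifiers.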
In these two books, the original "Generalized Portrait" Algorithm for constructing separating hyperplanes with optimal margin is described.
These papers use Mercer kernels to generalize from optimal hyperplanes to nonplanar decision surfaces. This is done by nonlinearly mapping into some other (possibly high-dimensional) space.
In this paper, the optimal margin algorithm is generalized to non-separable problems by the introduction of slack variables in the statement of the optimization problem.
This paper reports that different SV classifiers constructed by using different kernels (polynomial, RBF, neural net) extract the same Support Vectors. In addition, it is empirically shown that VC dimension arguments can be used to predict the optimal degree for polynomial kernels.
A very good high-level introduction to Statistical Learning Theory in the VC formulation, plus a comprehensive overview of the SV algorithm. First description of SV machines for regression estimation.
High-level summary of some aspects of learning theory and SV machines (in German).
Proposes the "Reduced Set Method", which speeds up SV machines by representing the SV solution in terms of a smaller number of vectors.
Application of SV classifiers to chair recognition, in a performance comparison (SV outperforms Neural Net and Oriented Filter approach).
The first successful attempt to improve SV accuracy by incorporating domain knowledge, using the "Virtual SV" method.
Applies the kernel method to unsupervised algorithms such as Principal Component Analysis. This gives a principled and efficient approach to nonlinear PCA.
A combination of the Reduced Set and Virtual SV methods leads to a fast high-accuracy classifier.
First implementation of SV regression, and first discussion of spline kernels.
Reports empirical results on SV regression.
Generalizes the SV approach to a wider range of cost functions, and establishes a link between regularization operators and SV kernels.
Formally proves the first decomposition training algorithm for SVMs. Application of SV learning to a large-scale Computer Vision problem (Face Detection).
Formulates a decomposition algorithm for training SVMs with large numbers of SVs.
Uses SV regression on chaotic time series (Mackey-Glass, Ikeda map and Lorenz) and compares the results with those of other techniques reported in the literature (Casdagli).
SV regression with epsilon-insensitive and Huber loss functions. Experimental results on time-series prediction (Mackey-Glass and Santa Fe competition data set D, with a new record on the latter).
Drop me a line if you have any additions to this list.
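Several of the regression entries above refer to the epsilon-insensitive and Huber loss functions. As a rough sketch, assuming the usual textbook definitions of both losses (the parameter names `eps` and `delta` and their default values are illustrative assumptions, not taken from the papers), they can be written as:

```python
def eps_insensitive_loss(residual, eps=0.1):
    """Epsilon-insensitive loss: zero inside a tube of half-width eps
    around the target, growing linearly outside it."""
    return max(0.0, abs(residual) - eps)

def huber_loss(residual, delta=1.0):
    """Huber loss: quadratic for small residuals (|r| <= delta),
    linear for large ones, so outliers are penalized less severely
    than under squared error."""
    r = abs(residual)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)

# Residual = prediction minus target, for a few example values.
for r in (0.05, 0.5, 3.0):
    print(r, eps_insensitive_loss(r), huber_loss(r))
```

Under the epsilon-insensitive loss, training points whose residual lies inside the tube contribute nothing to the cost, so only the points on or outside the tube can enter the solution as Support Vectors.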
Kristin Bennett (Rensselaer Polytechnic Institute, NY)
Volker Blanz (MPI für biologische Kybernetik, Tübingen)
Leon Bottou (AT&T Research, Holmdel, NJ)
Chris Burges (Bell Laboratories, Holmdel, NJ) (Bell Labs SV project)
Harris Drucker (Bell Laboratories, Holmdel, NJ)
Federico Girosi (MIT, Cambridge, MA)
Thorsten Joachims (Uni Dortmund)
Ulrich Kressel (Daimler-Benz AG, Ulm)
Klaus-Robert Müller (GMD First, Berlin)
Edgar Osuna (MIT, Cambridge, MA)
Bernhard Schölkopf (MPI für biologische Kybernetik, Tübingen)
Alex Smola (GMD First, Berlin)
Mark Stitson (Royal Holloway and Bedford College, London) (SV project)
Vladimir Vapnik (AT&T Research, Holmdel, NJ)
Drop me a line if you want to be included in this list.
Learning can be thought of as inferring regularities from a set of training examples. Much research has been devoted to the study of various learning algorithms which allow the extraction of these underlying regularities. If the learning has been successful, these intrinsic regularities will be captured in the values of some parameters of a learning machine; for a polynomial classifier, these parameters will be the coefficients of a polynomial, for a neural net they will be weights and biases, and for a radial basis function classifier they will be weights and centers. This variety of representations, however, conceals the fact that no matter how different the outward appearance of these algorithms is, they all must rely on intrinsic regularities of the data. The Support Vector Learning Algorithm (Boser, Guyon & Vapnik, 1992; Cortes & Vapnik, 1995) is a promising tool for studying these regularities (i.e. for studying learning) in pattern classification:

Experimental results (Schölkopf, Burges & Vapnik, 1995) showed that for the case of handwritten digit recognition,
The support vector set allows the incorporation of transformation invariance into SV classifiers, significantly improving accuracy (Schölkopf, Burges & Vapnik, 1996). Together with the "reduced set method" this yields fast high-accuracy classifiers (Burges & Schölkopf 1997).

With its principled way of extracting statistically critical examples to represent classes, support vector learning may provide a theoretical foundation for exemplar-based approaches to object recognition (Bülthoff & Edelman, Proc. Natl. Acad. Sci. 89:60-64, 1992). We have applied support vector machines to object recognition (Blanz et al., 1996); for benchmarking, our set of images of rendered chair models is available on our ftp server. Future work will include psychophysical experiments studying the relevance of support vectors in human visual learning.