AutoClass III -- An unsupervised Bayesian classification program in Lisp
Contents
Introduction
Contacts
Current Work
Future Work
References (and gif files)
AutoClass X -- An experimental extension of AutoClass III in Lisp
AutoClass C -- A public domain version of AutoClass III in C
Introduction
In previous years, the Bayes group at Ames Research Center developed the basic
theory and associated algorithms for various kinds of general data analysis
techniques. Our earliest efforts were applied to the problem of automatic
classification of data. We implemented this theory in the AutoClass series
of programs. AutoClass takes a database of cases described by a combination
of real and discrete valued attributes, and automatically finds the natural
classes in that data. It does not need to be told how many classes are
present or what they look like -- it extracts this information from the data
itself. The classes are described probabilistically, so that an object can
have partial membership in the different classes, and the class definitions
can overlap. AutoClass generates reports on the classes it has found at the
end of its search. AutoClass has been used and tested on many data sets, both
within NASA and by industry, academia and other agencies. These applications
typically yield surprising classifications that reveal patterns in the data
previously unknown to the user. Examples include: discovery of new classes of infra-red
stars in the IRAS Low Resolution Spectral catalogue (see figure below; and
see here and here for more information), new classes of airports in a
database of all USA airports,
discovery of classes of proteins, introns and other patterns in DNA/protein
sequence data, and others.
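
The idea of partial membership described above can be illustrated with a minimal sketch. This is not AutoClass code: it assumes a single real-valued attribute, two classes that are already known, and a Gaussian model for each class, and it only shows how Bayes' rule turns those class descriptions into membership probabilities. The function names and all class parameters are hypothetical. AutoClass itself also handles discrete attributes and searches for the classes, which this sketch does not attempt.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def class_memberships(x, classes):
    """Return P(class | x) for every class, using Bayes' rule.

    Each class is a dict with a prior weight and Gaussian parameters.
    """
    # Joint probability of the class and the observation: prior * likelihood.
    joint = [c["weight"] * gaussian_pdf(x, c["mean"], c["std"]) for c in classes]
    total = sum(joint)
    # Normalising gives memberships that sum to 1 across classes.
    return [j / total for j in joint]


# Hypothetical, hand-picked classes (assumed for illustration only).
classes = [
    {"weight": 0.5, "mean": 0.0, "std": 1.0},
    {"weight": 0.5, "mean": 4.0, "std": 1.0},
]

# A case midway between the two class means belongs partly to both.
midway = class_memberships(2.0, classes)

# A case at the first class mean belongs almost entirely to that class.
near_first = class_memberships(0.0, classes)

print(midway)
print(near_first)
```

Because the two class densities overlap, a case lying between them is shared between the classes rather than forced into one, which is the sense in which the class definitions can overlap.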