Bioinformatics
Z.R. Yang, Z. Yang, in Comprehensive Biomedical Physics, 2014
6.01.1 Introduction
Artificial intelligence algorithms have long been used to model decision-making systems because they provide automated knowledge extraction and high inference accuracy. Artificial neural networks (ANNs) are a class of artificial intelligence algorithms that emerged in the 1980s from developments in cognitive science and computer science research. Like other artificial intelligence algorithms, ANNs were developed to address different aspects of learning, such as how to learn, how to induce, and how to deduce. For such problems, ANNs can draw conclusions from observed cases and address the issues of prediction and interpretation.
Strictly speaking, most learning algorithms used by ANNs are rooted in classical statistical pattern analysis: they are driven by the distribution of the data, unlike rough set algorithms (Komorowski, Chapter 6.02). ANNs introduce a new way to handle and analyze highly complex data. Most ANN algorithms share two common features. First, an ANN is composed of many mutually connected artificial neurons. The connections carry the model parameters (weights), and the knowledge learned from a data set is represented by these parameters. This feature makes an ANN model broadly similar to a human brain. Second, an ANN model typically makes no prior assumption about the form of the data distribution before learning. This greatly promotes the usability of ANNs in a wide range of applications.
The study of ANNs has undergone several important stages. In the early days, ANN studies were mainly motivated by theoretical interest, that is, investigating whether a machine can replace humans in decision-making and pattern recognition. The pioneering researchers McCulloch and Pitts (1943) showed the possibility of constructing a net of neurons that interact with each other. The net was based on symbolic logic relations. This early idea of McCulloch and Pitts was not theoretically rigorous, as indicated by Fitch (1944). Later, in 1949, Hebb gave more concrete and rigorous evidence of how and why the McCulloch–Pitts model works (Hebb, 1949), showing how neural pathways are strengthened once activated. In 1954, Marvin Minsky completed his doctoral study on neural networks, and his discussion of ANNs later appeared in his seminal book (Minsky, 1954). This was instrumental in bringing about wide-scale interest in ANN research. In 1958, Frank Rosenblatt built a computer at Cornell University called the perceptron (later known as the single-layer perceptron (SLP)) that was capable of learning new skills by trial and error, mimicking the human thought process. However, Minsky (1969) demonstrated its inability to deal with linearly non-separable data; this dampened ANN research activity for many subsequent years.
During the 1970s and 1980s, ANN research did not cease completely. For instance, the self-organizing map (SOM) (Kohonen, 2001) and the Hopfield net (Hopfield, 1982) were widely studied. In 1974, Paul Werbos conducted his doctoral study at Harvard University on a training process called backpropagation of errors; this work was later published in his book (Werbos, 1994). This important contribution led to the work of David Rumelhart and his colleagues in the 1980s on the backpropagation algorithm, implemented for supervised learning problems (Rumelhart and McClelland, 1987). Since then, ANNs have become very popular in both theoretical studies and practical applications.
In this chapter, we focus on two particular ANN models – Rumelhart's multilayer perceptron (MLP) and Kohonen's SOM. The former is a standard ANN for supervised learning, while the latter is a standard ANN for unsupervised learning. Both adopt a trial-and-error learning process. The MLP aims to build a function that maps one type of observation to another (e.g., from genotypes to phenotypes), whereas the SOM explores the internal structure within a single data set (e.g., genotypic data only).
In contrast to Rosenblatt's SLP, Rumelhart's MLP introduces hidden neurons corresponding to hidden variables. An MLP model is in fact a hierarchical composition of several SLPs. For instance, consider a three-layer MLP for mapping genotypes to phenotypes. If we have two variables x1 and x2 describing genotypic status, we can build two SLPs, z1 = f(x1, x2) and z2 = f(x1, x2), each with its own parameters, for some specified function f(·). Based on z1 and z2, a higher-level SLP is built, y = f(z1, z2), where y is the model output corresponding to the collected phenotypic data, denoted by t. Here x1, x2, and t are observed data (collected through an experiment), whereas z1 and z2 are unobserved; they are the hidden variables. In this example, the MLP models the nonlinear relationship between genotypic and phenotypic data without knowing the true function between them. Both the SLP and the MLP are supervised learning models: during learning, the observed phenotypes act as a teacher that supervises parameter estimation.
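To make this composition concrete, the forward pass of the three-layer MLP above can be sketched in a few lines of Python. This is a minimal illustration, assuming a logistic (sigmoid) activation for f(·); the weights and biases shown are illustrative placeholders rather than learned parameters.

    import numpy as np

    def f(a):
        # logistic (sigmoid) activation, one common choice of f
        return 1.0 / (1.0 + np.exp(-a))

    # illustrative weights and biases (in practice these are learned from data)
    w_z1, b_z1 = np.array([0.5, -0.3]), 0.1
    w_z2, b_z2 = np.array([-0.8, 0.2]), -0.4
    w_y,  b_y  = np.array([1.2, 0.7]), 0.0

    x = np.array([1.0, 0.0])               # genotypic inputs x1 and x2
    z1 = f(w_z1 @ x + b_z1)                # first hidden SLP
    z2 = f(w_z2 @ x + b_z2)                # second hidden SLP
    y = f(w_y @ np.array([z1, z2]) + b_y)  # output SLP: predicted phenotype

Note that z1 and z2 are never observed; they are intermediate quantities computed from the inputs, which is precisely what makes them hidden variables.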
Kohonen's net, on the other hand, is an unsupervised learning algorithm. The objective of the SOM is to reveal how observations (instances or samples) are partitioned. This is similar to cluster analysis; conventional cluster analysis, however, does not infer how clusters relate to one another, whereas the SOM does, by arranging the clusters on a low-dimensional map. The SOM is an unsupervised learning algorithm because it does not use phenotypic data for model parameter estimation.
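To illustrate the mechanism, the following is a minimal sketch of a single SOM training step, assuming a rectangular grid of prototype vectors and a Gaussian neighborhood function; the function name som_step and its parameters are our own illustrative choices.

    import numpy as np

    def som_step(weights, x, lr, sigma):
        # weights: (rows, cols, dim) grid of prototype vectors
        rows, cols, _ = weights.shape
        # find the best-matching unit (the prototype closest to x)
        d = np.linalg.norm(weights - x, axis=2)
        bmu = np.unravel_index(np.argmin(d), d.shape)
        # move every prototype toward x, weighted by a Gaussian
        # neighborhood centered on the best-matching unit
        for i in range(rows):
            for j in range(cols):
                grid_dist2 = (i - bmu[0]) ** 2 + (j - bmu[1]) ** 2
                h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
                weights[i, j] += lr * h * (x - weights[i, j])
        return weights

    # usage: present samples repeatedly, shrinking lr and sigma over time
    weights = np.random.rand(5, 5, 2)
    weights = som_step(weights, np.array([0.2, 0.9]), lr=0.1, sigma=1.0)

Because neighboring grid units are pulled toward similar inputs, nearby clusters on the trained map correspond to related groups of observations, which is how the SOM conveys inter-cluster relationships.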
We will discuss parameter estimation, learning rules, and learning algorithms for both the MLP and the SOM. The parameter optimization process is commonly based on minimizing an error function chosen for the specific problem at hand. We will show how the learning rules for the MLP and the SOM are derived from their error functions and then discuss some biological applications of these two ANN algorithms.
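As a preview of this error-minimization view of learning, the sketch below assumes the familiar sum-of-squares error and a plain gradient-descent parameter update; the actual error functions and learning rules for the MLP and the SOM are derived later in the chapter.

    import numpy as np

    def sse(y, t):
        # sum-of-squares error between model outputs y and targets t
        return 0.5 * np.sum((y - t) ** 2)

    def gradient_descent_update(w, grad, eta=0.01):
        # generic learning rule: move each parameter against the
        # gradient of the error, scaled by a learning rate eta
        return w - eta * grad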