DANN:Genetic Wavelets

From Syncleus Wiki

Jump to: navigation, search
dANN Information
DescriptionAn Artificial Intelligence Library written in Java.
Last ActivityToday
LicenseOSCL Type C
IRC Room#dANN on irc.freenode.org
HomepagedANN
DistributionsBinary ZIP w/JavaDoc
Binary Tarball w/JavaDoc
Source ZIP
Source Tarball
DocumentationJavadoc repository
Javadoc for GIT master
Javadoc for stable release
Developmentgit://git.syncleus.com/dANN.git
TRAC Bug Tracking
Hudson Continuous Integration
Mailing ListsdANN Announcements
dANN Development
Syncleus Announcements


The Genetic Wavelet algorithm has a population of individuals that are evaluated by a fitness function, and it uses recombination operators to create new generations of the population, therefore it can be classified as an evolutionary algorithm. However, there are more details that go into the Genetic Wavelets than classic GAs. Below is a flow chart with the basic steps of the Genetic Wavelets.

In relation to other EAs & GAs, Genetic Wavelets is similar. It follows the basic steps paradigm of population, crossover, and mutation. The large difference comes in the way Genetic Wavelets represents individuals in the population and their chromosomes. The steps of population evaluation(fitness function), convergence check, population size, and the step of mating are all the parts that need to be implemented when solving any problem. Before implementation, one must know the details of Genetic Wavelets. The steps and specifications are explained below.  

Contents

Population Initialization

Genetic Wavelets flowchart

Individuals are represented as organisms. An organism is made up of one or more cells. A cell has a nucleus which contains one or more chromosomes. There can be single-cellular implementations of Genetic Wavelets or multi-cellular. An example of a multi-cellular implementation would be a neural network.

The implementation of the organism is dependent on the problem and how the programmer wants the genes to be represented in the algorithm. This includes whether it is going to be a single-cellular or a multi-cellular implementation. This organism also is where genes would be mapped to traits.

Regardless of how an individual is implemented, a cell is initiated the same way. A cell starts off with one chromosome, and a mutability factor, a floating point variable between zero and ten. A cell can have more chromosomes if mutation occurs; when it is instantiated, a random number(between zero and ten) is tested against its mutation factor, if the random number is less than the mutation factor, another chromosome is added. This continues until a random number greater than the mutability factor is chosen.

Genetic Wavelets represents chromosomes as a pair of chromatids. Each chromatid has a set of genes. There is a right chromatid and a left chromatid. They are joined at what is called the centromere position, which dictates where the gene will be split for crossover. Each chromatid is instantiated with its own mutability factor, then one or more gene is added to the chromatid, depending on the same method above for adding chromosomes to the cell.

The chromosome representation is done as objects in lists, therefore an allele is not dependent on a position in a string, but on an index in a list. At each index of the list is a gene object. There are three types of genes: promoter genes, signal genes, and external signal genes. Each gene has a list containing zero or more receptors. Each gene has an output which is computed by its expression function. This output is a floating point value and is essentially its value, however the output of can also be represented by a set of wavelets, more on this further on. Before additional detail on the types of genes, it is important to know about keys.

The step of population initialization starts with filling the population with randomly generated individuals. Each one of those individuals is created with one or more cells, each containing one or more chromosomes. Each chromosome is initialized with one or more gene, which could be of any type, selected randomly.

Keys

Genetic Wavelets use keys, which use schemata to encode their values. A key is a variable length string that is evolved during the genetic wavelet algorithm using x as a wild card character. A key, specifically a signal type key, might look like this:

Genetic Wavelets Signal Key Example: 01x01x10
 

A key is a schema[1], but it is not used for chromosome representation, instead it is used to model communications between cells. Keys are used to represent two important components of Genetic Wavelets - receptors and signals. The interaction is loosely modeled after hormones binding to cells. There are restrictions for signals binding to receptors. A signal can bind to a receptor if and only if the receptor matches the signal on all non-x points. For example, the signal key "101x11" can bind to the receptor "10xx11," but "101x11" cannot bind to the same receptor.

Genetic Wavelets Key Bind Example.
Signal 01101x10 successfully binds to receptor 110xx10.
Genetic Wavelets Key Mismatch Example.
Signal 0101xx1 fails to bind with receptor 110xx10.

Keys do not have to be equal in length in order to bind to one another; a smaller length receptor would bind with more signals, because it would bind to more sites on a signal. These signal and receptors are an integral part of Genetic Wavelets. Keys can mutate along with the system, however they are not changed on crossover. A key is initialized with one or more values depending on the mutation factor of the gene that is initializing it. All of the genes in Genetic Wavelets use keys in some way.

Genes

There are three types of genes; they all have a few things in common. First, they all have one or more receptors. Receptors have a receptor key. A gene starts off with at least one receptor, depending on mutation. Every gene also has an expression function, as mentioned before. The expression function is calculated using the receptors to map each one to a dimension in a wavelet, more details on how the expression function is calculated can be found in the next section. The floating point value output of the expression function is known as the current activity of a gene – it is the value that it outputs.

The first type of gene is the signal gene. Signal genes are responsible for creating a signal key that may or may not bind to other genes' or its own receptors. Signals genes each have a concentration which is calculated in the tick step. A signal gene outputs its signal local to the cell it resides in. A signal binding to a receptor will increase the output of that receptor by its concentration in the affected gene’s expression function.

An external signal gene is similar to a signal gene, but it has a direction that it is facing – it can either be inward facing or outward facing. If it is outward facing, it outputs signals to other cells, and its receptors receive signals from within the cell. If it is inward facing, then it outputs signals to the local cell, and receives signals from other external signal genes in other cells. These genes are only used in multi-cellular implementations.

The last type of gene is a promoter gene. Promoter genes exist to promote a gene that is a certain distance away from it on the same chromatid. Promotion of a gene is calculated and affects the current activity of a gene. The current activity is increased by the product of its expression function output and how much it is being promoted. The gene that a promoter gene affects can change due to mutation.

Pre-Tick, Tick, Wavelets, and The Expression Function

The pre-tick step is the first step in the simulation phase. This is where each gene calculates its new activity from its expression function. This is stored as its pending activity, because it is not applied as its expression yet. Each signal that is being expressed in the cell is tested to see if it binds to each receptor on the cell. If it does bind to a cell, then its value is changed, if not its value is zero.

This step in the simulation calculates and applies the promotion value. The current activity is set to the sum of the pending activity and the product of the pending activity and its gene’s promotion. Promotion does not affect value of the expression function, just changes its output.

The expression function is the core of the Genetic Wavelets algorithm, and the origin of the algorithm’s name. The expression function is a collection of one or more wavelets. A wavelet is a wave that starts out with an amplitude of zero which increases and then goes back to zero. A common example of a wavelet is the wave caused by a heartbeat. A wavelet is described by a few different properties: amplitude, center point, phase, form, and distribution. When a gene is initialized, the expression function is created with one initial wavelet that is generated randomly. The number of wavelets in the expression function is dictated by mutation. When a gene mutates, its expression function has a chance to add, remove, or modify wavelets. Below are two examples of what the expression function could look like. The first is a two dimensional one (two receptors) with one wavelet, the second one is what it would look like if another random wavelet were to be added to it.

Wavelet example
Wavelet overlay

The expression function is multi-dimensional, each dimension corresponds with a receptor, and the receptors value. A receptors value is zero, unless a signal binds to it. Each wave in the expression function will have the same dimensional values. So, in the above examples you see each wave has two dimensions – x and y. Therefore if these waves were expression functions, the genes that they were from would only have two receptors on them. Or they could have twenty receptors and only have two of them actually be bound to signals.

The expression function can be used in two ways, through its floating point output, and as a single function that it can represent. Because the expression function is a function, it can be used as part of the solution. For instance, by using any wavelet transform technique[1], multiple wavelets can be combined into a single wavelet; using a technique on the two wavelets above could result in the wavelet below.

Using a gene’s expression function as a way to map the organisms behavior depending on different input could yield powerful results. Again, the way the genes affect an organism is up to the details of the problem and its implementation. Either way, the expression function’s method of adding multiple multi-dimensional wavelets is an effective way of building a diverse function. The algorithm can also be implemented by using the expression function’s floating point output.

The way the expression function finds the floating point output corresponds with the multiple wavelets. The floating point output comes from the sum of an output from each individual wavelet. The output for each wavelet collapses the wavelet, the formula for the collapse is below.
Wavelet Transform Example

Where a is the amplitude of the wavelet; d is the distance from the center (the sum of each dimensional value’s distance from center on that dimension); f is the form of the wavelet; p is the phase of the wavelet; r is the distribution of the wavelet. Let c = aSin(2π(d + p/360)). The floating point output for the wavelet is o, and o=d(c/|c|)(|a(|c/a|f)|).

Whether using the floating point output or the wave function output, the expression function is what is used to calculate the values of each gene depending on the signals that exist in a cell, and the receptors that those signals bind to. This process was modeled off of Gene Regulatory Networks, which is the structure by which organisms propagate their genes.

Evaluating the population and Convergence Check

The fitness function can be implemented like that in any other GA, and it will differ greatly depending on the problem that is trying to be solved. Each gene has two potential outputs that could be used in the fitness function as mentioned above. So it needs to be decide whether the implementation is going to be dealing with graphs or values. The evaluation step starts with going through each individual, and evaluating them. Evaluation has to do with how well their genes perform in the fitness function. These details are dependent upon the problem being solved by the algorithm. The convergence check is also dependent on the implementation, traditional checks for convergence were outlined in previous chapters.

Mate and Mutate

The mating step is another item that needs to be implemented, but it is not necessarily dependent upon the problem. The crossover operator is already defined for chromosomes. What needs to be defined is which individuals will mate with other individuals. When this occurs, mutation in the genes is simulated using each chromatid’s mutability factor. The number of chromosomes can also be mutated based upon the mutation factor in each cell of the organism.

Which individuals get chosen for the next generation of the population also needs to be decided by whoever is implementing the algorithm. There are many methods of choosing this that have been described in previous chapters, any of which could possibly be implemented.