FS1R - Formants for Beginners
A brief guide for users of the FS1R with the Mac editing software, and for those using the FS1R on its own. If you are thinking about getting your hands on an FS1R, please feel free to take a printout of this guide along to your dealer and try it out.
In the beginning, there was only darkness...
1998 saw the announcement of the Yamaha FS1R rack-mount module, which features a new technology known as Formant Shaping Synthesis, in addition to an FM tone generator that is upwardly compatible with the DX7. Formant Shaping Synthesis directly controls "formants", allowing the creation of a whole new set of sounds that are impossible for today's sampling-based synthesisers or earlier FM tone generators to replicate.
"Formants" are spectral features that are independent of changes in pitch, and they create the essential character of human voices and of the sounds of many musical instruments. The characteristic formants in a vocalised "aah" are what allow us to recognise it as "aah" regardless of the pitch that is sung.
The FM capabilities of this new synthesis engine differ from earlier FM tone generators in having eight operators (the DX7 had six) and 88 algorithms (the DX7 had 32). While the DX7 provided only a sine wave, the FS1R provides an additional seven types of basic waveform for each operator - two waveforms with partials in addition to the fundamental, two waveforms with odd-numbered harmonics, two waveforms with resonance, and one waveform with a formant - a total of eight different basic waveforms.
If you already have an FS1R, you can hear what kind of sounds are possible by listening to the on-board demo.
TRY IT OUT!
To experience the unique sounds that can be produced by formant shaping synthesis, please listen to Internal 003: "Choir". This is a distinctive sound that uses formants. You can modify the formants by rotating Knob 3 toward the left to produce an "ooh" sound, and toward the right to shift to an "aah" sound.
More analog vicar?
In addition to these powerful FM capabilities, the FS1R provides Analog Physical Modelling Filters, which let you filter the output of the FM tone generator in the same way as on an analog synthesiser.
Wavetable... pah, that's for dullards (no offence!)
Formant Sequences are another interesting capability of the FS1R. These are preset data of formant movements that allow formants to be handled similarly to vocal phrases or drum phrases. For example, one of the preset voices provides formant movements that produce the phrase "Four, Three, Two, One." Since the actual formants are being controlled (unlike a sampled phrase in which a piece of audio is merely played back), the tempo or timbre of the "Four, Three, Two, One" can be modified without affecting the character (formants) of the voice. Formant sequences can be easily looped, and their tempo can be synchronised to an external sequencer. The FS1R provides 90 different preset formant sequences.
What is a Formant?
By auditioning the preset sounds, you will gain a feel for the types of sound that the FS1R is particularly good at creating. At the same time, some of you may be reaching for the EDIT button and examining the parameters as you wonder how on earth such sounds can be created. However, blindly modifying the sound parameters is unlikely to produce the desired result. When using a synthesiser to create sounds, it is important to have a systematic understanding of "what" you want to change, and "how" to change that aspect of the sound. First we will need to learn the basics of how the FS1R produces sound.
Things that make you go oooh!
If you view a human voice on a spectrum analyser (a device that visually displays the frequency components of an audio signal), you will see "bumps" where specific frequency areas are emphasised. These "bumps" in the frequency spectrum are called formants. If the same person vocalises at various pitches, these formants will not move significantly. Formants that are not affected by pitch in this way are called "fixed formants."
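To make the idea of a fixed formant concrete, here is a minimal Python sketch. It assumes a single formant modelled as a simple resonance curve centred on 700 Hz with a 130 Hz bandwidth (illustrative values, not measurements), and treats a sung note as a set of harmonics at whole multiples of the pitch. Whatever the pitch, the loudest harmonic is the one that happens to fall nearest the formant.

```python
def formant_gain(freq_hz, centre_hz=700.0, bandwidth_hz=130.0):
    """Gain of one formant 'bump' at freq_hz (an assumed resonance-shaped curve)."""
    half_bw = bandwidth_hz / 2.0
    return 1.0 / (1.0 + ((freq_hz - centre_hz) / half_bw) ** 2)


def sung_note(f0_hz, n_harmonics=20):
    """Harmonics at multiples of the sung pitch, each scaled by the fixed formant."""
    return [(k * f0_hz, formant_gain(k * f0_hz)) for k in range(1, n_harmonics + 1)]


def loudest_partial(spectrum):
    """Frequency of the strongest (freq, amplitude) pair."""
    return max(spectrum, key=lambda pair: pair[1])[0]


# The same vowel sung at two pitches: the emphasis stays near 700 Hz.
print(loudest_partial(sung_note(110.0)))  # 660.0 (6th harmonic)
print(loudest_partial(sung_note(175.0)))  # 700.0 (4th harmonic)
```

The harmonics themselves move with the pitch, but the region that gets emphasised does not, which is what "fixed" means here.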
If you use a spectrum analyser to look at a recording of a person's voice played back at double speed (for example, from tape), you will notice a clear change in the spectrum. There are plenty of spectral analysis programs available for you to try this out at home.
Double-speed playback raises the pitch by one octave. If you view the spectrum in an application such as WaveLab, you can see that the formants also move up by one octave, and, as you have probably experienced, the double-speed voice sounds quite different in character. Formants that are affected by pitch in this way are called "movable formants."
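The following Python sketch contrasts this with the fixed formant. It assumes an illustrative single-formant voice (formant at 700 Hz) and models double-speed playback simply as multiplying every frequency in the spectrum by two, which is what a tape speed change does. The formant peak moves from around 660 Hz to around 1320 Hz along with the pitch, which is exactly the behaviour of a movable formant.

```python
def recorded_voice(f0_hz, formant_hz=700.0, bandwidth_hz=130.0, n_harmonics=20):
    """Harmonic spectrum of a voice with one formant (assumed resonance shape)."""
    half_bw = bandwidth_hz / 2.0
    return [
        (k * f0_hz, 1.0 / (1.0 + ((k * f0_hz - formant_hz) / half_bw) ** 2))
        for k in range(1, n_harmonics + 1)
    ]


def play_at_speed(spectrum, speed):
    """Changing tape speed multiplies every frequency, formants included."""
    return [(freq * speed, amp) for freq, amp in spectrum]


def peak_frequency(spectrum):
    """Frequency of the strongest (freq, amplitude) pair."""
    return max(spectrum, key=lambda pair: pair[1])[0]


original = recorded_voice(110.0)
doubled = play_at_speed(original, 2.0)
print(peak_frequency(original))  # 660.0
print(peak_frequency(doubled))   # 1320.0
```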
Chip & Dale backing vocals.
If you have ever listened to a double-speed playback of a human voice, you will understand how important the fixed formants are in making a voice sound "human." The location of vocal formants differs by age or sex (or chipmunk status!), and the sound generated by vibrations in the vocal cords is also affected by the size of the mouth and throat, and the size and shape of the skull. These elements act as audio filters (fixed formants) for the sound, and give each person's voice its unique character. However, movable formants are not completely absent from the human voice either. Movable formants can be observed at each vocal pitch, but their effect is not as great as that of the fixed formants. The reason that a voice sounds peculiar at double-speed playback is that the formants have been moved to locations where they would not normally occur. In any case, it should be clear that formants cannot be ignored if you are concerned with producing realistic sound.
Otherwise... it's Disney time!
Not only the human voice but every sound has its own characteristic formants. In some sounds the movable formants might be emphasised, while in others the fixed formants might be emphasised. For example, when using a synthesiser to simulate an actual sound, the simulation is often quite realistic in one pitch range and quite unrealistic in others. This is because most synthesisers do not provide fixed formants as part of the sound-creating process.
It is well known that sound consists of pitched components and noise. Here it is important to understand that when we say "noise," we are not using the word to mean, for example, the coughs and cigarette lighters going off in the audience at a gig. Rather, the distinction here is between sound that has a recognisable pitch and sound that does not (or has only a weak sense of pitch), like most of the current singers in the UK charts!
If you have used synthesisers before, you know the important role played by "noises" such as the hammer strike that occurs when a piano key is pressed, or the "scraping" sound of a bow on a violin string. These are indispensable elements of the musical sound, but in the technical sense that we are using here, they are "noise." The sound of a bow scraping the string is heard together with the sustained "pitched" sound of a violin note, and we hear the combination of these sounds as the "sound of a violin." The sound of breath being blown into a flute has a similar role. In the case of the human voice, "voiced sound" (vowels) corresponds to the pitched component, and "unvoiced sound" (unvoiced consonants such as "s" and "f") corresponds to the noise (unpitched) component.
Pitched sound and noise both contain movable formants and fixed formants. Those occurring in the pitched sound are called "voiced formants," and those occurring in the noise component are called "unvoiced formants." To summarise, realistic sound must contain both pitch and noise components, and movable formants and fixed formants must be considered for both of these components. FS synthesis is a method of tone generation that satisfies these requirements.
The day the universe changed...
The concept of FS synthesis (or... time to put the thinking cap on, as this makes quantum hydrodynamics look easy!)
FS synthesis also allows you to perform FM synthesis. However, since FS synthesis creates sound in a completely different way from FM synthesis, even people who are familiar with FM synthesis need to take time to understand its basics. The following pages introduce the basic concepts of FS synthesis. If you would like to learn more about FM synthesis, please refer to the section "About FM synthesis" below.
We have explained that in order to create realistic sounds (i.e. to accurately simulate an existing sound), we need a combination of movable and fixed formants, and of pitched and unpitched sound. To achieve this, FS synthesis divides the sound between the Voiced tone generator and the Unvoiced tone generator, and creates the final sound using parameters that have been optimised for each. In the example of a human voice, the pitched component, which varies in pitch as the vocal cords are tensioned or relaxed, would be produced by the Voiced tone generator. The noise component (created by movements of the tongue or lips), which does not change significantly in response to pitch, would be produced by the Unvoiced tone generator.
If we use the example of an instrumental sound, the Voiced tone generator would be used to produce the portion of the sound whose pitch changes as a scale is played, and the Unvoiced tone generator would be used to produce the noise component that is not directly affected by the scale. In the sense that we are dividing a sound into two parts and using two tone generators to produce these parts, this is reminiscent of the hybrid (AWM+FM) tone generators of the Yamaha SY99 or EX5. However since FS synthesis is designed to control the formants themselves, and closely links the Voiced and Unvoiced tone generator sections with each other, it provides a more intrinsically unified system of tone generation than other hybrid tone generators. In the following sections we will explain each of these tone generator sections and the formant control mechanism.
Back to basics
On most synthesisers, the oscillator determines the pitch of the basic waveform, and you can select whether this pitch will correspond to the note played on the keyboard or whether the pitch will be fixed. It goes without saying that this concept is also supported on the FS1R. Each oscillator of the FS1R's Voiced tone generator (Voiced operator) allows you to select a waveform that contains a formant structure. When this waveform is selected, the formant components that will be included in the waveform can be freely adjusted. Since formants are such an important element in creating realistic sounds, the designers of the FS1R provided control of the formant independently from the basic waveform of the oscillator.
Basic waveform pitch: The harmonic partials include a movable formant (controlled by keyboard pitch).
The basic waveform selected for an operator can be controlled by parameters such as frequency ratio, basic frequency, EG, and Freq EG (an EG that varies the frequency of the operator). However, when a formant waveform is selected for an operator, the basic frequency (Freq Coarse/Fine) controls the centre frequency of the formant, and the Freq EG can be used to create time-variant changes in that centre frequency. (In this case, the frequency of the waveform, but not the centre frequency of the formant, changes according to the note played on the keyboard.) The result is that fixed formants can be set freely.
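A minimal Python sketch of this separation, under stated assumptions: the formant waveform is modelled as harmonics of the played note whose levels are shaped by a resonance peak, `centre_hz` stands in for the Freq Coarse/Fine setting, and `freq_eg` is a hypothetical straight-line Freq EG. None of these names or curves come from Yamaha's implementation; they only illustrate that the note decides where the harmonics are, while the formant centre decides which of them are emphasised.

```python
import math


def formant_partials(note_hz, centre_hz, bandwidth_hz=200.0, max_hz=8000.0):
    """Sketch of a formant-waveform operator: harmonics of the played note,
    weighted by a peak at centre_hz (the resonance shape is illustrative)."""
    half_bw = bandwidth_hz / 2.0
    partials = []
    k = 1
    while k * note_hz <= max_hz:
        freq = k * note_hz
        partials.append((freq, 1.0 / (1.0 + ((freq - centre_hz) / half_bw) ** 2)))
        k += 1
    return partials


def freq_eg(t, start_hz, end_hz, time_s):
    """Hypothetical linear Freq EG moving the formant centre from start_hz to end_hz."""
    if t >= time_s:
        return end_hz
    return start_hz + (end_hz - start_hz) * t / time_s


def render_sample(note_hz, t):
    """One output sample: pitch follows the key, formant centre follows the EG."""
    centre = freq_eg(t, 500.0, 1500.0, 0.5)
    return sum(amp * math.sin(2 * math.pi * freq * t)
               for freq, amp in formant_partials(note_hz, centre))


# Two different notes, same formant centre: the emphasis stays at 1000 Hz.
for note in (100.0, 250.0):
    strongest = max(formant_partials(note, 1000.0), key=lambda p: p[1])[0]
    print(note, strongest)
```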
When the FS1R is compared with previous synthesisers, several important differences become clear. In FS synthesis, a single operator provides independent control of both the basic pitch and the formant pitch (centre frequency). Synthesiser users know that the pitch can be changed by modulating the pitch of an oscillator in various ways, but if there is only one oscillator, it is normally possible to obtain only one pitch at any moment.
Those familiar with analog synthesiser techniques may point out that a filter can be driven into self-oscillation to produce a resonant pitch. But what if there were just an oscillator, without even a filter? Since the basic waveforms provided by the FS1R include waveforms that let you specify the resonance, the resonance effects that analog synthesisers produced by combining an oscillator and a filter can be produced in FS synthesis by a single oscillator (operator), in the same way as described above for formants. This is because the bandwidth of the resonant peak can be controlled separately from the basic pitch of the waveform.
A basic waveform can be selected independently for each of the eight Voiced operators. When creating sounds on FM tone generators of the past, frequency modulation between operators was used to generate harmonics (overtones), but as you have seen above, the FS1R can produce a basic sound using just one operator. This means that you could select the formant waveform for several operators, set each operator to produce a male or female vocal sound, and control their formants independently to produce the sound of a human chorus, without having to rely on effect processors to create the impression of more singers.
In practice, however, a more realistic sound can be created if you use the formant structure of a formant waveform to produce the fixed formants, and use other operators to perform FM synthesis to produce complex overtone structures and movable formants. Whether to emphasise the fixed formants or the movable formants will depend on the type of sound that you wish to create. It is known that most sounds can be created by combining several movable formants and fixed formants. For a realistic human voice, the first formant (300-700 Hz) should be movable, and the second through fourth formants can be fixed. For a female voice, use a movable formant for the second formant as well.
Now let's proceed to the role of the Unvoiced tone generator. In a sense, the Unvoiced tone generator is a type of noise generator, but it is far more sophisticated than the noise generator found on an analog synthesiser. As with the Voiced tone generator, the Unvoiced tone generator provides oscillators (eight Unvoiced operators), but it does not provide a "noise waveform." Each Unvoiced operator produces an unvoiced formant, not simply noise.
It starts getting heavy here!
Basic waveform pitch depends on how the operator is used (fixed, following the keyboard pitch, or linked to a Voiced operator).
It is easy to create a basic "noise" component simply by setting the centre frequency of the operator. As you know, noise has no particular timbre other than the "hissing" or "roaring" character determined by its centre frequency; i.e. it has no formant structure. If the operator is set to a fixed frequency (Freq Mode = normal) so that it is unaffected by the keyboard pitch, the centre frequency will be changed only by the Freq EG, just as with a Voiced operator.
Since an Unvoiced operator produces mainly noise components, it can produce "jet aeroplane" sounds very easily! However by narrowing the frequency bandwidth or using resonance to create a peak, you can create a "bump" at a specific point in the frequency response (invisible unless you have a spectrum analyser!), and the result will sound much like a formant. If you then link the operator mode to the keyboard pitch (Freq Mode = Link F0), you will be able to distinguish between the part of the noise that changes in response to the keyboard and the part that does not change.
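The Python sketch below approximates an Unvoiced operator as white noise passed through a two-pole band-pass filter (the constant 0 dB peak gain band-pass from Robert Bristow-Johnson's Audio EQ Cookbook). This is a stand-in technique chosen for illustration, not the FS1R's actual algorithm, and the `freq_mode` strings and `ratio` argument are hypothetical labels for the Freq Mode settings described above. Narrowing `bandwidth_hz` concentrates the noise into a sharper "bump".

```python
import math
import random


def unvoiced_centre(freq_mode, fixed_hz, note_hz, ratio=1.0):
    """Hypothetical Freq Mode mapping: 'normal' ignores the key, 'link_f0' tracks it."""
    if freq_mode == "normal":
        return fixed_hz
    if freq_mode == "link_f0":
        return note_hz * ratio
    raise ValueError(f"unknown freq_mode: {freq_mode}")


def bandpass_noise(centre_hz, bandwidth_hz, n_samples, sample_rate=48000, seed=0):
    """White noise through a two-pole band-pass (RBJ cookbook, 0 dB peak gain)."""
    rng = random.Random(seed)
    w0 = 2 * math.pi * centre_hz / sample_rate
    q = centre_hz / bandwidth_hz
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    # Normalised coefficients (b1 is zero for this band-pass form).
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for _ in range(n_samples):
        x0 = rng.uniform(-1.0, 1.0)
        y0 = b0 * x0 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


# Breath noise whose "bump" follows the played note (Link F0 style).
breath = bandpass_noise(unvoiced_centre("link_f0", 0.0, 440.0, ratio=4.0), 300.0, 4800)
```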
These are the movable formants and fixed formants (unvoiced formants) produced by an Unvoiced operator. Notice the similarities and differences with how voiced formants are created by a Voiced operator.
The difference is simply whether the basis of the sound is pitched sound or noise (voiced formant or unvoiced formant). The similarity is that both can have movable and fixed formants, and that while a melody can obviously be played by a Voiced operator, it is also possible to play a melody using the movable formant of an Unvoiced operator. Let's take a moment to think about this from the standpoint of creating a sound. When you whistle a melody, the melody can be perceived as pitched sound, but the sound itself consists mainly of unpitched "breath noise."
In other words, to create a realistic simulation of whistling, we would use mainly Unvoiced formants. However if the same whistling were heard from a distance, the noise component might not need to be emphasised as much, and in this case, we would use mainly voiced formants. The FS1R provides a V/N Balance parameter to adjust the balance between voiced and unvoiced formants, which makes it easy to make this type of change. This is further evidence of the close link between the Voiced and Unvoiced tone generators, but let's pursue this connection a little further.
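As a rough illustration only, the V/N Balance idea can be sketched in Python as a crossfade between a voiced signal and an unvoiced signal. The 0.0 to 1.0 range and the linear curve are assumptions made for this example; the FS1R's actual parameter range and response may differ.

```python
def vn_mix(voiced, unvoiced, balance):
    """Hypothetical V/N Balance: 0.0 = voiced only, 1.0 = unvoiced only."""
    if not 0.0 <= balance <= 1.0:
        raise ValueError("balance must be between 0.0 and 1.0")
    return [(1.0 - balance) * v + balance * u for v, u in zip(voiced, unvoiced)]


tone = [0.0, 0.7, 1.0, 0.7]      # pitched part of a whistle (placeholder samples)
breath = [0.3, -0.2, 0.1, -0.4]  # breath-noise part (placeholder samples)

close_up = vn_mix(tone, breath, 0.8)  # mostly breath noise
far_away = vn_mix(tone, breath, 0.2)  # mostly pitched tone
```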
When the formant waveform is selected for a certain Voiced operator, the centre frequency of the Unvoiced operator can be linked to the centre frequency of the correspondingly numbered Voiced operator (Freq Mode = Link FF). Even if the centre frequency of that Voiced operator is being modified by the Freq EG or other parameters, the centre frequency of the Unvoiced operator will follow along, so that the voiced and unvoiced formants remain together and the sound has a unified character. From this you can see that when creating sounds on the FS1R, it is effective to treat Voiced and Unvoiced operators of the same number as a pair that work together.
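The pairing can be sketched in Python as follows. The class names, the `"link_ff"` string and the operator indexing (0 to 7 for operators 1 to 8) are hypothetical conveniences for this example; the point is only that, in Link FF mode, the Unvoiced operator reads its centre frequency from the Voiced operator with the same number every time it is asked, so it follows any Freq EG movement automatically.

```python
from dataclasses import dataclass


@dataclass
class VoicedOp:
    centre_hz: float  # formant centre, set by Freq Coarse/Fine and moved by the Freq EG


@dataclass
class UnvoicedOp:
    freq_mode: str           # hypothetical labels: "normal" or "link_ff"
    fixed_hz: float = 1000.0


def unvoiced_centre(n, voiced_ops, unvoiced_ops):
    """Centre of Unvoiced operator n; in Link FF mode it copies Voiced operator n."""
    if unvoiced_ops[n].freq_mode == "link_ff":
        return voiced_ops[n].centre_hz
    return unvoiced_ops[n].fixed_hz


voiced = [VoicedOp(400.0 + 100.0 * n) for n in range(8)]
unvoiced = [UnvoicedOp("link_ff") for _ in range(8)]

# Simulate a Freq EG sweeping Voiced operator 1 (index 0); its partner follows.
for centre in (400.0, 800.0, 1200.0):
    voiced[0].centre_hz = centre
    print(centre, unvoiced_centre(0, voiced, unvoiced))
```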
The best example of using Voiced and Unvoiced operators in pairs is when the FS1R's Formant Sequence (FSEQ) capability is used. The formants of the sound being simulated are divided into eight pairs of voiced and unvoiced formants, and for each pair of formants, time-varying "Formant Sequence" data is provided to control the centre frequency and level.
Still with us?
The advantage of FSEQ is that the time-variant changes in the formants of actual sounds can be used as an element of sound creation. By using FSEQ, you can create detailed changes that cannot be produced by the time-variant parameters of the operator (EG and Freq EG). FSEQ tracks 1-8 contain pairs of data for the correspondingly-numbered voiced and unvoiced formants, and data from each track is sent to the Voiced and Unvoiced operators of the same number. This makes it possible to simulate the voiced and unvoiced formants that occur in actual sounds, bringing us closer to the desired realism.
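To illustrate how FSEQ data might be organised and played, here is a Python sketch. The frame layout (one voiced and one unvoiced frequency/level pair per track, eight tracks per frame), the `frame_rate_hz` value and the `speed` factor are assumptions for this example rather than the FS1R's documented data format. Because each frame holds formant centres and levels rather than audio, changing `speed` only changes how quickly the frames are read.

```python
from dataclasses import dataclass
from typing import List

TRACKS = 8  # FSEQ tracks 1-8, one per Voiced/Unvoiced operator pair


@dataclass
class TrackFrame:
    voiced_hz: float
    voiced_level: float
    unvoiced_hz: float
    unvoiced_level: float


Frame = List[TrackFrame]  # one TrackFrame per track


def frame_at(frames: List[Frame], t, frame_rate_hz=100.0, speed=1.0, loop=False) -> Frame:
    """Frame to use at time t seconds; speed scales playback tempo only."""
    index = int(t * frame_rate_hz * speed)
    if loop:
        index %= len(frames)
    else:
        index = min(index, len(frames) - 1)
    return frames[index]


def apply_frame(frame: Frame, voiced_ops, unvoiced_ops):
    """Send track n to the Voiced and Unvoiced operators with the same number."""
    for n, track in enumerate(frame):
        voiced_ops[n] = (track.voiced_hz, track.voiced_level)
        unvoiced_ops[n] = (track.unvoiced_hz, track.unvoiced_level)


# A tiny made-up sequence: 4 frames whose voiced formants rise each frame.
frames = [[TrackFrame(300.0 * (i + 1), 1.0, 2000.0, 0.2) for _ in range(TRACKS)]
          for i in range(4)]
voiced_ops = [None] * TRACKS
unvoiced_ops = [None] * TRACKS
apply_frame(frame_at(frames, 0.025), voiced_ops, unvoiced_ops)
```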
Phew... now go try it!
This has been an introduction to the basics of FS synthesis. In addition to those discussed above, there are various other parameters that allow detailed adjustments to be made to the sound, and these are for you to experiment with.
About FM synthesis
FM (frequency modulation) synthesis was made famous by the DX7. Before the DX7, most synthesisers (analog synthesisers) created sound by "subtractive synthesis," in which the basic waveform produced by an oscillator was sent through a filter to remove unwanted frequency components (overtones). FM synthesis took the opposite approach, using frequency modulation to generate new frequency components. With subtractive synthesis, the process of creating a sound begins with a complex waveform (i.e. a waveform that contains frequency components in addition to the fundamental frequency), such as a sawtooth wave, which contains all the harmonics (odd and even), or a square wave (a pulse wave with a 50% duty cycle), which contains only the odd-numbered harmonics.
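The harmonic recipes above can be written down directly. The Python sketch below lists the relative harmonic amplitudes of an ideal sawtooth and an ideal square wave (the standard Fourier-series results, up to an overall scale factor) and models a filter as an idealised brick-wall cut. It shows the limitation described in the next paragraph: filtering only ever removes harmonics that the oscillator already produced.

```python
def sawtooth_harmonics(n_max):
    """Sawtooth: every harmonic n is present, with amplitude proportional to 1/n."""
    return {n: 1.0 / n for n in range(1, n_max + 1)}


def square_harmonics(n_max):
    """Square wave (50% pulse): odd harmonics only, amplitude proportional to 1/n."""
    return {n: 1.0 / n for n in range(1, n_max + 1, 2)}


def lowpass(harmonics, cutoff_harmonic):
    """Idealised subtractive filter: it can only remove components, never add them."""
    return {n: amp for n, amp in harmonics.items() if n <= cutoff_harmonic}


print(sorted(square_harmonics(9)))                # [1, 3, 5, 7, 9]
print(sorted(lowpass(sawtooth_harmonics(9), 4)))  # [1, 2, 3, 4]
```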
The sounds that could be created in this way were limited to the frequency components present in the original waveform, meaning that subtractive synthesis was not well suited to creating sounds with a complex or irregular harmonic structure, such as bells. By contrast, the DX7's FM synthesis used "operators" to modulate other operators in order to produce new frequency components as desired. This made it easy to produce sounds with a vast diversity of frequency components: not only sounds similar to the analog sawtooth or pulse waveforms, but also complex bell-like overtone structures that included inharmonic partials (in addition to the odd and even-numbered partials), and even noise, which contains an extremely large number of frequency components. FM synthesis brought some truth to the advertising cliche of "infinite possibilities" for creating sounds.
On the DX7, the output waveform of each operator was a sine wave. A sine wave contains only a fundamental frequency, and no other frequency components (i.e. no overtones). Overtones were produced by modulating one operator with another. The overtone structure that resulted was determined by the frequency ratio of the carrier and modulator, and by the output level of the modulator (i.e. the modulation level). The frequency of the carrier usually determined the basic pitch, and the output level of the carrier determined the volume.
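A minimal two-operator FM sketch in Python follows. It uses the phase-modulation form that is commonly used to implement this kind of FM (the modulator's output is added to the carrier's phase), with the modulation index standing in for the modulator's output level. The frequencies are illustrative. The second function lists where the spectral lines fall: at the carrier frequency plus and minus whole multiples of the modulator frequency, so a 1:1 ratio gives harmonic overtones while a non-integer ratio gives the inharmonic, bell-like partials mentioned above.

```python
import math


def fm_sample(t, carrier_hz, modulator_hz, index, carrier_level=1.0):
    """Two-operator FM: the modulator (scaled by index) shifts the carrier's phase."""
    modulator = index * math.sin(2 * math.pi * modulator_hz * t)
    return carrier_level * math.sin(2 * math.pi * carrier_hz * t + modulator)


def sideband_frequencies(carrier_hz, modulator_hz, k_max):
    """Frequencies |fc + k*fm| for k = -k_max..k_max; the index sets their levels."""
    return sorted({abs(carrier_hz + k * modulator_hz) for k in range(-k_max, k_max + 1)})


print(sideband_frequencies(100.0, 100.0, 3))  # [0.0, 100.0, 200.0, 300.0, 400.0]
print(sideband_frequencies(100.0, 140.0, 2))  # [40.0, 100.0, 180.0, 240.0, 380.0]

# Index 0 leaves a plain sine wave; raising it adds sidebands (overtones).
plain = [fm_sample(n / 48000, 220.0, 220.0, 0.0) for n in range(480)]
bright = [fm_sample(n / 48000, 220.0, 220.0, 2.0) for n in range(480)]
```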
The most basic type of FM synthesis uses two operators: one modulator and one carrier. In practice, the DX7 provided six operators, and the FS1R provides eight operators in its Voiced tone generator section. If three or more operators are available, a modulator can be used to modulate another modulator, or carriers can be arranged in parallel. Such arrangements of operators are called "algorithms," and are provided as parameters on both the DX7 and the FS1R.
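The two kinds of arrangement can be sketched in Python using a simple phase-modulation model of a sine-wave operator. The frequency ratios and levels are arbitrary illustrative choices, and these two functions do not correspond to specific DX7 or FS1R algorithm numbers.

```python
import math


def op(t, freq_hz, level, phase_mod=0.0):
    """One sine-wave operator whose phase is offset by any incoming modulation."""
    return level * math.sin(2 * math.pi * freq_hz * t + phase_mod)


def stacked(t, f=220.0):
    """Serial chain: operator 3 modulates operator 2, which modulates carrier 1."""
    op3 = op(t, f * 3.0, 0.8)
    op2 = op(t, f * 2.0, 1.5, op3)
    return op(t, f, 1.0, op2)


def parallel(t, f=220.0):
    """Two carriers side by side, each with its own modulator; outputs are summed."""
    carrier_a = op(t, f, 0.5, op(t, f, 1.0))
    carrier_b = op(t, f * 2.0, 0.5, op(t, f * 3.5, 0.6))
    return carrier_a + carrier_b


sr = 48000
stack_wave = [stacked(n / sr) for n in range(sr // 100)]
parallel_wave = [parallel(n / sr) for n in range(sr // 100)]
```

In the stacked version only operator 1 is heard directly; in the parallel version both carriers contribute to the output, so their levels are kept at 0.5 each to avoid exceeding full scale.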
This will provide you with a basic understanding of FM synthesis. On the FS1R, you can build on FM synthesis by using the formant waveforms and FSEQ capabilities discussed earlier, taking a more synthesiser-like approach to creating highly realistic sounds.
Operators to modulate other operators:
In FM synthesis, the operator that applies modulation is called the "modulator," and the operator being modulated is called the "carrier." This is exactly the same as what happens in an FM radio broadcast. (The initials stand for the same words.) In an FM radio broadcast, the narration or music waveform is the modulator, and the basic frequency of the FM station is the carrier.
Analog synthesisers used special "noise generators" in order to produce noise, which is essentially an extremely large number of frequency components. In FM synthesis, the output waveform of a modulator could be "fed back" into its own input and used to modulate the modulator itself. This made it possible to generate an extremely large number of frequency components to easily create noise. The FS1R does provide FM-style feedback, but the Unvoiced operators can produce noise even more flexibly than this. By using FSEQ you can create realistic noise components such as drum sounds.
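Self-modulation is easy to sketch in Python: each new output sample is computed with the previous output sample added to the operator's phase. This single-sample feedback loop is a simplification chosen for clarity (real instruments may, for example, smooth the fed-back signal), and the `feedback` amount here is an arbitrary scale, not the FS1R's feedback parameter range. With no feedback the result is a pure sine wave; as feedback increases, the waveform gains more overtones and becomes increasingly harsh and irregular.

```python
import math


def feedback_operator(freq_hz, feedback, n_samples, sample_rate=48000):
    """A single operator modulating itself via its own previous output sample."""
    out = []
    previous = 0.0
    for n in range(n_samples):
        phase = 2 * math.pi * freq_hz * n / sample_rate
        sample = math.sin(phase + feedback * previous)
        out.append(sample)
        previous = sample
    return out


pure = feedback_operator(220.0, 0.0, 480)      # plain sine wave
brighter = feedback_operator(220.0, 1.0, 480)  # added overtones
harsh = feedback_operator(220.0, 4.0, 480)     # increasingly irregular
```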
All sounds can be analysed into a "fundamental," which is the basic pitch of the sound, and "overtones" (higher partials). The fundamental and each overtone can be considered as a sine wave of a different frequency. This means that by lining up a sufficient number of operators, each producing a sine wave, and adjusting the frequency and output level of each one, you could theoretically reproduce any given sound. However, this would require an extremely large number of operators and, more importantly, each would have to be controlled freely and independently to produce a realistic result. One reason that FM synthesis is such a revolutionary technique is that an enormously wide variety of sounds can be created simply by letting one sine wave modulate the frequency of another.
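The additive idea in this paragraph can be sketched in Python as a plain sum of sine waves, one per partial. The second function gives a rough sense of the scale problem: it counts how many harmonic sine oscillators a low note would need to fill the audio band up to the Nyquist frequency at an assumed 48 kHz sample rate, before any independent control over each one is even considered.

```python
import math


def additive(t, partials):
    """Sum of sine-wave partials; partials is a list of (freq_hz, amplitude) pairs."""
    return sum(amp * math.sin(2 * math.pi * freq * t) for freq, amp in partials)


def harmonics_below_nyquist(f0_hz, sample_rate=48000):
    """Number of harmonics of f0_hz that fit below half the sample rate."""
    nyquist = sample_rate / 2
    count = 0
    k = 1
    while k * f0_hz < nyquist:
        count += 1
        k += 1
    return count


# A sawtooth-like tone built from 39 harmonics of 100 Hz, amplitude 1/k.
saw_like = [(100.0 * k, 1.0 / k) for k in range(1, 40)]
wave = [additive(n / 48000, saw_like) for n in range(480)]
print(harmonics_below_nyquist(100.0))  # 239
```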
On the FS1R, the algorithm applies only to the Voiced tone generator.
Originally, the term "synthesiser" meant a device that is able to electronically synthesise sounds ranging from simulations of existing instruments to sounds which have never been heard before. In the broad sense, all electronic musical instruments today might be called synthesisers, so you may have different associations with the term. For example if you need a realistic human voice, the easiest method is to simply sample a voice. However on the FS1R, you can use a variety of parameters to create a voice from scratch that precisely reflects your intentions, including pronunciation and expression. This is what is meant by a "synthesiser-like approach" - similar to the way in which a completely blank canvas gives you the freedom to paint whatever you imagine.