Nasal consonants are produced by coupling the nasal and oropharyngeal resonators. The closed oral cavity and the complex sinus structure of the nose form cavities joined to the main passage (pharynx and nasal tract). Lowering the velum opens a passage from the pharynx to the nasal cavity, allowing air to escape through the nose. The nasal resonator is smaller yet more complex than the oropharyngeal system. In cavity coupling, the soft palate is relaxed so that the nasal and oropharyngeal resonators work together. This connection is controlled by the palatopharyngeal (or velopharyngeal) mechanism, a muscle complex that contracts to reduce or close the opening located behind the soft palate (or velum). If the palatopharyngeal musculature is relaxed, the nasal port opens, permitting the nasal and oropharyngeal resonators to work together (Tiffany and Carrell, 1977). Coupling between the nasal cavity and the velopharyngeal port varies, however, with the sound being formed in the supralaryngeal vocal tract.
Fujimura (1962) claims the primary frequency range of interest for nasal consonants is between 200 and 2500 Hz. Nasals are characterized by a stable concentration of energy in the lower frequency regions with a first formant near 300 Hz. Due to the presence of an antiformant there is little energy in the areas around 600 Hz. Nasal sounds in general are highly damped and their presence weakens the upper formants of neighboring vowel sounds. This is caused by the broader band frequency response in the vocal tract, since broadly-tuned resonators fall away more rapidly than narrowly-tuned ones. During nasal production, the nasal and oral cavities resonate together resulting in a loss of amplitude (or antiresonance) at certain frequencies. These two cavities affect each other and, at times, even cancel each other out if both resonate at similar frequencies. A lessening of broad-band resonances and an absorption of acoustic energy in the oral cavity and nasal walls will also result in antiresonances. The high density of formants in the frequency range with the existence of antiformants causes the sound energy of nasals to be spread evenly throughout the central frequency range (800-2300 Hz). Although the shape of the antiformant will vary depending on the place of articulation, the overall spectral shape of the nasal consonants remains basically the same (see Fujimura, 1962).
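The interplay of a low first formant and a nearby antiformant can be illustrated with a simple pole-zero filter. The following is only a minimal sketch: the sampling rate, the 100 Hz bandwidths, and the function names are hypothetical choices, and only the 300 Hz and 600 Hz centre frequencies are taken from the description above. It is not the model used by Fujimura (1962).

```python
import cmath
import math

def second_order_section(freq_hz, bandwidth_hz, fs):
    """Coefficients (1, b1, b2) of 1 + b1*z^-1 + b2*z^-2 for a conjugate root pair."""
    r = math.exp(-math.pi * bandwidth_hz / fs)   # root radius set by the bandwidth
    theta = 2.0 * math.pi * freq_hz / fs         # root angle set by the centre frequency
    return (1.0, -2.0 * r * math.cos(theta), r * r)

def gain(freq_hz, zero, pole, fs):
    """Magnitude of H(z) = zero(z) / pole(z) evaluated on the unit circle."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / fs)
    num = zero[0] + zero[1] * z_inv + zero[2] * z_inv ** 2
    den = pole[0] + pole[1] * z_inv + pole[2] * z_inv ** 2
    return abs(num / den)

FS = 10000.0                                          # hypothetical sampling rate (Hz)
FORMANT = second_order_section(300.0, 100.0, FS)      # resonance near 300 Hz
ANTIFORMANT = second_order_section(600.0, 100.0, FS)  # antiresonance near 600 Hz

# Print the level at the formant, at the antiformant, and above both.
for f in (300.0, 600.0, 900.0):
    level_db = 20.0 * math.log10(gain(f, ANTIFORMANT, FORMANT, FS))
    print(f"{f:6.0f} Hz  {level_db:6.1f} dB")
```

Under these assumptions the response peaks at the pole frequency and dips at the zero frequency, which is the qualitative pattern described for nasal murmurs.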
A significant amount of research on nasals has focused on the acoustic cues that are required for nasal perception. Nakata (1959) found that the nasal murmur makes a significant contribution to the perception of place of articulation. Malecot (1956) and Mermelstein (1977), however, found that the place of articulation of [m], [n], and [ŋ] is largely perceived through the transitions of the adjoining vowel formants. More recently, the consensus among researchers is that the acoustic cues necessary for identifying nasal place of articulation are found in both the nasal murmur and the formant transitions (see Kurowski and Blumstein, 1984; Repp and Svastikula, 1988; Ohde, 1994; Harrington, 1994).
Word-final [n] in Japanese differs phonetically from initial [n] since it is articulated in varying positions depending on the context. Final [n] is also distinguishable by its unreleased quality. Vance (1987) refers to final [n] in Japanese as the "mora nasal." He transcribes it as an unreleased uvular nasal [ɴ]. Sakuma (1929) calls final [n] an unreleased velar nasal when it occurs in words like onsen (see Figure 2). Jones (1967) claims Japanese final [n] is articulated somewhere between a typical alveolar nasal and a nasalized fricative sound. Final [n] in [onsen] possesses a different acoustic quality depending on the neighboring consonant. For example, final [n] in onsen ka 'onsen' is more velar than in onsen ni 'in onsen' (see Vance, 1987).
First Formant (F1) Frequency
Spectral differences between nasal sounds are due to
modifications made within the oral resonator. The first
formant (F1) is lower for [m] than it is for [n] since the
vocal tract is longer for a bilabial than for an alveolar sound. Formant
frequencies and their corresponding bandwidths are listed in Table 1.
The F1 of [m] and [n] in both English and
Japanese averaged between 250 and 300 Hz. Acoustic energy is concentrated
in a lower frequency region, as evidenced by the narrower
bandwidths. Notice the lower formant emphasis of [n] in both onset
and coda positions in the spectrogram of the word none below.
Figure 1. A spectrogram of the word "none" as
spoken by a native speaker of English.
From the data, the average F1 of [m] was lower than the
F1 of [n] within both languages. The F1 of initial [m] was
265 Hz in Japanese and 232 Hz in English, a difference in frequency slightly
greater than that between the first formants of initial
[n] in both languages.
The F1 of word-final [n] varied significantly in both
languages. The F1 of English [n]
was lower in final positions (263 Hz) than in
initial positions (274 Hz). The opposite result occurred in Japanese,
where the F1 of [n] was 300 Hz in
final positions compared with 285 Hz in initial
positions (see Table 1).
Table 1. Parameter values for the nasal consonants. The unit for the formant frequency (F) and its bandwidth (B) is Hz.
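The opposite directions of the onset-to-coda F1 change for [n] in the two languages can be checked with simple arithmetic on the means quoted in the running text. The sketch below uses only those four values; the dictionary layout and the function name are illustrative assumptions.

```python
# Mean F1 values (Hz) for [n] as quoted in the text for the first experiment.
F1_N = {
    "English": {"initial": 274, "final": 263},
    "Japanese": {"initial": 285, "final": 300},
}

def coda_shift(values):
    """Final-minus-initial F1 difference in Hz (positive = higher in coda)."""
    return values["final"] - values["initial"]

SHIFTS = {language: coda_shift(values) for language, values in F1_N.items()}

for language, shift in SHIFTS.items():
    print(f"{language}: {shift:+d} Hz")
```

A negative shift means F1 drops from onset to coda, as in English; a positive shift means it rises, as in Japanese.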
The durations of the target sounds were measured in the English and Japanese word tokens. Results show that the average duration of [n] is slightly longer than that of [m] within both languages. As expected, nasals are longer in English than in Japanese, particularly in word-final positions, where the average duration was two times greater (see Figure 3).
It is not surprising that the target sounds are longer in
English than in Japanese. Japanese is a syllable-timed (more precisely,
mora-timed) language. Each mora in Japanese typically
consists of a single consonant followed by a vowel
and is pronounced with roughly the same duration.
English is a stress-timed language.
The duration of syllables in English varies
depending on the context.
Figure 3. Comparison of mean durations (in ms) of English (E)
and Japanese (J) nasals in onset and coda positions as
spoken by the Japanese and American subjects.
The F1 frequency of
[n] is naturally higher than that of its bilabial counterpart since the vocal
tract configuration of
final [n] is shorter than that of initial [m].
It is interesting, however, that the F1
of Japanese [n] is
much higher in coda positions than in onset positions.
This can be attributed to
the varying articulation of final [n] in Japanese, since it is
articulated as an alveolar, velar, or uvular nasal depending on
the context. Further research is necessary to examine
the phonetic peculiarity of word-final [n] in Japanese.
The next experiment examines and compares the acoustic properties
(F1 frequency and duration) of the nasals [m] and [n] in English
and Japanese as produced exclusively by native speakers of Japanese.
First formant (F1) frequencies of the target sounds were calculated from an FFT
frequency response with a preemphasized, low-pass filter. A
sampling frequency
of 19 kHz was used with a frequency range between 0 Hz and 5,000 Hz.
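The general shape of such an FFT-based measurement can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the pre-emphasis coefficient (0.97), the Hamming window, the 50 ms frame, the 150-600 Hz search band, and all function names are hypothetical, and the low-pass stage is approximated simply by ignoring bins above 5,000 Hz. Only the 19 kHz sampling frequency and the 0-5,000 Hz range come from the text.

```python
import cmath
import math

def pre_emphasize(samples, coeff=0.97):
    """First-order pre-emphasis y[n] = x[n] - coeff * x[n-1] (coeff is assumed)."""
    return [samples[0]] + [samples[n] - coeff * samples[n - 1]
                           for n in range(1, len(samples))]

def magnitude_spectrum(samples, fs, fmax=5000.0):
    """Hamming-windowed DFT magnitudes for bins from 0 Hz up to fmax."""
    n = len(samples)
    windowed = [s * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, s in enumerate(samples)]
    spectrum = []
    for k in range(int(fmax * n / fs) + 1):
        step = -2j * math.pi * k / n
        total = sum(w * cmath.exp(step * i) for i, w in enumerate(windowed))
        spectrum.append((k * fs / n, abs(total)))
    return spectrum

def band_peak_frequency(samples, fs, lo=150.0, hi=600.0):
    """Frequency of the strongest bin between lo and hi (hypothetical F1 band)."""
    spectrum = magnitude_spectrum(pre_emphasize(samples), fs)
    in_band = [(mag, freq) for freq, mag in spectrum if lo <= freq <= hi]
    return max(in_band)[1]

FS = 19000   # sampling frequency quoted in the text (Hz)
N = 950      # 50 ms frame, giving 20 Hz bin spacing
# Synthetic test frame: a strong 300 Hz component plus a weaker one at 2000 Hz.
FRAME = [math.sin(2.0 * math.pi * 300.0 * i / FS)
         + 0.3 * math.sin(2.0 * math.pi * 2000.0 * i / FS) for i in range(N)]

print(band_peak_frequency(FRAME, FS))
```

Restricting the peak search to a low-frequency band matters here because pre-emphasis boosts higher frequencies, so the global spectral maximum need not correspond to the first formant.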
First Formant (F1) Frequency
F1 frequencies of [m] and [n] in initial and
final positions were measured from the Japanese subjects' recorded productions
of the English and Japanese word tokens. Since spectral differences
between nasal sounds are due to
modifications made within the oral resonator, adding the nasal
cavity to the vocal tract increases the size of the resonator,
which greatly affects the frequencies of the sounds. The first
formant is typically lower for [m] than for [n] since the vocal tract
is longer during production of [m].
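The inverse relation between resonator length and resonance frequency can be illustrated with the textbook quarter-wavelength formula for a uniform tube closed at one end, F = c / (4L). This is only an idealized sketch: real nasal and oral cavities are not uniform tubes, the speed of sound and both lengths below are hypothetical values chosen for illustration, and the resulting frequencies are not meant to match the measured F1 values.

```python
SPEED_OF_SOUND = 350.0  # m/s, approximate value for warm, humid air (assumed)

def quarter_wave_resonance(length_m, c=SPEED_OF_SOUND):
    """Lowest resonance (Hz) of a uniform tube closed at one end, open at the other."""
    return c / (4.0 * length_m)

# Hypothetical resonator lengths in metres, chosen only for illustration.
LONGER = quarter_wave_resonance(0.20)
SHORTER = quarter_wave_resonance(0.175)

print(f"longer tube:  {LONGER:.1f} Hz")
print(f"shorter tube: {SHORTER:.1f} Hz")
```

The longer tube resonates at the lower frequency, which is the direction of the [m] versus [n] difference described above.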
Results indicate that the average F1 frequencies
of initial [m] in English and Japanese were comparable at
313 Hz and 316 Hz, respectively. The average F1 of
final [m] in English was slightly lower at 307 Hz.
Significant differences are observed in the
F1 frequencies of Japanese [n]
in initial and coda positions. The F1
of [n] averaged 320 Hz in initial positions and
345 Hz in word-coda positions (see Figure 5).
The LPC results from the first experiment were lower (272 Hz for
initial [n] and 300 Hz for final [n]); however, they also revealed a higher
F1 for final [n].
The average F1
of English [n] as recorded by the Japanese
subjects within
both initial and final positions varied
as did Japanese [n] but in a reverse manner.
The subjects produced initial [n] in English with an average F1
of 345 Hz, while producing final [n] with an average F1
of only 288 Hz.
Figure 5. A comparison of F1
frequencies of target sounds in
English and Japanese words as spoken by Japanese subjects.
The durational differences between initial [m] and [n] within Japanese word tokens are minimal. Both sounds averaged 68 ms. The durational differences of the target sounds are greater in English as initial [m] is 110 ms and initial [n] is 103 ms; English [m] and [n] in final positions averaged 163 ms and 147 ms, respectively. These results differ from the results of the first experiment which revealed that English [n] is slightly longer than [m] within both positions.
The Japanese subjects produced nasals longer
in English than in Japanese. Final
[n], in particular, is much longer in English (147 ms) than
in Japanese (99 ms), but is still significantly shorter than
the final [n] as recorded by native
English speakers in the first experiment (see Figure 6).
Figure 6. The mean durations of nasal sounds in Japanese and English words as spoken by Japanese subjects. The unit of measurement is in milliseconds (ms).
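The relative lengthening of nasals in the subjects' English can be expressed as English-to-Japanese duration ratios. The sketch below uses only the mean durations quoted in the text for the Japanese subjects; the table layout and the function name are illustrative assumptions, and final [m] is omitted from the ratios because no Japanese value is reported for it.

```python
# Mean durations (ms) quoted in the text for the Japanese subjects.
DURATION_MS = {
    ("Japanese", "initial", "m"): 68,
    ("Japanese", "initial", "n"): 68,
    ("Japanese", "final", "n"): 99,
    ("English", "initial", "m"): 110,
    ("English", "initial", "n"): 103,
    ("English", "final", "m"): 163,
    ("English", "final", "n"): 147,
}

def english_to_japanese_ratio(position, nasal):
    """Ratio of the English mean duration to the Japanese mean duration."""
    return (DURATION_MS[("English", position, nasal)]
            / DURATION_MS[("Japanese", position, nasal)])

for position, nasal in (("initial", "m"), ("initial", "n"), ("final", "n")):
    ratio = english_to_japanese_ratio(position, nasal)
    print(f"{position} [{nasal}]: {ratio:.2f}")
```

All three ratios exceed 1, consistent with the observation that the subjects produced nasals longer in English than in Japanese.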
The F1
frequencies of the target sounds within
both languages were also measured as recorded by the Japanese subjects.
Results reveal that the F1 of [m]
was similar within both languages. The F1 of final [n]
differed across languages, however. Japanese [n] has a much
higher F1
emphasis in final positions than in onset positions.
One would expect this
considering the phonetic nature of final [n] in Japanese, since it can
be articulated as an alveolar, velar, or uvular sound depending
on the context.
The fact that Japanese word-final [n] is an unreleased sound also
contributes to its unpredictable acoustic nature.
Another factor that is perhaps worthy of consideration is
the absence of final [m] from the Japanese sound
system.
This would tend to place additional responsibility on [n]
to behave more flexibly in word-final positions, i.e., vary in manner
of articulation depending on the context.
The F1
data from the Japanese subjects' production of English [n]
are puzzling. It is unclear why subjects produced English [n]
in word-initial positions with a much higher F1
frequency (345 Hz),
while the F1 of final [n] is just 288 Hz. This could be a result
of subjects producing final [n] more forward in the vocal tract,
possibly with a configuration more closely resembling bilabial [m].
This misarticulation could also be a result of the
unreleased nature of
final [n] in Japanese, and a further indication of L1 interference.
By comparing the spectral differences of Japanese recorded production of [m] and [n] in Japanese and English word-initial and word-coda positions, this paper posits that these production differences may at least partly explain the significant acoustic differences of these sounds as produced by Japanese speakers of English. Further research is necessary to examine the acoustic and phonetic uniqueness of word-final [n] in Japanese to see how it interferes with Japanese production and perception of English nasals. A cross-language perception study of nasal sounds, particularly in word-final positions, would also greatly assist our further understanding of the discriminational difficulties of these sounds as experienced by native speakers of Japanese.
Harrington, J. (1994). The contribution of the murmur and vowel to the place of articulation distinction in nasal consonants. The Journal of the Acoustical Society of America, 96(1), 19-32.
Jones, D. (1967). The phoneme: Its nature and use. Cambridge: Cambridge University Press. (Quoted in Vance, 1987.)
Kurowski, K. and Blumstein, S. (1984). Perceptual integration of the murmur and formant transitions for place of articulation in nasal consonants. The Journal of the Acoustical Society of America, 76(2), 383-390.
Ladefoged, P. (1993). A course in phonetics (3rd ed.). New York: Harcourt Brace Jovanovich College Publishers.
Malecot, A. (1956). Acoustic cues for nasal consonants: An experimental study involving a tape-splicing technique. Language, 32, 274-284.
Mermelstein, P. (1977). On detecting nasals in continuous speech. The Journal of the Acoustical Society of America, 61(2), 581-587.
Nakata, K. (1959). Synthesis and perception of nasal consonants. The Journal of the Acoustical Society of America, 31(6), 661-666.
Ohde, R. N. (1994). The development of the perception of cues to the [m]-[n] distinction in CV syllables. The Journal of the Acoustical Society of America, 96(2), 1-12.
Repp, B. H. and Svastikula, K. (1988). Perception of the [m]-[n] distinction in VC syllables. The Journal of the Acoustical Society of America, 83(1), 237-247.
Sakuma, K. (1929). Nihon Onseigaku. Tokyo: Kazama Shobo. (Quoted in Vance, 1987.)
Singh, S. and Singh, K. (1982). Phonetics: Principles and practices (2nd ed.). Austin, Texas: Pro-ed.
Tiffany, W. and Carrell, J. (1977). Phonetics: Theory and application. New York: McGraw-Hill Publishing Company.
Vance, T. J. (1987). An introduction to Japanese phonology. New York: State University of New York Press.